BLUP — Overview
BeePass uses the BLUP (Best Linear Unbiased Prediction) model coupled with the haplo-diploid relationship matrix from Brascamp & Bijma (2014) to estimate the breeding values of your queens. The pipeline also integrates environmental correction via machine learning (XGBoost) and a threshold model for ordinal traits.
The ONE SHOT Pipeline
Genetic evaluation is performed in a single automated sequence called ONE SHOT. Each step feeds into the next:
- Snapshot — Full database backup before computation (pg_dump + SHA256 fingerprints). If anything goes wrong, we can always roll back.
- Environment — Geocode evaluation locations (postal code + country), fetch altitude and seasonal weather data (rainfall, mean temperature, hot days >30 °C) over the March 15 - September 30 window.
- XGBoost Train — Train an environmental correction model on honey yield, using altitude, rainfall, temperature, and hot day count as features.
- XGBoost Apply — Correct honey measurements: y_corr = log1p(honey_kg) - predicted_env. The corrected yield reflects genetic potential, regardless of location.
- BLUPF90+ — Estimate breeding values for honey (continuous trait) with simultaneous REML variance component estimation. Mixed model with queen and worker effects.
- THRGIBBS — Estimate breeding values for ordinal traits (gentleness, vigor, wintering, non-swarming, comb stability) and hygiene traits (HYG 6h, HYG 24h) using a threshold Gibbs sampler (threshold probit model).
- Normalisation — Rescale all estimated breeding values (EBVs) to the BeeBreed-compatible scale: mean = 100, standard deviation = 10.
- Storage — Atomic write to database. Results only become visible if the entire pipeline succeeds.
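As an illustration of the Normalisation step, here is a minimal Python sketch of a linear rescaling to mean 100 and standard deviation 10. The function name `normalise_ebv` and the sample values are hypothetical. The sketch also assumes the mean and standard deviation are taken over all EBVs passed in, whereas the reference population BeePass actually uses is not described here.

```python
from statistics import mean, pstdev


def normalise_ebv(raw_ebvs, target_mean=100.0, target_sd=10.0):
    """Linearly rescale raw EBVs to the target mean and standard deviation."""
    mu = mean(raw_ebvs)
    # Assumption: population SD of the values supplied (no separate reference group).
    sd = pstdev(raw_ebvs)
    if sd == 0:
        raise ValueError("EBVs have zero variance; cannot standardise")
    return [target_mean + target_sd * (x - mu) / sd for x in raw_ebvs]


# Hypothetical raw EBVs on the model's own scale.
raw = [-0.8, -0.1, 0.0, 0.3, 0.6]
scaled = normalise_ebv(raw)
```

Because the transformation is linear, the ranking of queens is unchanged. Only the scale changes, so that 100 marks the average and each 10 points corresponds to one standard deviation.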
The pipeline is triggered by an administrator. Results appear on each queen's detail page, under the Evaluations tab.
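The threshold probit model used for ordinal traits can be pictured as follows. Each colony has an unobserved continuous "liability", and the observed score is the interval between cut-points into which that liability falls. The sketch below only illustrates this idea. The function name, the cut-points, and the unit residual variance are assumptions for the example, not the THRGIBBS configuration.

```python
from statistics import NormalDist


def category_probabilities(liability_mean, thresholds):
    """Probability of each ordinal score under a probit threshold model.

    thresholds must be sorted in ascending order; residual SD is fixed at 1.
    """
    std_normal = NormalDist()
    cdf = [std_normal.cdf(t - liability_mean) for t in thresholds]
    bounds = [0.0] + cdf + [1.0]
    return [bounds[k + 1] - bounds[k] for k in range(len(bounds) - 1)]


# Hypothetical cut-points defining a 4-point scoring scale.
thresholds = [-1.0, 0.0, 1.0]
average = category_probabilities(0.0, thresholds)
better = category_probabilities(0.5, thresholds)
```

Shifting the liability mean upward (as a higher breeding value would) moves probability toward the top scores, which is how a continuous breeding value can be estimated from discrete grades.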
Why Correct for Environment?
Honey production varies enormously by location: a queen at 1,200 m altitude doesn't have the same floral resources as one on the plains. Without correction, we'd be comparing environments, not genetics.
XGBoost models the influence of altitude, rainfall, mean temperature, and hot day count on honey yield. Subtracting this environmental prediction removes the part of the yield explained by location and season, so that the remaining variation better reflects genetic differences.
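A small worked example of the correction formula y_corr = log1p(honey_kg) - predicted_env, in Python. It assumes that `predicted_env` is the environmental model's prediction expressed on the same log1p scale. The function name `correct_yield` and the two prediction values are hypothetical, and no real model is trained here.

```python
import math


def correct_yield(honey_kg, predicted_env):
    """Return the environment-corrected yield on the log1p scale."""
    return math.log1p(honey_kg) - predicted_env


# Two colonies with the same 20 kg harvest in different environments.
# The predictions are made-up values standing in for the model's output.
plains = correct_yield(20.0, predicted_env=3.2)    # favourable site
mountain = correct_yield(20.0, predicted_env=2.8)  # harsher site
```

With identical harvests, the colony at the harsher site gets the higher corrected value, because it produced the same amount of honey where less was expected.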
Honeybee-Specific Challenges
Genetic evaluation in honeybees differs fundamentally from conventional livestock species:
- Haplo-diploidy — Males (drones) are haploid: they carry only one set of chromosomes. Standard relationship formulas, which assume two diploid parents, don't apply.
- Polyandry — The queen mates with many drones (10-20). Fathers are modelled as a "sire group" rather than individually.
- Dual genetic effect — Colony performance depends on both the queen's genes and the worker bees' genes (daughters of the queen). The BLUP model separates these two effects.
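One common way to write a mixed model with separate queen and worker effects is shown below. The notation is an assumption made for illustration: the fixed effects in b and the exact parameterisation used by BeePass are not specified here.

```latex
\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}_W \mathbf{a}_W + \mathbf{Z}_Q \mathbf{a}_Q + \mathbf{e},
\qquad
\operatorname{Var}\begin{pmatrix} \mathbf{a}_W \\ \mathbf{a}_Q \end{pmatrix}
= \begin{pmatrix} \sigma^2_W & \sigma_{WQ} \\ \sigma_{WQ} & \sigma^2_Q \end{pmatrix} \otimes \mathbf{A},
\qquad
\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma^2_e)
```

Here y holds the colony records, a_W and a_Q are the worker and queen breeding values, Z_W and Z_Q link each record to them, and A is the haplo-diploid relationship matrix. The covariance term allows the two effects to be genetically correlated.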
Reference: Brascamp, E. W. & Bijma, P. (2014). Methods to estimate breeding values in honey bees. Genetics Selection Evolution, 46:53.
See also:
- Reading Your EBV — Interpreting the results
- Reliability (r²) — Understanding estimation accuracy
- Inbreeding — Relationship matrix and genetic diversity