The rapid digitization of genealogical and medical records enables the assembly of extremely large pedigrees spanning millions of individuals and trillions of pairs of relatives. Such pedigrees provide the opportunity to answer genetic and epidemiological questions at scales much larger than previously possible. Linear mixed models (LMMs) are often used to analyze pedigree data. However, traditional LMMs do not scale well because of their steep computational and storage requirements. Here, we propose a novel modeling framework called Sparse Cholesky factorIzation LMM (Sci-LMM), which alleviates these difficulties by exploiting the sparsity patterns found in population-scale family trees. The proposed framework constructs a matrix of genetic relationships between trillions of pairs of individuals in several hours and can fit the corresponding LMM in several days. We demonstrate the capabilities of Sci-LMM via simulation studies and by estimating the heritability of longevity in a very large pedigree spanning millions of individuals and over five centuries of human history. The Sci-LMM framework enables analyses of extremely large pedigrees that were not previously possible. Sci-LMM is available at https://github.com/TalShor/SciLMM.
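To illustrate why sparsity helps when building a matrix of genetic relationships, the following is a minimal sketch of the classic tabular (recursive) method for the additive relationship matrix, storing only nonzero entries. It is not the Sci-LMM implementation and does not use the SciLMM package. The function name `sparse_relationship_matrix` and the input layout (a list of parent-index pairs, with parents listed before their children) are assumptions made for this example.

```python
from collections import defaultdict


def sparse_relationship_matrix(parents):
    """Additive relationship matrix via the tabular method, stored sparsely.

    `parents[i]` is a (sire, dam) pair of integer indices, with None for an
    unknown parent. Parents must appear before their children (hypothetical
    input convention for this sketch).
    Returns a list of dicts: rows[i][j] is the relationship between i and j;
    pairs that are absent are unrelated (value 0).
    """
    rows = [dict() for _ in parents]
    for i, (sire, dam) in enumerate(parents):
        for p in (sire, dam):
            if p is not None and p >= i:
                raise ValueError("parents must precede their children")

        # Off-diagonal entries: A[i, j] = 0.5 * (A[sire, j] + A[dam, j]).
        # Only relatives of the parents are visited, so unrelated pairs
        # never cost any time or memory.
        acc = defaultdict(float)
        for p in (sire, dam):
            if p is None:
                continue
            for j, value in rows[p].items():
                acc[j] += 0.5 * value
        for j, value in acc.items():
            rows[i][j] = value
            rows[j][i] = value  # keep the stored matrix symmetric

        # Diagonal entry: 1 + inbreeding, where the inbreeding coefficient
        # is half the relationship between the two parents.
        inbreeding = 0.0
        if sire is not None and dam is not None:
            inbreeding = 0.5 * rows[sire].get(dam, 0.0)
        rows[i][i] = 1.0 + inbreeding
    return rows


# Toy pedigree: 0 and 1 are founders, 2 and 3 are their children,
# 4 is the child of the full siblings 2 and 3, and 5 is an unrelated founder.
pedigree = [(None, None), (None, None), (0, 1), (0, 1), (2, 3), (None, None)]
A = sparse_relationship_matrix(pedigree)
```

In this toy pedigree the unrelated founder contributes a single diagonal entry, which is the pattern that keeps storage far below the dense number of pairs when most individuals in a large pedigree are unrelated to one another.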
| Field | Value |
|---|---|
| State | Published - 2018 |
| Name | bioRxiv |
| Publisher | Cold Spring Harbor Laboratory Press |