Skip to main content

Kinship

Background

Sometimes you want or need to know the relatedness of individuals in a sample. Relatedness is exactly what its name implies: how related individuals are given some provided genetic information (e.g. full siblings, half siblings, etc.). Relatedness can be used in quantitative genetics to estimate heritability, additive genetic variances, and covariances. It can also be used in population genetics to study isolation-by-distance or population structure.

The goal of calculating relatedness from molecular markers is to accurately estimate the proportion of the genome which is identical by descent between two individuals. With a pedigree this is "relatively" straightforward. However, for large, natural, populations pedigrees tend not to exist and some brilliant minds have developed various equations to estimate the relatedness from a set of molecular markers. Given two diploid individuals, there are 9 "identity by descent" models available between them (Jacquard 1975, paywall), as shown below (from Milligan 2003):

Jacquard IBD

Broadly speaking there are two different ways of estimating genetic relatedness using molecular markers: methods of moments, and likelihood estimators. Generally, moments estimators will be faster but aren't constrained to being between the theoretical minimum and maximum values of 0 and 1. The likelihood estimators use likelihood functions derived from the set of Jacquard Identity States (above) to determine the most likely inheritance pattern. One difference between the two classes is (generally) moments estimators require an assumption of no inbreeding, while that assumption isn't necessarily required for likelihood estimators (though it does simplify the math). It is increasingly common to use multiple estimators on pairs, simulated from your molecular markers, with known relationships to determine the most appropriate estimator to use with your data.

PopGen.jl implements a wide variety of moments-based estimators: Blouin, Li & Horvitz, Loiselle, Lynch, Lynch/Li, Lynch & Ritland, Moran, Queller & Goodnight, Ritland, and Wang. Along with these, we provide an option to estimate mean, median, standard error, and confidence intervals using bootstrapping. Have a look at the guide we provide detailing how to perform a basic relatedness analysis.

A note about removing kin

There are reasons for removing kin in population genetics datasets. For one, there are no siblings/kin or mixed-generations in a Hardy-Weinberg Equilibrium population, and the inclusion of siblings/kin in analyses that rely on HWE assumptions [technically] violate those assumptions. However, there are also arguments to keep kin/siblings in your data, those data are important for effective population size, linkage disequilibrium, etc.

see Waples and Anderson (2017)

Calculate Relatedness

There are two main methods of calculating pairwise relatedness, an all x all comparison of an entire PopData object or an all x all comparison for a subset of individuals in a PopData object. Regardless of which you prefer, they can be perfomed without bootstrapping, returning a NamedMatrix, or with bootstrapping, returning a DataFrame. Neither method estimates self-relatedness, so the diagonals of the NamedMatrix should be ignored. The resulting NamedMatrix can be converted to a DataFrame using kinshiptotable().

kinship(data::PopData; kwargs...)

Calculate pairwise relatedness estimates for all individuals in a PopData object using the specified method (see below). Returns a NamedMatrix if not performing bootstrapping, otherwise returns a DataFrame (since bootstrapping provides more output information). To calculate means, median, standard error, and confidence intervals using bootstrapping, set iterations = n where n is an integer greater than 0 (the default) corresponding to the number of bootstrap iterations you wish to perform for each pair. The default confidence interval is [0.0275, 0.975] (i.e. 95%), however that can be changed by supplying a Vector{Float64} of [low, high] to the keyword interval. Note: samples must be diploid.

Arguments

  • data : A PopData object

Keyword Arguments

  • method::Function : A method function (see below)
  • iterations::Int64 : The number of iterations to perform bootstrapping (default: 0, will not perform bootstrapping)
  • interval::Vector{Float64} : A Vector of [low, high] indicating the confidence intervals you would like for bootstrapping (default: [0.275, 0.975], i.e. 95%)
julia> cats = @nancycats ; 

julia> kin = kinship(cats, method = Moran)
237×237 Named Matrix{Float64}
A ╲ B │ N215 N216 … N289 N290
──────┼──────────────────────────────────────────────────────────
N215 │ 8.13724e-316 1.623381.04589 1.15351
N216 │ 1.62338 0.29485 0.957724 1.1637
N217 │ 0.673577 0.587163 0.547427 0.709887
N218 │ 0.896935 0.72942 0.919448 0.791255
⋮ ⋮ ⋮ ⋱ ⋮ ⋮
N297 │ 0.757915 0.858834 1.15432 1.2677
N281 │ 0.686057 0.604236 0.942749 1.08762
N289 │ 1.04589 0.957724 0.0 1.104
N290 │ 1.15351 1.16371.104 0.0

Methods

There are several estimators available and are listed below. kinship takes the function names as arguments (case sensitive), therefore do not use quotes or colons in specifying the methods.

autocompletion

Since the methods correspond to function names, they will tab-autocomplete when inputting them. For more information on a specific method, please see the respective docstring (e.g. ?Loiselle).

MethodTypeMethod Call
Blouin 1996moments-basedBlouin
Li & Horvitz 1953moments-basedLiHorvitz
Loiselle 1995moments-basedLoiselle
Lynch 1988moments-basedLynch
Lynch/Li 1993moments-basedLynchLi
Lynch & Ritland 1999moments-basedLynchRitland
Moran 1950moments-basedMoran
Queller & Goodnight 1989moments-basedQuellerGoodnight
Ritland 1996moments-basedRitland
Milligan 2003 "DyadML"maximum-likelihoodincomplete*
Wang 2002moments-basedincomplete*
*more kinship estimators

Contact us or submit a pull request if you're interested in contributing to the kinship methods. DyadML and Wang (2002) estimators are currently incomplete and the others could use some optimizations. More help is always welcomed! Our wishlist also includes the KING method 😄

Posthoc analyses

There are several different kinds of things you can do with kinship information (e.g. network analysis), and one that's provided is lovingly called kinshipposthoc(), which performs a permutation analysis to test if within-population relatedness is significantly greater than between-population relatedness for each population. We recommend that you correct for multiple testing using MultipleTesting.jl.

kinshipposthoc(data::PopData, results::Union{DataFrame, NamedTuple}; iterations::Int)

Arguments

  • data : A PopData object
  • results : the DataFrame or NamedTuple results from kinship()

Keyword Arguments

  • iterations : number of iterations for the permutation tests (default: 20000)
not a great name

We admit "kinshipposthoc" is not a great name for this function. Please contact us with better ideas! 😁

Example

julia> cats = @nancycats ;

julia> rel_out = kinship(cats, method = [Ritland, Moran], iterations = 100);

julia> kinshipposthoc(cats, rel_out)
17x3 DataFrame
Row │ population Ritland_P Moran_P
│ String Float64 Float64
─────┼────────────────────────────────
1 │ 1 5.0e-5 5.0e-5
2 │ 2 5.0e-5 5.0e-5
3 │ 3 5.0e-5 5.0e-5
4 │ 4 5.0e-5 5.0e-5
5 │ 5 5.0e-5 5.0e-5
6 │ 6 5.0e-5 5.0e-5
7 │ 7 5.0e-5 5.0e-5
8 │ 8 5.0e-5 5.0e-5
9 │ 9 5.0e-5 5.0e-5
10 │ 10 5.0e-5 5.0e-5
11 │ 11 5.0e-5 5.0e-5
12 │ 12 5.0e-5 5.0e-5
13 │ 13 5.0e-5 5.0e-5
14 │ 14 5.0e-5 5.0e-5
15 │ 15 5.0e-5 5.0e-5
16 │ 16 5.0e-5 5.0e-5
17 │ 17 5.0e-5 5.0e-5

Acknowledgements

The kinship methods were dutifully written and verified against R analogues by Jason Selwyn. They have since been rewritten twice for speed and memory optimization (14mb vs 14gb!). These anaylses can take a while, especially if performing bootstrapping; we provide a progress bar via Term.jl so you can move on and focus on other things in the meantime.