Simulate Sibling Pairs

Requires PopGenSims.jl

To perform simulations, you will need to add and import the package PopGenSims.jl (available here).

It's good practice to use your data to simulate sibling pairs and perform relatedness analyses on the simulations to understand the validity/strength of relatedness estimators on your data. To do this, you can use simulatekin and specify the relationship you want to simulate and how many pairs to create for that relationship.

simulatekin(data::PopData; fullsib::Int, halfsib::Int, unrelated::Int, parentoffspring::Int, ploidy::Int)

This function will simulate mating crosses to generate sample pairs with any combination of the specified relationships, returning a PopData object. The simulations will first generate parents of a given ploidy (inferred or specified) by drawing alleles from a global allele pool derived from the given data (i.e. weighted by their frequencies).
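To make "drawing alleles from a global allele pool" concrete, here is a minimal sketch of frequency-weighted sampling at a single locus. It is not the PopGenSims.jl implementation: the allele values, the frequencies, and the helper names `draw_allele` and `simulate_genotype` are all hypothetical and exist only for illustration.

```julia
using Random

# Hypothetical alleles and their frequencies at one locus (not real data)
alleles = [135, 137, 139]
freqs   = [0.5, 0.3, 0.2]

# Draw one allele with probability proportional to its frequency
function draw_allele(rng::AbstractRNG, alleles, freqs)
    r = rand(rng) * sum(freqs)
    idx = findfirst(>=(r), cumsum(freqs))
    # fall back to the last allele if floating-point round-off leaves no match
    return alleles[something(idx, lastindex(alleles))]
end

# A simulated parent genotype is `ploidy` independent draws from the pool
simulate_genotype(rng, alleles, freqs, ploidy) =
    Tuple(sort([draw_allele(rng, alleles, freqs) for _ in 1:ploidy]))

rng = MersenneTwister(42)
parent = simulate_genotype(rng, alleles, freqs, 2)
```

Common alleles in the source data are therefore drawn more often than rare ones, so simulated parents reflect the allele frequencies of your data.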

Relationship

Simulated parents will be crossed to generate samples depending on the relationship:

| relationship | # parents | # offspring | returns |
|---|---|---|---|
| fullsib | 2 | 2 full siblings | 2 offspring |
| halfsib | 3 | 2 half siblings | 2 offspring |
| unrelated | - | - | 2 randomly generated individuals |
| parentoffspring | 2 | 1 offspring | 1 parent + 1 offspring |
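The crosses in the table can be sketched for a single diploid locus as follows. This is an illustration under stated assumptions rather than the package's code: the `cross` helper and the parent genotypes are hypothetical, and the half-sibling layout (one parent shared between two crosses) is the usual interpretation of a three-parent design, not something this page spells out.

```julia
using Random

# One diploid cross at a single locus: each parent passes on one randomly chosen allele
cross(rng, p1, p2) = (rand(rng, p1), rand(rng, p2))

rng = MersenneTwister(1)

# Hypothetical parent genotypes at one locus
p1, p2, p3 = [135, 137], [139, 141], [143, 145]

# fullsib: two offspring from the same two parents
fullsibs = [cross(rng, p1, p2) for _ in 1:2]

# halfsib: two offspring that share only parent p1 (assumed layout)
halfsibs = [cross(rng, p1, p2), cross(rng, p1, p3)]

# parentoffspring: one parent returned together with one of its offspring
parentoffspring = (p1, cross(rng, p1, p2))
```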

Identifying pairs

The relationship between the newly generated samples can be identified in two ways:

  • Sample names will specify their simulation number, relationship, and whether parent or offspring
    • Naming convention: [simulation #]_[relationship]_[offspring #]
    • example: sim005_fullsib_1 = [simulation 005]_[full sibling]_[offspring 1]
  • Their population name will be that of their relationship (e.g. "fullsib")
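If you want to recover these parts programmatically, a small sketch like the one below may help. It assumes the three-part, underscore-separated convention shown above; `parse_simname` is a hypothetical helper and not part of PopGenSims.jl. The last field is kept as a string because it may be something other than a number.

```julia
# Split a simulated sample name into simulation, relationship, and member
function parse_simname(name::AbstractString)
    sim, relationship, member = split(name, "_"; limit = 3)
    return (simulation = String(sim),
            relationship = String(relationship),
            member = String(member))
end

info = parse_simname("sim005_fullsib_1")
# info.simulation == "sim005", info.relationship == "fullsib", info.member == "1"
```

Two samples belong to the same simulated pair when their `simulation` and `relationship` fields match.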

Plugging into relatedness

The kinship function will recognize the population names that simulatekin assigns and only estimate relatedness for the appropriate pairs. If you need this functionality, you are strongly discouraged from manually editing the PopData returned by simulatekin.

Ploidy

By default, the ploidy of the simulated parents and offspring is inferred from the supplied PopData.

Adjusting ploidy

If you have mixed-ploidy data or wish to generate parents and offspring of a ploidy different from that of the source PopData, you can specify the ploidy with which to simulate parents and offspring. For example, if your PopData is diploid, but you wish to generate triploid or octoploid parents and offspring, you would specify ploidy = 3 or ploidy = 8, respectively.
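As a sketch, a triploid simulation from the diploid nancycats data might look like the call below. It uses only the `ploidy` keyword from the signature above; the variable name `cat_sims_3n` is arbitrary, and the snippet assumes both PopGen.jl and PopGenSims.jl are installed.

```julia
using PopGen, PopGenSims

cats = @nancycats ;   # diploid example data

# request triploid parents and offspring instead of inferring ploidy from `cats`
cat_sims_3n = simulatekin(cats, fullsib = 10, ploidy = 3)
```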

Odd ploidy

If trying to create offspring with an odd ploidy (3, 5, etc.), each parent has a 50% chance of contributing (½ × ploidy, rounded down) + 1 alleles for all loci to the offspring, with the other parent contributing the remainder. In other words, if ploidy = 3, there's a 50% chance parent_1 will give 2 alleles for every locus for that simulated cross, in which case parent_2 gives 1.
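The split of alleles between parents can be sketched as below. This is a minimal illustration of the rule described above, not the package's internal code; `allele_contributions` is a hypothetical helper, and it assumes the coin flip is made once per cross and then applies to every locus.

```julia
using Random

# Number of alleles (parent_1, parent_2) contribute per locus for a given ploidy
function allele_contributions(rng::AbstractRNG, ploidy::Int)
    half = ploidy ÷ 2                      # integer division, e.g. 3 ÷ 2 == 1
    iseven(ploidy) && return (half, half)  # even ploidy: equal contributions
    # odd ploidy: a coin flip decides which parent gives the extra allele
    return rand(rng, Bool) ? (half + 1, half) : (half, half + 1)
end

rng = MersenneTwister(7)
even_split = allele_contributions(rng, 4)   # always (2, 2)
odd_split  = allele_contributions(rng, 3)   # either (2, 1) or (1, 2)
```

Whichever parent gets the extra allele, the two contributions always sum to the requested ploidy.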

Example

julia> using PopGen, PopGenSims

julia> cats = @nancycats ;

julia> cat_sims = simulatekin(cats, fullsib = 10, halfsib = 50)
PopData{Diploid, 9 Microsatellite loci}
Samples: 120
Populations: 2

julia> cat_sims.sampleinfo
120×3 DataFrame
 Row │ name             population  ploidy
     │ String           String      Int64
─────┼─────────────────────────────────────
   1 │ sim01_fullsib_1  fullsib          2
   2 │ sim01_fullsib_2  fullsib          2
   3 │ sim02_fullsib_1  fullsib          2
   4 │ sim02_fullsib_2  fullsib          2
   5 │ sim03_fullsib_1  fullsib          2
   6 │ sim03_fullsib_2  fullsib          2
  ⋮  │        ⋮             ⋮         ⋮
 115 │ sim48_halfsib_1  halfsib          2
 116 │ sim48_halfsib_2  halfsib          2
 117 │ sim49_halfsib_1  halfsib          2
 118 │ sim49_halfsib_2  halfsib          2
 119 │ sim50_halfsib_1  halfsib          2
 120 │ sim50_halfsib_2  halfsib          2
                           108 rows omitted