Data Exploration

Allele frequency table

allelefreqtable(data::PopData; by::Union{String, Symbol} = "global")

Return a table of the observed global (default) or population allele frequencies in a PopData object. Use this if you want to see what the frequencies are for every allele at every locus.

julia> cats = @nancycats ;

julia> allelefreqtable(cats)
108×4 DataFrame
 Row │ locus   allele  count  frequency
     │ String  Int16?  Int64  Float64
─────┼───────────────────────────────────
   1 │ fca8       135    105  0.241935
   2 │ fca8       143     44  0.101382
   3 │ fca8       133     33  0.0760369
   4 │ fca8       137     83  0.191244
  ⋮  │   ⋮       ⋮       ⋮        ⋮
 105 │ fca37      226      2  0.00421941
 106 │ fca37      216      7  0.0147679
 107 │ fca37      224      2  0.00421941
 108 │ fca37      204      6  0.0126582
                         100 rows omitted
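
As a rough illustration of what the `count` and `frequency` columns represent, the sketch below tallies allele copies at a single locus in plain Julia. It is not PopGen.jl's implementation: the `genotypes` vector and the `allele_frequencies` helper are hypothetical names made up for this example, and genotypes are assumed to be stored as tuples of alleles with `missing` for ungenotyped individuals.

```julia
# Hypothetical genotypes at one locus: each is a tuple of two alleles,
# or `missing` when the individual was not genotyped.
genotypes = [(135, 143), (135, 135), missing, (133, 135), (137, 143)]

# Hypothetical helper (not part of PopGen.jl): frequency of each allele.
function allele_frequencies(genos)
    counts = Dict{Int,Int}()
    # Tally every allele copy, skipping missing genotypes.
    for geno in skipmissing(genos)
        for allele in geno
            counts[allele] = get(counts, allele, 0) + 1
        end
    end
    total = sum(values(counts))
    # Frequency = copies of this allele / all allele copies observed.
    return Dict(allele => n / total for (allele, n) in counts)
end

freqs = allele_frequencies(genotypes)
```

The table above is consistent with this reading: allele 135 at fca8 has a count of 105 and a frequency of 0.241935, which implies 434 allele copies observed at that locus.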

Genotype frequency table

genofreqtable(data::PopData; by::Union{String, Symbol} = "global")

Return a table of the observed global (default) or population genotype frequencies in a PopData object. Use this if you want to see what the frequencies are for every genotype at every locus.

julia> cats = @nancycats ;

julia> genofreqtable(cats)
341×4 DataFrame
 Row │ locus   genotype    count  frequency
     │ String  Tuple…      Int64  Float64
─────┼───────────────────────────────────────
   1 │ fca8    (135, 143)     16  0.0737327
   2 │ fca8    (133, 135)      9  0.0414747
   3 │ fca8    (135, 135)     23  0.105991
   4 │ fca8    (137, 143)      8  0.0368664
  ⋮  │   ⋮         ⋮         ⋮        ⋮
 338 │ fca37   (206, 220)      1  0.00421941
 339 │ fca37   (208, 218)      1  0.00421941
 340 │ fca37   (184, 184)      3  0.0126582
 341 │ fca37   (208, 210)      3  0.0126582
                             333 rows omitted

Missing Data

missingdata(data::PopData; by::Union{String, Symbol} = "sample")

Get missing genotype information in a PopData object. Specify a mode of operation using the by = keyword to return a table corresponding with that missing information type.

by                  what it does
"sample"            returns a count of missing loci per individual (default)
"population"        returns a count of missing genotypes per population
"locus"             returns a count of missing genotypes per locus
"locusxpopulation"  returns a count of missing genotypes per locus per population

julia> sharks = @gulfsharks ;

julia> missingdata(sharks)
212×2 DataFrame
 Row │ name     missing
     │ String   Int64
─────┼──────────────────
   1 │ cc_001       124
   2 │ cc_002        94
   3 │ cc_003       100
   4 │ cc_005         0
   5 │ cc_007         2
   6 │ cc_008         1
   7 │ cc_009         2
  ⋮  │    ⋮        ⋮
 206 │ seg_025        0
 207 │ seg_026        0
 208 │ seg_027        2
 209 │ seg_028       25
 210 │ seg_029        0
 211 │ seg_030        1
 212 │ seg_031        1
        198 rows omitted

Pairwise Identical Genotypes

While not a substitute for a kinship analysis, it may be useful to know or verify how similar your data are in a very literal sense: how many identical genotypes do two individuals have across all loci? To do this, we use pairwiseidentical() to perform an all x all comparison of identical genotypes. This can be done for all individuals in a PopData object, or restricted to a specific set of individuals:

julia> cats = @nancycats;

julia> pairwiseidentical(cats)
27966×4 DataFrame
   Row │ sample_1  sample_2  identical  n
       │ String    String    Float64    Int64
───────┼──────────────────────────────────────
     1 │ N215      N216           0.5       8
     2 │ N215      N217           0.25      8
     3 │ N215      N218           0.38      8
     4 │ N215      N219           0.38      8
   ⋮   │    ⋮         ⋮          ⋮        ⋮
 27963 │ N297      N290           0.29      7
 27964 │ N281      N289           0.25      8
 27965 │ N281      N290           0.43      7
 27966 │ N289      N290           0.14      7
                          27958 rows omitted
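
The comparison for a single pair can be sketched in plain Julia. This is an illustration under assumptions, not PopGen.jl's code: the reading of `identical` as a proportion rounded to two decimals and `n` as the number of loci genotyped in both individuals is inferred from the output above (for example, 0.29 with n = 7 is consistent with 2 of 7 loci matching). The sample vectors and the `identical_fraction` helper are hypothetical.

```julia
# Hypothetical genotypes for two individuals across the same 8 loci;
# `missing` marks a locus that was not genotyped.
sample_a = [(135, 143), (136, 146), missing, (116, 120),
            (156, 156), (142, 148), (199, 199), (113, 113)]
sample_b = [(135, 143), (136, 146), (139, 139), (120, 126),
            (156, 156), (142, 148), (185, 199), (113, 113)]

# Hypothetical helper (not part of PopGen.jl): proportion of loci with
# identical genotypes, counting only loci genotyped in both individuals.
function identical_fraction(a, b)
    n = 0       # loci where neither genotype is missing
    same = 0    # loci where the two genotypes match exactly
    for (ga, gb) in zip(a, b)
        (ismissing(ga) || ismissing(gb)) && continue
        n += 1
        same += (ga == gb) ? 1 : 0
    end
    return (identical = round(same / n, digits = 2), n = n)
end

result = identical_fraction(sample_a, sample_b)
```

Here one locus is skipped because `sample_a` is missing a genotype, so the proportion is taken over the remaining 7 loci.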

Allelic Richness

To find the allelic richness (number of alleles per locus), use richness(). Use by = "population" to return a table by locus by population.

julia> cats = @nancycats;

julia> richness(cats)
9×2 DataFrame
 Row │ locus   richness
     │ String  Int64
─────┼──────────────────
   1 │ fca8          16
   2 │ fca23         11
   3 │ fca43         10
   4 │ fca45          9
   5 │ fca77         12
   6 │ fca78          8
   7 │ fca90         12
   8 │ fca96         12
   9 │ fca37         18
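
To make the definition concrete, here is a minimal plain-Julia sketch that counts distinct alleles per locus. It is only an illustration: the `genotypes` dictionary, the locus names, and the `allelic_richness` helper are hypothetical and not part of PopGen.jl.

```julia
# Hypothetical genotypes keyed by locus name; `missing` marks an
# individual that was not genotyped at that locus.
genotypes = Dict(
    "locus_a" => [(135, 143), (135, 135), missing, (133, 135)],
    "locus_b" => [(136, 146), missing, (136, 136), (146, 146)],
)

# Hypothetical helper (not part of PopGen.jl):
# richness = number of distinct alleles observed at a locus.
function allelic_richness(genos)
    alleles = Set{Int}()
    for geno in skipmissing(genos)
        union!(alleles, geno)   # add both alleles of the genotype
    end
    return length(alleles)
end

richness_by_locus = Dict(locus => allelic_richness(g) for (locus, g) in genotypes)
```

In the nancycats table above, the nine richness values sum to 108, which matches the 108 rows (one per allele per locus) returned by allelefreqtable(cats).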

Average Number of Alleles

Similar to richness, to get the average number of alleles per locus, use alleleavg(). Use rounding = false if you don't want the answer rounded to 4 decimal places.

julia> alleleavg(@nancycats)
(mean = 12.0, stdev = 0.2668)

julia> alleleavg(@nancycats, rounding = false)
(mean = 12.0, stdev = 0.2667968432263687)

Summary Statistics

Population genetics famously includes all manner of coefficients with which to summarize data. Use summary() to view FST, DST, HT, etc. (like hierfstat::basic.stats in R).

julia> summary(@nancycats)
1×10 DataFrame
 Row │ Het_obs  HS       HT       DST      HT′      DST′     FST      FST′     FIS      DEST
     │ Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64
─────┼──────────────────────────────────────────────────────────────────────────────────────────
   1 │  0.6299   0.7083   0.7717   0.0634   0.7757   0.0674   0.0821   0.0869   0.1108    0.231

prime symbol

The column names above use the Unicode prime symbol ′ to better reflect the actual coefficient ("FST prime", etc.). To type that character, enter \prime<TAB>, which reads "backslash, the word 'prime', and the TAB key".
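
The coefficients in the summary table are related to one another, and the sketch below shows one way to derive the rest from Het_obs, HS, and HT. The formulas are the textbook Nei-style definitions, assumed here rather than taken from PopGen.jl's source. The `basic_stats` helper is hypothetical, and the population count of 17 for nancycats is also an assumption. Field names use `_prime` in place of ′ to keep the code ASCII.

```julia
# Hypothetical helper (not part of PopGen.jl): derive the remaining
# coefficients from observed heterozygosity, HS, HT, and the number
# of populations, using assumed Nei-style definitions.
function basic_stats(het_obs, hs, ht, npop)
    dst = ht - hs                          # DST  = HT - HS
    dst_prime = npop / (npop - 1) * dst    # DST′ = np/(np-1) * DST
    ht_prime = hs + dst_prime              # HT′  = HS + DST′
    return (
        DST = dst,
        HT_prime = ht_prime,
        DST_prime = dst_prime,
        FST = dst / ht,                    # FST  = DST / HT
        FST_prime = dst_prime / ht_prime,  # FST′ = DST′ / HT′
        FIS = 1 - het_obs / hs,            # FIS  = 1 - Het_obs / HS
        DEST = dst_prime / (1 - hs),       # DEST = DST′ / (1 - HS)
    )
end

# Inputs copied from the summary table above; 17 populations is assumed.
stats = basic_stats(0.6299, 0.7083, 0.7717, 17)
```

Because the inputs are already rounded to 4 decimal places, the derived values agree with the table only to roughly three decimal places.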