Data Exploration
Allele frequency table
allelefreqtable(data::PopData; by::Union{String, Symbol} = "global")
Return a table of the observed allele frequencies in a PopData object, either global (the default) or per population, as selected with the by keyword. Use this to see the frequency of every allele at every locus.
julia> cats = @nancycats ;
julia> allelefreqtable(cats)
108×4 DataFrame
 Row │ locus   allele  count  frequency
     │ String  Int16?  Int64  Float64
─────┼────────────────────────────────────
   1 │ fca8       135    105  0.241935
   2 │ fca8       143     44  0.101382
   3 │ fca8       133     33  0.0760369
   4 │ fca8       137     83  0.191244
  ⋮  │   ⋮       ⋮       ⋮        ⋮
 105 │ fca37      226      2  0.00421941
 106 │ fca37      216      7  0.0147679
 107 │ fca37      224      2  0.00421941
 108 │ fca37      204      6  0.0126582
                          100 rows omitted
julia> cats = @nancycats ;
julia> allelefreqtable(cats, by = "population")
839×5 DataFrame
 Row │ locus   population  allele  count  frequency
     │ String  String      Int16?  Int64  Float64
─────┼───────────────────────────────────────────────
   1 │ fca8    1              135      9  0.5625
   2 │ fca8    1              143      4  0.25
   3 │ fca8    1              133      2  0.125
   4 │ fca8    1              137      1  0.0625
  ⋮  │   ⋮         ⋮         ⋮       ⋮        ⋮
 836 │ fca37   16             210      5  0.208333
 837 │ fca37   17             208     22  0.846154
 838 │ fca37   17             182      3  0.115385
 839 │ fca37   17             220      1  0.0384615
                                     831 rows omitted
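To make the frequency column concrete, here is a minimal sketch in plain Julia, independent of PopGen.jl. It assumes genotypes are tuples of integer alleles with missing for ungenotyped samples; the toy vector and the helper name allele_frequencies are hypothetical, not part of the package. Each frequency is the allele's count divided by the total number of allele copies observed, which matches the fca8 rows for population 1 above (9 of 16 copies gives 0.5625).

```julia
# Hypothetical helper (not PopGen.jl API): allele frequencies at a single locus.
function allele_frequencies(genos)
    counts = Dict{Int,Int}()
    for g in skipmissing(genos)      # ignore ungenotyped samples
        for allele in g              # each genotype contributes one copy per allele
            counts[allele] = get(counts, allele, 0) + 1
        end
    end
    total = sum(values(counts))      # total allele copies observed
    return Dict(allele => n / total for (allele, n) in counts)
end

# Toy data for one locus (made up for illustration).
genotypes = [(135, 143), (135, 135), (133, 135), missing, (137, 143)]
freqs = allele_frequencies(genotypes)
```

Grouping the genotypes by population before calling the helper would give the per-population variant of the same calculation.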
Genotype frequency table
genofreqtable(data::PopData; by::Union{String, Symbol} = "global")
Return a table of the observed genotype frequencies in a PopData object, either global (the default) or per population, as selected with the by keyword. Use this to see the frequency of every genotype at every locus.
julia> cats = @nancycats ;
julia> genofreqtable(cats)
341×4 DataFrame
 Row │ locus   genotype    count  frequency
     │ String  Tuple…      Int64  Float64
─────┼────────────────────────────────────────
   1 │ fca8    (135, 143)     16  0.0737327
   2 │ fca8    (133, 135)      9  0.0414747
   3 │ fca8    (135, 135)     23  0.105991
   4 │ fca8    (137, 143)      8  0.0368664
  ⋮  │   ⋮         ⋮         ⋮        ⋮
 338 │ fca37   (206, 220)      1  0.00421941
 339 │ fca37   (208, 218)      1  0.00421941
 340 │ fca37   (184, 184)      3  0.0126582
 341 │ fca37   (208, 210)      3  0.0126582
                              333 rows omitted
julia> cats = @nancycats ;
julia> genofreqtable(cats, by = "population")
1094×5 DataFrame
  Row │ locus   population  genotype    count  frequency
      │ String  String      Tuple…      Int64  Float64
──────┼───────────────────────────────────────────────────
    1 │ fca8    1           (135, 143)      3  0.375
    2 │ fca8    1           (133, 135)      2  0.25
    3 │ fca8    1           (135, 135)      2  0.25
    4 │ fca8    1           (137, 143)      1  0.125
  ⋮   │   ⋮         ⋮           ⋮         ⋮        ⋮
 1091 │ fca37   17          (208, 208)     10  0.769231
 1092 │ fca37   17          (182, 182)      1  0.0769231
 1093 │ fca37   17          (182, 208)      1  0.0769231
 1094 │ fca37   17          (208, 220)      1  0.0769231
                                        1086 rows omitted
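The genotype version of the calculation can be sketched the same way in plain Julia. This is an illustration under assumptions, not the package's implementation: genotypes are taken to be tuples with alleles in a consistent order, missing marks ungenotyped samples, and genotype_frequencies is a hypothetical helper name. Each frequency is the genotype's count divided by the number of non-missing genotypes, consistent with the fca8 rows for population 1 above (3 of 8 genotypes gives 0.375).

```julia
# Hypothetical helper (not PopGen.jl API): genotype frequencies at a single locus.
function genotype_frequencies(genos)
    observed = collect(skipmissing(genos))   # drop ungenotyped samples
    counts = Dict{Tuple{Int,Int},Int}()
    for g in observed
        counts[g] = get(counts, g, 0) + 1
    end
    return Dict(g => n / length(observed) for (g, n) in counts)
end

# Toy data for one locus (made up for illustration).
genotypes = [(135, 143), (135, 135), (133, 135), missing, (135, 143)]
gfreqs = genotype_frequencies(genotypes)
```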
Missing Data
missingdata(data::PopData; by::Union{String, Symbol} = "sample")
Get missing genotype information in a PopData object. Use the by keyword to choose a mode of operation; the returned table corresponds to that type of missing information.
| by | what it does |
|---|---|
| "sample" | returns a count of missing loci per individual (default) |
| "population" | returns a count of missing genotypes per population |
| "locus" | returns a count of missing genotypes per locus |
| "locusxpopulation" | returns a count of missing genotypes per locus per population |
julia> sharks = @gulfsharks ;
julia> missingdata(sharks)
212×2 DataFrame
 Row │ name     missing
     │ String   Int64
─────┼──────────────────
   1 │ cc_001       124
   2 │ cc_002        94
   3 │ cc_003       100
   4 │ cc_005         0
   5 │ cc_007         2
   6 │ cc_008         1
   7 │ cc_009         2
  ⋮  │    ⋮        ⋮
 206 │ seg_025        0
 207 │ seg_026        0
 208 │ seg_027        2
 209 │ seg_028       25
 210 │ seg_029        0
 211 │ seg_030        1
 212 │ seg_031        1
        198 rows omitted
julia> sharks = @gulfsharks ;
julia> missingdata(sharks, by = "population")
7×2 DataFrame
 Row │ population      missing
     │ String          Int64
─────┼─────────────────────────
   1 │ Cape Canaveral      666
   2 │ Georgia             423
   3 │ South Carolina      233
   4 │ Florida Keys       1241
   5 │ Mideast Gulf         99
   6 │ Northeast Gulf      472
   7 │ Southeast Gulf     1504
julia> sharks = @gulfsharks ;
julia> missingdata(sharks, by = "locus")
2209×2 DataFrame
  Row │ locus         missing
      │ String        Int64
──────┼───────────────────────
    1 │ contig_35208        0
    2 │ contig_23109        6
    3 │ contig_4493         3
    4 │ contig_10742        2
    5 │ contig_14898        0
    6 │ contig_8483         0
    7 │ contig_8065         0
  ⋮   │      ⋮           ⋮
 2203 │ contig_18959        0
 2204 │ contig_43517        6
 2205 │ contig_27356        2
 2206 │ contig_475          0
 2207 │ contig_19384        5
 2208 │ contig_22368        3
 2209 │ contig_2784         7
            2195 rows omitted
julia> sharks = @gulfsharks ;
julia> missingdata(sharks, by = "locusxpopulation")
15463×3 DataFrame
   Row │ locus         population      missing
       │ String        String          Int64
───────┼───────────────────────────────────────
     1 │ contig_35208  Cape Canaveral        0
     2 │ contig_35208  Georgia               0
     3 │ contig_35208  South Carolina        0
     4 │ contig_35208  Florida Keys          0
     5 │ contig_35208  Mideast Gulf          0
     6 │ contig_35208  Northeast Gulf        0
     7 │ contig_35208  Southeast Gulf        0
   ⋮   │      ⋮              ⋮            ⋮
 15457 │ contig_2784   Cape Canaveral        0
 15458 │ contig_2784   Georgia               2
 15459 │ contig_2784   South Carolina        1
 15460 │ contig_2784   Florida Keys          2
 15461 │ contig_2784   Mideast Gulf          1
 15462 │ contig_2784   Northeast Gulf        0
 15463 │ contig_2784   Southeast Gulf        1
                          15449 rows omitted
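The four modes differ only in how the missing genotypes are grouped before counting. A rough sketch in plain Julia follows; it assumes a long-format layout with one record per sample per locus, and both the toy records and the helper count_missing are hypothetical stand-ins, not PopGen.jl internals.

```julia
# Toy long-format records, one per sample per locus (made up for illustration).
records = [
    (name = "s1", population = "north", locus = "a", genotype = (1, 2)),
    (name = "s1", population = "north", locus = "b", genotype = missing),
    (name = "s2", population = "north", locus = "a", genotype = missing),
    (name = "s2", population = "north", locus = "b", genotype = missing),
    (name = "s3", population = "south", locus = "a", genotype = (1, 1)),
    (name = "s3", population = "south", locus = "b", genotype = (2, 2)),
]

# Hypothetical helper: count missing genotypes grouped by one or more fields.
function count_missing(records, fields::Symbol...)
    counts = Dict{Any,Int}()
    for r in records
        # A single field gives a plain key; several fields give a tuple key.
        key = length(fields) == 1 ? getfield(r, fields[1]) :
              Tuple(getfield(r, f) for f in fields)
        counts[key] = get(counts, key, 0) + ismissing(r.genotype)
    end
    return counts
end

by_sample     = count_missing(records, :name)                # like by = "sample"
by_population = count_missing(records, :population)          # like by = "population"
by_locus      = count_missing(records, :locus)               # like by = "locus"
by_locus_pop  = count_missing(records, :locus, :population)  # like by = "locusxpopulation"
```

Groups with no missing genotypes still appear with a count of 0, as they do in the tables above.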
Pairwise Identical Genotypes
While not a substitute for a kinship analysis, it can be useful to know or verify how similar your data are in a very literal sense: how many identical genotypes do two individuals have across all loci? Use pairwiseidentical() to perform an all-by-all comparison of identical genotypes. This can be done for all individuals in a PopData object, or restricted to a specific set of individuals:
julia> cats = @nancycats;
julia> pairwiseidentical(cats)
27966×4 DataFrame
   Row │ sample_1  sample_2  identical  n
       │ String    String    Float64    Int64
───────┼──────────────────────────────────────
     1 │ N215      N216      0.5            8
     2 │ N215      N217      0.25           8
     3 │ N215      N218      0.38           8
     4 │ N215      N219      0.38           8
   ⋮   │    ⋮         ⋮          ⋮        ⋮
 27963 │ N297      N290      0.29           7
 27964 │ N281      N289      0.25           8
 27965 │ N281      N290      0.43           7
 27966 │ N289      N290      0.14           7
                          27958 rows omitted
julia> cats = @nancycats;
julia> interesting_cats = samplenames(cats)[1:5]
5-element Array{String,1}:
"N215"
"N216"
"N217"
"N218"
"N219"
julia> pairwiseidentical(cats, interesting_cats)
10×4 DataFrame
 Row │ sample_1  sample_2  identical  n
     │ String    String    Float64    Int64
─────┼──────────────────────────────────────
   1 │ N215      N216      0.5            8
   2 │ N215      N217      0.25           8
   3 │ N215      N218      0.38           8
   4 │ N215      N219      0.38           8
   5 │ N216      N217      0.12           8
   6 │ N216      N218      0.25           8
   7 │ N216      N219      0.38           8
   8 │ N217      N218      0.0            9
   9 │ N217      N219      0.11           9
  10 │ N218      N219      0.33           9
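Judging from the output above, identical looks like the proportion of compared loci at which two samples share a genotype, and n the number of loci compared (for example, 0.11 with n = 9 is one match in nine). That reading is an inference from the example, not a statement of the package's internals. Under that assumption, a plain-Julia sketch of the comparison could look like this; proportion_identical and the toy samples are hypothetical.

```julia
# Hypothetical helper: compare two samples locus by locus.
# Loci where either genotype is `missing` are skipped (an assumption).
# Genotypes are assumed to store alleles in a consistent order.
function proportion_identical(a, b)
    compared = 0
    shared = 0
    for (ga, gb) in zip(a, b)
        (ismissing(ga) || ismissing(gb)) && continue
        compared += 1
        shared += (ga == gb)
    end
    return (identical = round(shared / compared, digits = 2), n = compared)
end

# Toy genotypes for three samples across four loci (made up for illustration).
samples = Dict(
    "A" => [(135, 143), (100, 102), missing, (90, 90)],
    "B" => [(135, 143), (100, 104), (88, 88), (90, 90)],
    "C" => [(133, 135), (100, 102), (88, 88), missing],
)

# All-by-all comparison: each unordered pair once, so k samples give k(k-1)/2 rows.
ids = sort(collect(keys(samples)))
results = [(ids[i], ids[j], proportion_identical(samples[ids[i]], samples[ids[j]]))
           for i in 1:length(ids) for j in (i + 1):length(ids)]
```

The pair count explains the table sizes above: 5 samples give 10 rows, and 27966 rows correspond to 237 samples.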
Allelic Richness
To find the allelic richness (number of alleles per locus), use richness(). Use by = "population" to return a table by locus by population.
julia> cats = @nancycats;
julia> richness(cats)
9×2 DataFrame
 Row │ locus   richness
     │ String  Int64
─────┼──────────────────
   1 │ fca8          16
   2 │ fca23         11
   3 │ fca43         10
   4 │ fca45          9
   5 │ fca77         12
   6 │ fca78          8
   7 │ fca90         12
   8 │ fca96         12
   9 │ fca37         18
julia> richness(cats, by = "population")
153×3 DataFrame
 Row │ locus   population  richness
     │ String  String      Int64
─────┼──────────────────────────────
   1 │ fca8    1                  4
   2 │ fca8    2                  6
   3 │ fca8    3                  7
   4 │ fca8    4                 10
  ⋮  │   ⋮         ⋮          ⋮
 150 │ fca37   14                 3
 151 │ fca37   15                 4
 152 │ fca37   16                 3
 153 │ fca37   17                 3
                    145 rows omitted
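Richness as defined here is simply the number of distinct alleles seen at a locus. A minimal plain-Julia sketch of that count, using a hypothetical helper allelic_richness and a made-up genotype vector rather than anything from PopGen.jl:

```julia
# Hypothetical helper: number of distinct alleles observed at one locus.
function allelic_richness(genos)
    alleles = Set{Int}()
    for g in skipmissing(genos), a in g   # every allele of every non-missing genotype
        push!(alleles, a)
    end
    return length(alleles)
end

# Toy data for one locus (made up for illustration).
locus_genotypes = [(135, 143), (135, 135), missing, (133, 137)]
r = allelic_richness(locus_genotypes)
```

Applying the same count to the genotypes of each population separately gives the per-population variant.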
Average Number of Alleles
Similar to richness, to get the average number of alleles per locus, use alleleavg(). Use rounding = false if you don't want the answer rounded to 4 decimal places.
julia> alleleavg(@nancycats)
(mean = 12.0, stdev = 0.2668)
julia> alleleavg(@nancycats, rounding = false)
(mean = 12.0, stdev = 0.2667968432263687)
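As a sanity check, the mean reported above can be reproduced from the per-locus richness values in the richness(cats) output shown earlier. The sketch below uses only Julia's Statistics standard library; how the stdev field is derived is not stated here, so only the mean and the 4-decimal rounding step are illustrated.

```julia
using Statistics  # standard library, provides mean

# Per-locus richness values copied from the richness(cats) output.
richness_per_locus = [16, 11, 10, 9, 12, 8, 12, 12, 18]

# Average number of alleles per locus: 108 / 9.
avg = mean(richness_per_locus)

# Rounding to 4 decimal places, as described for the default behaviour.
rounded = round(0.2667968432263687, digits = 4)
```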
Summary Statistics
Population genetics famously includes all manner of coefficients with which to summarize data. Use summary() to view FST, DST, HT, etc. (like Hierfstat::basic.stats). Use by = "locus" to get one row per locus instead of a single global row.
julia> summary(@nancycats)
1×10 DataFrame
 Row │ Het_obs  HS       HT       DST      HT′      DST′     FST      FST′     FIS      DEST
     │ Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64
─────┼─────────────────────────────────────────────────────────────────────────────────────────
   1 │ 0.6299   0.7083   0.7717   0.0634   0.7757   0.0674   0.0821   0.0869   0.1108   0.231
julia> summary(@nancycats, by = "locus")
9×11 DataFrame
 Row │ locus   Het_obs  HS       HT       DST      HT′      DST′     FST      FST′     FIS      DEST
     │ String  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ fca8    0.667    0.779    0.8619   0.0829   0.8671   0.0881   0.0962   0.1016   0.1438   0.3987
   2 │ fca23   0.6838   0.7439   0.7994   0.0555   0.8029   0.0589   0.0694   0.0734   0.0809   0.2302
   3 │ fca43   0.6814   0.7442   0.7937   0.0495   0.7968   0.0526   0.0623   0.066    0.0844   0.2054
   4 │ fca45   0.71     0.7085   0.7642   0.0557   0.7679   0.0594   0.0729   0.0774   -0.0021  0.2039
   5 │ fca77   0.6295   0.7828   0.8659   0.0831   0.8711   0.0883   0.096    0.1014   0.1958   0.4067
   6 │ fca78   0.5773   0.6339   0.6773   0.0434   0.6801   0.0462   0.0641   0.0679   0.0893   0.1261
   7 │ fca90   0.6454   0.7408   0.8144   0.0736   0.819    0.0782   0.0904   0.0955   0.1287   0.3017
   8 │ fca96   0.6259   0.6747   0.7657   0.091    0.7714   0.0967   0.1189   0.1254   0.0723   0.2973
   9 │ fca37   0.4485   0.5671   0.6027   0.0356   0.6049   0.0379   0.0591   0.0626   0.2091   0.0874
The column names above use the Unicode prime symbol ′ to better reflect the actual coefficient ("FST prime", etc.). To type that character in the Julia REPL, enter \prime and press TAB.
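Several of the summary() columns are linked by simple arithmetic. The sketch below checks three standard Nei-style relations (DST = HT − HS, FST = DST / HT, FIS = 1 − Het_obs / HS) against the global row printed above. It assumes PopGen.jl follows these textbook definitions; because the printed inputs are already rounded to 4 decimal places, the recomputed values only agree with the table to roughly three decimals.

```julia
# Values copied from the global summary(@nancycats) row.
het_obs, hs, ht = 0.6299, 0.7083, 0.7717

# Assumed textbook relations, not taken from the PopGen.jl source.
dst = ht - hs            # diversity among populations
fst = dst / ht           # proportion of total diversity that lies among populations
fis = 1 - het_obs / hs   # deficit of observed heterozygosity within populations

# Compare with the printed DST, FST and FIS, allowing for rounding of the inputs.
reported = (dst = 0.0634, fst = 0.0821, fis = 0.1108)
matches = all(isapprox(a, b; atol = 5e-4) for (a, b) in zip((dst, fst, fis), reported))
```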