Data Exploration
Allele Frequency Table
allelefreqtable(data::PopData; by::Union{String, Symbol} = "global")
Return a table of the observed allele frequencies in a PopData object, either across all samples (by = "global", the default) or within each population (by = "population"). Use this to see the frequency of every allele at every locus.
julia> cats = @nancycats ;
julia> allelefreqtable(cats)
108×4 DataFrame
 Row │ locus   allele  count  frequency  
     │ String  Int16?  Int64  Float64    
─────┼───────────────────────────────────
   1 │ fca8       135    105  0.241935
   2 │ fca8       143     44  0.101382
   3 │ fca8       133     33  0.0760369
   4 │ fca8       137     83  0.191244
  ⋮  │   ⋮       ⋮       ⋮        ⋮
 105 │ fca37      226      2  0.00421941
 106 │ fca37      216      7  0.0147679
 107 │ fca37      224      2  0.00421941
 108 │ fca37      204      6  0.0126582
                         100 rows omitted
julia> cats = @nancycats ;
julia> allelefreqtable(cats, by = "population")
839×5 DataFrame
 Row │ locus   population  allele  count  frequency 
     │ String  String      Int16?  Int64  Float64   
─────┼──────────────────────────────────────────────
   1 │ fca8    1              135      9  0.5625
   2 │ fca8    1              143      4  0.25
   3 │ fca8    1              133      2  0.125
   4 │ fca8    1              137      1  0.0625
  ⋮  │   ⋮         ⋮         ⋮       ⋮        ⋮
 836 │ fca37   16             210      5  0.208333
 837 │ fca37   17             208     22  0.846154
 838 │ fca37   17             182      3  0.115385
 839 │ fca37   17             220      1  0.0384615
                                    831 rows omitted
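To make the frequency column concrete, here is a minimal sketch in plain Julia (Base only, not PopGen.jl's implementation) that counts allele copies at a single locus and divides by the number of non-missing allele copies. It assumes diploid genotypes stored as tuples with `missing` for ungenotyped samples; `allele_counts_freqs` and the toy data are hypothetical. The fca8 rows above fit this denominator: 105 / 434 ≈ 0.2419 and 44 / 434 ≈ 0.1014.

```julia
# Hypothetical helper, not part of PopGen.jl: allele counts and frequencies
# at a single locus. Genotypes are tuples of allele sizes; `missing` marks
# a sample with no genotype at this locus.
function allele_counts_freqs(genotypes::AbstractVector)
    counts = Dict{Int,Int}()
    for g in skipmissing(genotypes)
        for allele in g                      # every allele copy is counted
            counts[allele] = get(counts, allele, 0) + 1
        end
    end
    total = sum(values(counts); init = 0)    # number of non-missing allele copies
    return Dict(a => (count = c, frequency = c / total) for (a, c) in counts)
end

# toy locus with one missing genotype
fca8_toy = Union{Missing,NTuple{2,Int}}[(135, 143), (133, 135), (135, 135), missing, (137, 143)]
freqs = allele_counts_freqs(fca8_toy)
freqs[135]    # (count = 4, frequency = 0.5)
```

Running the same helper separately on each population's genotypes gives the equivalent of by = "population".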
Genotype Frequency Table
genofreqtable(data::PopData; by::Union{String, Symbol} = "global")
Return a table of the observed genotype frequencies in a PopData object, either across all samples (by = "global", the default) or within each population (by = "population"). Use this to see the frequency of every genotype at every locus.
julia> cats = @nancycats ;
julia> genofreqtable(cats)
341×4 DataFrame
 Row │ locus   genotype    count  frequency  
     │ String  Tuple…      Int64  Float64    
─────┼───────────────────────────────────────
   1 │ fca8    (135, 143)     16  0.0737327
   2 │ fca8    (133, 135)      9  0.0414747
   3 │ fca8    (135, 135)     23  0.105991
   4 │ fca8    (137, 143)      8  0.0368664
  ⋮  │   ⋮         ⋮         ⋮        ⋮
 338 │ fca37   (206, 220)      1  0.00421941
 339 │ fca37   (208, 218)      1  0.00421941
 340 │ fca37   (184, 184)      3  0.0126582
 341 │ fca37   (208, 210)      3  0.0126582
                             333 rows omitted
julia> cats = @nancycats ;
julia> genofreqtable(cats, by = "population")
1094×5 DataFrame
  Row │ locus   population  genotype    count  frequency         
      │ String  String      Tuple…      Int64  Float64           
──────┼──────────────────────────────────────────────────────────
    1 │ fca8    1           (135, 143)      3  0.375
    2 │ fca8    1           (133, 135)      2  0.25
    3 │ fca8    1           (135, 135)      2  0.25
    4 │ fca8    1           (137, 143)      1  0.125
  ⋮   │   ⋮         ⋮           ⋮         ⋮        ⋮
 1091 │ fca37   17          (208, 208)     10  0.769231
 1092 │ fca37   17          (182, 182)      1  0.0769231
 1093 │ fca37   17          (182, 208)      1  0.0769231
 1094 │ fca37   17          (208, 220)      1  0.0769231
                                        1086 rows omitted
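Genotype frequencies work the same way, except that whole genotypes are counted and the denominator is the number of genotyped samples at the locus (for fca8 above, 16 / 217 ≈ 0.0737). The sketch below is plain Julia, not PopGen.jl's implementation; `genotype_counts_freqs` is hypothetical, and it assumes the alleles in each tuple are stored in sorted order, as they appear in the table, so that (133, 135) and (135, 133) are never counted as different genotypes.

```julia
# Hypothetical helper, not part of PopGen.jl: genotype counts and
# frequencies at a single locus, ignoring missing genotypes.
function genotype_counts_freqs(genotypes::AbstractVector)
    observed = collect(skipmissing(genotypes))
    counts = Dict{eltype(observed),Int}()
    for g in observed
        counts[g] = get(counts, g, 0) + 1    # identical tuples are the same genotype
    end
    n = length(observed)                     # number of genotyped samples
    return Dict(g => (count = c, frequency = c / n) for (g, c) in counts)
end

fca8_toy = Union{Missing,NTuple{2,Int}}[(135, 143), (133, 135), (135, 143), missing, (135, 135)]
gfreqs = genotype_counts_freqs(fca8_toy)
gfreqs[(135, 143)]    # (count = 2, frequency = 0.5)
```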
Missing Data
missingdata(data::PopData; by::Union{String, Symbol} = "sample")
Count the missing genotypes in a PopData object. The by keyword sets how the counts are grouped, and the returned table changes accordingly:
| by | what it does | 
|---|---|
| "sample" | returns a count of missing loci per individual (default) | 
| "population" | returns a count of missing genotypes per population | 
| "locus" | returns a count of missing genotypes per locus | 
| "locusxpopulation" | returns a count of missing genotypes per locus per population | 
julia> sharks = @gulfsharks ;
julia> missingdata(sharks)
212×2 DataFrame
 Row │ name     missing
     │ String   Int64
─────┼─────────────────
   1 │ cc_001       124
   2 │ cc_002        94
   3 │ cc_003       100
   4 │ cc_005         0
   5 │ cc_007         2
   6 │ cc_008         1
   7 │ cc_009         2
  ⋮  │    ⋮        ⋮
 206 │ seg_025        0
 207 │ seg_026        0
 208 │ seg_027        2
 209 │ seg_028       25
 210 │ seg_029        0
 211 │ seg_030        1
 212 │ seg_031        1
        198 rows omitted
julia> sharks = @gulfsharks ;
julia> missingdata(sharks, by = "population")
7×2 DataFrame
 Row │ population      missing
     │ String          Int64
─────┼────────────────────────
   1 │ Cape Canaveral      666
   2 │ Georgia             423
   3 │ South Carolina      233
   4 │ Florida Keys       1241
   5 │ Mideast Gulf         99
   6 │ Northeast Gulf      472
   7 │ Southeast Gulf     1504
julia> sharks = @gulfsharks ;
julia> missingdata(sharks, by = "locus")
2209×2 DataFrame
  Row │ locus         missing
      │ String        Int64
──────┼──────────────────────
    1 │ contig_35208        0
    2 │ contig_23109        6
    3 │ contig_4493         3
    4 │ contig_10742        2
    5 │ contig_14898        0
    6 │ contig_8483         0
    7 │ contig_8065         0
  ⋮   │      ⋮           ⋮
 2203 │ contig_18959        0
 2204 │ contig_43517        6
 2205 │ contig_27356        2
 2206 │ contig_475          0
 2207 │ contig_19384        5
 2208 │ contig_22368        3
 2209 │ contig_2784         7
             2195 rows omitted
julia> sharks = @gulfsharks ;
julia> missingdata(sharks, by = "locusxpopulation")
15463×3 DataFrame
   Row │ locus         population      missing
       │ String        String          Int64
───────┼──────────────────────────────────────
     1 │ contig_35208  Cape Canaveral        0
     2 │ contig_35208  Georgia               0
     3 │ contig_35208  South Carolina        0
     4 │ contig_35208  Florida Keys          0
     5 │ contig_35208  Mideast Gulf          0
     6 │ contig_35208  Northeast Gulf        0
     7 │ contig_35208  Southeast Gulf        0
   ⋮   │      ⋮              ⋮            ⋮
 15457 │ contig_2784   Cape Canaveral        0
 15458 │ contig_2784   Georgia               2
 15459 │ contig_2784   South Carolina        1
 15460 │ contig_2784   Florida Keys          2
 15461 │ contig_2784   Mideast Gulf          1
 15462 │ contig_2784   Northeast Gulf        0
 15463 │ contig_2784   Southeast Gulf        1
                             15449 rows omitted
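The four modes differ only in how missing genotypes are grouped. As a sketch in plain Julia, assuming a samples-by-loci matrix with `missing` for absent genotypes (the matrix, population labels, and variable names below are hypothetical and do not reflect PopGen.jl's internal data layout), each mode is a `count(ismissing, ...)` over a different slice:

```julia
# Toy genotype matrix: rows are samples, columns are loci, `missing` = no call.
# All names here are hypothetical; this is not PopGen.jl's data layout.
geno = Union{Missing,NTuple{2,Int}}[
    (1, 2)   missing  (3, 3)
    missing  missing  (1, 1)
    (2, 2)   missing  missing
]
populations = ["North", "North", "South"]    # population of each sample (row)

per_sample = vec(count(ismissing, geno; dims = 2))   # like by = "sample"
per_locus  = vec(count(ismissing, geno; dims = 1))   # like by = "locus"

# like by = "population": total missing genotypes among each population's rows
per_population = Dict(p => count(ismissing, geno[populations .== p, :])
                      for p in unique(populations))

# like by = "locusxpopulation": missing genotypes per locus within each population
locus_x_population = Dict(p => vec(count(ismissing, geno[populations .== p, :]; dims = 1))
                          for p in unique(populations))
```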
Pairwise Identical Genotypes
While not a substitute for a kinship analysis, it can be useful to know or verify how similar your samples are in a very literal sense: how many identical genotypes do two individuals share across all loci? pairwiseidentical() performs this all-by-all comparison, either for every individual in a PopData object or for a specific set of individuals. In the output, identical is the proportion of compared loci at which the two samples have the same genotype, and n is the number of loci compared.
julia> cats = @nancycats;
julia> pairwiseidentical(cats)
27966×4 DataFrame
   Row │ sample_1  sample_2  identical  n     
       │ String    String    Float64    Int64 
───────┼──────────────────────────────────────
     1 │ N215      N216           0.5       8
     2 │ N215      N217           0.25      8
     3 │ N215      N218           0.38      8
     4 │ N215      N219           0.38      8
   ⋮   │    ⋮         ⋮          ⋮        ⋮
 27963 │ N297      N290           0.29      7
 27964 │ N281      N289           0.25      8
 27965 │ N281      N290           0.43      7
 27966 │ N289      N290           0.14      7
                            27958 rows omitted
julia> cats = @nancycats;
julia> interesting_cats = samplenames(cats)[1:5]
5-element Array{String,1}:
 "N215"
 "N216"
 "N217"
 "N218"
 "N219"
julia> pairwiseidentical(cats, interesting_cats)
10×4 DataFrame
 Row │ sample_1  sample_2  identical  n     
     │ String    String    Float64    Int64 
─────┼──────────────────────────────────────
   1 │ N215      N216           0.5       8 
   2 │ N215      N217           0.25      8 
   3 │ N215      N218           0.38      8 
   4 │ N215      N219           0.38      8 
   5 │ N216      N217           0.12      8 
   6 │ N216      N218           0.25      8 
   7 │ N216      N219           0.38      8 
   8 │ N217      N218           0.0       9 
   9 │ N217      N219           0.11      9 
  10 │ N218      N219           0.33      9 
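The comparison can be sketched in plain Julia under a few assumptions: genotypes sit in a samples-by-loci matrix with `missing` for absent calls, a locus counts toward n only when both samples are genotyped there, and identical is rounded to two decimals as in the tables above. `pairwise_identical_sketch` is a hypothetical name and this is not PopGen.jl's implementation. Every unordered pair appears once, so k samples give k(k - 1)/2 rows: 5 samples give the 10 rows above, and the 27,966 rows for all of nancycats are consistent with 237 samples.

```julia
# Hypothetical sketch, not PopGen.jl's implementation. For each pair of
# samples (rows), compare genotypes only at loci where both are genotyped.
function pairwise_identical_sketch(geno::AbstractMatrix, names::AbstractVector{String})
    results = NamedTuple{(:sample_1, :sample_2, :identical, :n),Tuple{String,String,Float64,Int}}[]
    nsamples = size(geno, 1)
    for i in 1:nsamples-1, j in i+1:nsamples      # each unordered pair once
        n = 0         # loci compared
        shared = 0    # loci with identical genotypes
        for (a, b) in zip(view(geno, i, :), view(geno, j, :))
            (ismissing(a) || ismissing(b)) && continue
            n += 1
            shared += a == b                      # Bool adds as 0 or 1
        end
        prop = n == 0 ? NaN : round(shared / n; digits = 2)
        push!(results, (sample_1 = names[i], sample_2 = names[j], identical = prop, n = n))
    end
    return results
end

geno = Union{Missing,NTuple{2,Int}}[
    (1, 2)   (3, 3)   (5, 6)   (7, 7)
    (1, 2)   (3, 4)   (5, 6)   missing
    (2, 2)   (3, 3)   (5, 6)   (7, 8)
]
pairs = pairwise_identical_sketch(geno, ["A", "B", "C"])
```

Here samples A and B are compared at only 3 loci because B is missing the fourth, mirroring the n = 7 or 8 rows in the nancycats output.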
Allelic Richness
Allelic richness is the number of alleles observed at each locus. richness() returns it per locus; use by = "population" to get a table of richness for each locus in each population.
julia> cats = @nancycats;
julia> richness(cats)
9×2 DataFrame
 Row │ locus   richness 
     │ String  Int64    
─────┼──────────────────
   1 │ fca8          16
   2 │ fca23         11
   3 │ fca43         10
   4 │ fca45          9
   5 │ fca77         12
   6 │ fca78          8
   7 │ fca90         12
   8 │ fca96         12
   9 │ fca37         18
julia> richness(cats, by = "population")
153×3 DataFrame
 Row │ locus   population  richness 
     │ String  String      Int64    
─────┼──────────────────────────────
   1 │ fca8    1                  4
   2 │ fca8    2                  6
   3 │ fca8    3                  7
   4 │ fca8    4                 10
  ⋮  │   ⋮         ⋮          ⋮
 150 │ fca37   14                 3
 151 │ fca37   15                 4
 152 │ fca37   16                 3
 153 │ fca37   17                 3
                    145 rows omitted
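Counting distinct alleles per locus takes only a line of plain Julia. This sketch is not PopGen.jl's implementation; `locus_richness`, the toy matrix, and the population labels are hypothetical, and missing genotypes are simply skipped. Applying the same function to each population's rows gives the by-population view.

```julia
# Hypothetical sketch, not PopGen.jl's implementation: allelic richness as the
# number of distinct alleles observed at each locus (column), ignoring missing.
locus_richness(geno::AbstractMatrix) =
    [length(unique(Iterators.flatten(skipmissing(col)))) for col in eachcol(geno)]

geno = Union{Missing,NTuple{2,Int}}[
    (135, 143)  (204, 204)
    (133, 135)  missing
    (137, 143)  (204, 210)
]
populations = ["1", "1", "2"]    # population of each sample (row)

locus_richness(geno)    # [4, 2]

# per locus, per population: the same function on each population's rows
by_population = Dict(p => locus_richness(geno[populations .== p, :]) for p in unique(populations))
```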
Average Number of Alleles
Related to richness, alleleavg() returns the average number of alleles per locus along with its standard deviation. Results are rounded to 4 decimal places by default; pass rounding = false to get unrounded values.
julia> alleleavg(@nancycats)
(mean = 12.0, stdev = 0.2668)
julia> alleleavg(@nancycats, rounding = false)
(mean = 12.0, stdev = 0.2667968432263687)
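As a quick arithmetic check, the mean above equals the average of the per-locus values from richness(cats): (16 + 11 + 10 + 9 + 12 + 8 + 12 + 12 + 18) / 9 = 108 / 9 = 12.0. The plain-Julia snippet below (Statistics standard library) reproduces that mean and the 4-decimal rounding applied to the reported stdev; how alleleavg computes its stdev is not described here, so the sketch does not attempt it.

```julia
using Statistics

# Per-locus richness of nancycats, copied from the richness(cats) output above.
richness_values = [16, 11, 10, 9, 12, 8, 12, 12, 18]

avg = mean(richness_values)    # 108 / 9 = 12.0

# Assumed to match rounding = true (the default): round to 4 decimal places.
# The unrounded stdev reported above becomes the rounded value 0.2668.
stdev_unrounded = 0.2667968432263687
stdev_rounded = round(stdev_unrounded; digits = 4)
```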
Summary Statistics
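Population genetics has no shortage of coefficients for summarizing data. Use summary() to view heterozygosity and differentiation statistics such as HS, HT, DST, FST, FIS and DEST, similar to basic.stats() from the R package hierfstat. Pass by = "locus" for per-locus values.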
julia> summary(@nancycats)
1×10 DataFrame
 Row │ Het_obs  HS       HT       DST      HT′      DST′     FST      FST′     FIS      DEST
     │ Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64
─────┼─────────────────────────────────────────────────────────────────────────────────────────
   1 │  0.6299   0.7083   0.7717   0.0634   0.7757   0.0674   0.0821   0.0869   0.1108    0.231
julia> summary(@nancycats, by = "locus")
9×11 DataFrame
 Row │ locus   Het_obs  HS       HT       DST      HT′      DST′     FST      FST′     FIS      DEST
     │ String  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ fca8     0.667    0.779    0.8619   0.0829   0.8671   0.0881   0.0962   0.1016   0.1438   0.3987
   2 │ fca23    0.6838   0.7439   0.7994   0.0555   0.8029   0.0589   0.0694   0.0734   0.0809   0.2302
   3 │ fca43    0.6814   0.7442   0.7937   0.0495   0.7968   0.0526   0.0623   0.066    0.0844   0.2054
   4 │ fca45    0.71     0.7085   0.7642   0.0557   0.7679   0.0594   0.0729   0.0774  -0.0021   0.2039
   5 │ fca77    0.6295   0.7828   0.8659   0.0831   0.8711   0.0883   0.096    0.1014   0.1958   0.4067
   6 │ fca78    0.5773   0.6339   0.6773   0.0434   0.6801   0.0462   0.0641   0.0679   0.0893   0.1261
   7 │ fca90    0.6454   0.7408   0.8144   0.0736   0.819    0.0782   0.0904   0.0955   0.1287   0.3017
   8 │ fca96    0.6259   0.6747   0.7657   0.091    0.7714   0.0967   0.1189   0.1254   0.0723   0.2973
   9 │ fca37    0.4485   0.5671   0.6027   0.0356   0.6049   0.0379   0.0591   0.0626   0.2091   0.0874
The column names above use the Unicode prime symbol ′ to reflect the actual coefficient names ("FST prime", etc.). To type that character in the Julia REPL, enter \prime and then press TAB.
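Several of these columns are derived from the others. The sketch below assumes the relations documented for hierfstat's basic.stats: DST = HT - HS, DST′ = np/(np - 1) · DST, HT′ = HS + DST′, FST = DST/HT, FST′ = DST′/HT′, FIS = 1 - Het_obs/HS and DEST = DST′/(1 - HS), where np is the number of populations (17 for nancycats). It is plain Julia, not PopGen.jl's code, and `derived_stats` is hypothetical. Plugging in the rounded global values from the first table reproduces its remaining columns to about three decimal places.

```julia
# Hypothetical sketch, not PopGen.jl's code: derive the remaining summary
# columns from observed heterozygosity (Ho), HS, HT and the number of
# populations (np), using the hierfstat basic.stats relations.
function derived_stats(; Ho, HS, HT, np)
    DST  = HT - HS                  # between-population component of diversity
    DSTp = np / (np - 1) * DST      # DST′, corrected for the number of populations
    HTp  = HS + DSTp                # HT′
    return (DST = DST, DSTp = DSTp, HTp = HTp,
            FST = DST / HT, FSTp = DSTp / HTp,
            FIS = 1 - Ho / HS, DEST = DSTp / (1 - HS))
end

# rounded global values from summary(@nancycats); nancycats has 17 populations
s = derived_stats(Ho = 0.6299, HS = 0.7083, HT = 0.7717, np = 17)
```

The field names use `p` in place of ′ (for example `s.FSTp`) only to keep the sketch ASCII.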