
DataExploration.jl

PopGen.jl/src/DataExploration.jl

📦 not exported | 🔵 exported by PopGen.jl

🔵 allelefreqtable

allelefreqtable(data::PopData; by::String = "global")

Return a table of the observed global (default) or population allele frequencies in a PopData object.

Example

julia> cats = @nancycats ;

julia> allelefreqtable(cats)
108×4 DataFrame
 Row │ locus   allele  count  frequency
     │ String  Int16?  Int64  Float64
─────┼───────────────────────────────────
   1 │ fca8       135    105  0.241935
   2 │ fca8       143     44  0.101382
   3 │ fca8       133     33  0.0760369
   4 │ fca8       137     83  0.191244
  ⋮  │   ⋮       ⋮       ⋮        ⋮
 105 │ fca37      226      2  0.00421941
 106 │ fca37      216      7  0.0147679
 107 │ fca37      224      2  0.00421941
 108 │ fca37      204      6  0.0126582
                         100 rows omitted

julia> allelefreqtable(cats, by = "population")
839×5 DataFrame
 Row │ locus   population  allele  count  frequency
     │ String  String      Int16?  Int64  Float64
─────┼──────────────────────────────────────────────
   1 │ fca8    1              135      9  0.5625
   2 │ fca8    1              143      4  0.25
   3 │ fca8    1              133      2  0.125
   4 │ fca8    1              137      1  0.0625
  ⋮  │   ⋮         ⋮         ⋮       ⋮        ⋮
 836 │ fca37   16             210      5  0.208333
 837 │ fca37   17             208     22  0.846154
 838 │ fca37   17             182      3  0.115385
 839 │ fca37   17             220      1  0.0384615
                                    831 rows omitted
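
The `count` and `frequency` columns above can be illustrated with a minimal sketch in plain Julia. In the global example, allele 135 at `fca8` has a count of 105 and a frequency of 0.241935, which corresponds to 434 alleles (217 diploid genotypes). That suggests missing genotypes are left out of the denominator, and the sketch assumes so. The toy data and the `allele_frequencies` helper are hypothetical and are not part of PopGen.jl.

```julia
# Hypothetical toy data for one locus: each genotype is a tuple of alleles,
# and `missing` marks a sample that was not genotyped.
genotypes = [(135, 143), (135, 135), missing, (133, 135)]

# Illustrative helper (not a PopGen.jl function).
function allele_frequencies(genos)
    counts = Dict{Int, Int}()
    for geno in skipmissing(genos)       # assumption: missing genotypes are ignored
        for allele in geno
            counts[allele] = get(counts, allele, 0) + 1
        end
    end
    total = sum(values(counts))          # number of nonmissing alleles
    return Dict(allele => n / total for (allele, n) in counts)
end

freqs = allele_frequencies(genotypes)
```

With `by = "population"`, the same tally would presumably be made within each population rather than over all samples.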

🔵 genofreqtable

genofreqtable(data::PopData; by::String = "global")

Return a table of the observed global (default) or population genotype frequencies in a PopData object.

Example

julia> cats = @nancycats ;

julia> genofreqtable(cats)

341×4 DataFrame
 Row │ locus   genotype    count  frequency
     │ String  Tuple…      Int64  Float64
─────┼───────────────────────────────────────
   1 │ fca8    (135, 143)     16  0.0737327
   2 │ fca8    (133, 135)      9  0.0414747
   3 │ fca8    (135, 135)     23  0.105991
   4 │ fca8    (137, 143)      8  0.0368664
  ⋮  │   ⋮         ⋮         ⋮        ⋮
 338 │ fca37   (206, 220)      1  0.00421941
 339 │ fca37   (208, 218)      1  0.00421941
 340 │ fca37   (184, 184)      3  0.0126582
 341 │ fca37   (208, 210)      3  0.0126582
                             333 rows omitted

julia> genofreqtable(cats, by = "population")
1094×5 DataFrame
  Row │ locus   population  genotype    count  frequency
      │ String  String      Tuple…      Int64  Float64
──────┼──────────────────────────────────────────────────
    1 │ fca8    1           (135, 143)      3  0.375
    2 │ fca8    1           (133, 135)      2  0.25
    3 │ fca8    1           (135, 135)      2  0.25
    4 │ fca8    1           (137, 143)      1  0.125
  ⋮   │   ⋮         ⋮           ⋮         ⋮        ⋮
 1091 │ fca37   17          (208, 208)     10  0.769231
 1092 │ fca37   17          (182, 182)      1  0.0769231
 1093 │ fca37   17          (182, 208)      1  0.0769231
 1094 │ fca37   17          (208, 220)      1  0.0769231
                                        1086 rows omitted
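
The per-population output can be read as a grouped tally: in the example, population 1 at `fca8` has genotype counts 3, 2, 2 and 1, and each frequency is that count divided by their sum of 8. The sketch below reproduces that arithmetic in plain Julia on toy data. The records and the `genotype_frequencies` helper are hypothetical. The sketch also assumes that the alleles within a genotype are stored in a consistent order, so that the same genotype always compares equal.

```julia
# Hypothetical toy records for a single locus: (population, genotype).
records = [
    ("1", (135, 143)), ("1", (135, 143)), ("1", (135, 135)), ("1", missing),
    ("2", (133, 135)), ("2", (133, 135)),
]

# Illustrative helper (not a PopGen.jl function).
function genotype_frequencies(recs)
    counts = Dict{Tuple{String, Tuple{Int, Int}}, Int}()
    totals = Dict{String, Int}()             # nonmissing genotypes per population
    for (pop, geno) in recs
        ismissing(geno) && continue          # assumption: missing genotypes are ignored
        key = (pop, geno)
        counts[key] = get(counts, key, 0) + 1
        totals[pop] = get(totals, pop, 0) + 1
    end
    # Divide each count by the total for its own population
    return Dict(key => n / totals[key[1]] for (key, n) in counts)
end

gfreqs = genotype_frequencies(records)
```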

🔵 missingdata

missingdata(data::PopData; by::Union{String, Symbol} = "sample")

Get missing genotype information in a PopData object. Use the by keyword to specify a mode of operation; the function returns a DataFrame summarizing the missing data for that mode.

Modes

  • "sample" - returns a count and list of missing loci per individual (default)
  • "population" - returns a count of missing genotypes per population
  • "locus" - returns a count of missing genotypes per locus
  • "locusxpopulation" - returns a count of missing genotypes per locus per population

Example

missingdata(@gulfsharks, by = "population")
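
As a rough illustration of what the "sample" and "locus" modes summarize, the sketch below tallies missing genotypes on a small hand-made data set in plain Julia. The data layout and the variable names are hypothetical and do not reflect how PopData stores genotypes or the exact columns that `missingdata` returns.

```julia
# Hypothetical toy data: each sample maps to its genotypes, ordered as in `loci`.
loci = ["locA", "locB", "locC"]
genotypes = Dict(
    "s1" => [(1, 2), missing, (3, 3)],
    "s2" => [missing, missing, (1, 3)],
    "s3" => [(2, 2), (1, 1), (3, 3)],
)

# "sample"-style summary: number and names of missing loci for each sample
missing_by_sample = Dict(
    name => (count(ismissing, genos), loci[findall(ismissing, genos)])
    for (name, genos) in genotypes
)

# "locus"-style summary: number of missing genotypes at each locus
missing_by_locus = Dict(
    locus => count(genos -> ismissing(genos[j]), values(genotypes))
    for (j, locus) in enumerate(loci)
)
```

The "population" and "locusxpopulation" modes would group the same counts by population instead of, or in addition to, locus.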

📦 _missingdata

_missingdata(data::PopData, ::Val{:sample})
_missingdata(data::PopData, ::Val{:population})
_missingdata(data::PopData, ::Val{:locus})
_missingdata(data::PopData, ::Val{:locusxpopulation})
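
These signatures dispatch on a `Val` type, which suggests that the exported `missingdata` turns its `by` argument into a `Val` and lets Julia pick the matching method. The sketch below shows that general pattern on an unrelated toy problem. The functions `summarize` and `_summarize` are hypothetical stand-ins, and the way PopGen.jl actually wires the two functions together is an assumption here.

```julia
# Hypothetical internal methods, one per mode, selected by the Val type parameter.
_summarize(x::Vector{Int}, ::Val{:total}) = sum(x)
_summarize(x::Vector{Int}, ::Val{:largest}) = maximum(x)

# Hypothetical public entry point accepting a String or Symbol mode.
function summarize(x::Vector{Int}; by::Union{String, Symbol} = "total")
    # Convert the user-facing mode into a type that dispatch can select on
    return _summarize(x, Val(Symbol(by)))
end

total_default   = summarize([1, 2, 3])
largest_string  = summarize([1, 2, 3], by = "largest")
largest_symbol  = summarize([1, 2, 3], by = :largest)
```

In this sketch an unrecognized mode raises a `MethodError`, because no method exists for that `Val`.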

🔵 pairwiseidentical

pairwiseidentical(data::PopData)

Return a pairwise matrix of the proportion (from 0 to 1) of loci with identical genotypes between all pairs of individuals.

Example

julia> cats = @nancycats ;
julia> pairwiseidentical(cats)
237×237 Named Matrix{Float64}
A ╲ B │     N215      N216  …      N289      N290
──────┼──────────────────────────────────────────
N215  │      1.0       0.5  …  0.142857  0.166667
N216  │      0.5       1.0     0.142857  0.166667
N217  │     0.25     0.125        0.125  0.142857
N218  │    0.375      0.25         0.25  0.142857
N219  │    0.375     0.375         0.25  0.142857
⋮              ⋮         ⋮  ⋱         ⋮         ⋮
N296  │      0.5  0.333333          0.0       0.0
N297  │ 0.166667  0.166667     0.428571  0.285714
N281  │ 0.142857  0.142857         0.25  0.428571
N289  │ 0.142857  0.142857          1.0  0.142857
N290  │ 0.166667  0.166667  …  0.142857       1.0

pairwiseidentical(data::PopData, sample_names::Vector{String})

Return a pairwise matrix of the proportion (from 0 to 1) of nonmissing loci with identical genotypes between all pairs of the samples listed in sample_names.

Example

julia> cats = @nancycats ;
julia> interesting_cats = samplenames(cats)[3:7]
5-element Array{String,1}:
"N217"
"N218"
"N219"
"N220"
"N221"

julia> pairwiseidentical(cats, interesting_cats)
5×5 Named Matrix{Float64}
A ╲ B │     N217      N218      N219      N220      N221
──────┼─────────────────────────────────────────────────
N217  │      1.0       0.0  0.111111  0.222222  0.111111
N218  │      0.0       1.0  0.333333  0.111111  0.444444
N219  │ 0.111111  0.333333       1.0  0.111111  0.333333
N220  │ 0.222222  0.111111  0.111111       1.0  0.222222
N221  │ 0.111111  0.444444  0.333333  0.222222       1.0
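
A single cell of such a matrix can be sketched in plain Julia as the number of loci where two samples share the same genotype, divided by the number of loci genotyped in both. Values such as 0.142857 (1/7) and 0.166667 (1/6) in the first example are consistent with a denominator that varies between pairs, but the exact handling of missing data in PopGen.jl is an assumption here. The `proportion_identical` helper and the toy genotypes are hypothetical, as is the choice to return `missing` when two samples share no genotyped loci.

```julia
# Hypothetical toy genotypes for two samples across four loci.
a = [(1, 2), (3, 3), missing, (5, 6)]
b = [(1, 2), (3, 4), (2, 2), missing]

# Illustrative helper (not a PopGen.jl function).
function proportion_identical(x, y)
    shared = 0     # loci genotyped in both samples
    same = 0       # loci where the two genotypes match
    for (gx, gy) in zip(x, y)
        (ismissing(gx) || ismissing(gy)) && continue
        shared += 1
        same += (gx == gy)
    end
    # Assumption: no shared loci means the comparison is undefined
    return shared == 0 ? missing : same / shared
end

p_ab = proportion_identical(a, b)
p_aa = proportion_identical(a, a)
```

Applying such a function to every pair of samples would fill a symmetric matrix with ones on the diagonal, as in the output above.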