Viewing data
PopGen.jl includes commands to inspect and alter PopData. Following standard Julia convention, only commands whose names end with a bang (!) are mutating, meaning they alter the input data. So, a command like populations will show you population information, whereas populations! will change that information in your PopData. The mutating commands here alter the data in your PopData, but not the source data (i.e. the files used to create the PopData). The "manipulation" commands are split into smaller sections to make them less overwhelming, and using the included nancycats and gulfsharks data, you can explore each section like a little tutorial. The sections don't follow any particular order, so feel free to jump around however you like.
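As a small illustration of the bang convention described above, here is a sketch in plain Julia. It uses only Base functions on an ordinary vector (nothing from PopGen.jl), and the variable names are made up for the example.

```julia
# Base Julia only; a plain vector stands in for "some data".
original = [3, 1, 2]

# Non-mutating: returns a new vector and leaves `original` alone.
flipped = reverse(original)

# Snapshot taken before the mutating call, to show nothing has changed yet.
untouched = copy(original)

# Mutating (note the bang): reorders `original` in place.
reverse!(original)

println("flipped:   ", flipped)
println("untouched: ", untouched)
println("original:  ", original)
```

The same reading applies to the PopGen.jl pair mentioned above: the version without the bang reports, the version with the bang changes the object you pass in.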
TL;DR: End-users (as opposed to developers) should not access PopData fields directly; use the accessor functions instead.
In earlier versions of PopGen.jl, you were encouraged to directly access the internal fields of PopData. After discussion with other users and developers, we decided to follow common convention: provide function wrappers to view PopData fields and discourage direct access (unless you're a developer). This is intended to limit unintentional errors, and it also means a user has less to learn to get started.
A little hands-on training will probably go a long way, so let's walk through some of the functions available in PopGen.jl with the included data. This tutorial includes both inputs and outputs so you can be confident that what you're seeing in your Julia session is what's supposed to happen. Sometimes the outputs can be a little lengthy, so they will be arranged in code "tabs".
There are specific relationships between the record entries in PopData objects, so do not use sort, sort!, or manually arrange/add/delete anything in PopData. There are included functions to remove samples or loci, rename things, add location data, etc.
Loading in the data
Let's keep things simple by loading in the nancycats data and calling it ncats.
julia> ncats = @nancycats
PopData{Diploid, 9 Microsatellite loci}
Samples: 237
Populations: 17
Now that we have nancycats loaded in, we can use accessor functions to view the elements within our PopData. Where a DataFrame is returned, the DataFrames.jl convention dataframe.colname gives direct access to the columns we want.
The metadata (data about the data)
Some critical information about the data is front-loaded into a PopData object so that it doesn't have to be recomputed in every calculation.
To view this information, use metadata().
julia> metadata(ncats)
ploidy: 2
loci: 9
samples: 237
populations: 17
biallelic: false
Included in metadata are two DataFrames, one for sample information and another for locus information.
- sample information
- locus information
sampleinfo
To view the sample information, you can use sampleinfo().
julia> sampleinfo(ncats)
237×3 DataFrame
 Row │ name      population  ploidy
     │ String7…  String      Int8
─────┼─────────────────────────────
   1 │ N217      1                2
   2 │ N218      1                2
   3 │ N219      1                2
   4 │ N220      1                2
   5 │ N221      1                2
   6 │ N222      1                2
  ⋮  │    ⋮          ⋮         ⋮
 232 │ N197      14               2
 233 │ N198      14               2
 234 │ N199      14               2
 235 │ N200      14               2
 236 │ N201      14               2
 237 │ N206      14               2
                   222 rows omitted
Using the standard DataFrames getindex methods, we can access these columns like so:
julia> sinfo = sampleinfo(ncats) ;
julia> sinfo.name
237-element Array{String,1}:
"N1"
"N2"
"N3"
"N4"
"N5"
"N6"
"N7"
"N8"
⋮
"N230"
"N231"
"N232"
"N233"
"N234"
"N235"
"N236"
"N237"
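For readers new to DataFrames.jl, the sketch below shows three common ways to pull a column out of a table. The table here is a hypothetical stand-in with made-up values, not the real output of sampleinfo(); only DataFrames.jl calls are used.

```julia
using DataFrames

# Hypothetical stand-in for a sample-information table; values are made up.
sinfo = DataFrame(name = ["N1", "N2", "N3"],
                  population = ["1", "1", "2"],
                  ploidy = Int8[2, 2, 2])

names_prop = sinfo.name        # property access: the column itself, no copy
names_bang = sinfo[!, :name]   # same column, also without copying
names_copy = sinfo[:, :name]   # an independent copy of the column

println(names_prop === names_bang)   # both refer to the same vector
println(names_copy == names_prop)    # equal contents
println(names_copy === names_prop)   # but a different object
```

Working from a copy is the safer habit if you intend to change values, since the earlier warning about not rearranging the contents of a PopData still applies.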
locusinfo
To view the locus information, you can use locusinfo(). Locus information is not mandatory, but it is present if needed for future analyses.
julia> locusinfo(ncats)
9×4 DataFrame
 Row │ chromosome  locus   cm       bp
     │ Int8        String  Float64  Int64
─────┼───────────────────────────────────
   1 │          0  fca8          0      0
   2 │          0  fca23         0      0
   3 │          0  fca43         0      0
   4 │          0  fca45         0      0
   5 │          0  fca77         0      0
   6 │          0  fca78         0      0
   7 │          0  fca90         0      0
   8 │          0  fca96         0      0
   9 │          0  fca37         0      0
The genotype table
genodata
You can view the genotype information with genodata().
julia> genodata(ncats)
2133×4 DataFrame
  Row │ name    population  locus   genotype
      │ String  String      String  Tuple…?
──────┼───────────────────────────────────────
    1 │ N215    1           fca8    missing
    2 │ N216    1           fca8    missing
    3 │ N217    1           fca8    (135, 143)
    4 │ N218    1           fca8    (133, 135)
    5 │ N219    1           fca8    (133, 135)
    6 │ N220    1           fca8    (135, 143)
  ⋮   │   ⋮         ⋮         ⋮         ⋮
 2128 │ N295    17          fca37   (208, 208)
 2129 │ N296    17          fca37   (208, 220)
 2130 │ N297    17          fca37   (208, 208)
 2131 │ N281    17          fca37   (208, 208)
 2132 │ N289    17          fca37   (208, 208)
 2133 │ N290    17          fca37   (208, 208)
                            2121 rows omitted
Because the genotype data is in long format (aka "tidy"), accessing genotypes in a meaningful way is fairly straightforward if you have any experience with dataframe manipulation. For a deeper look into indexing PopData, read Advanced PopData Indexing.
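To give a feel for why long format is convenient, here is a minimal sketch using DataFrames.jl on a hypothetical miniature table. The column names mirror the genotype table above, but the rows are made up and none of this is PopGen.jl API.

```julia
using DataFrames

# Hypothetical miniature of a long-format genotype table; values are made up.
geno = DataFrame(
    name = ["s1", "s2", "s1", "s2"],
    population = ["A", "A", "A", "A"],
    locus = ["loc1", "loc1", "loc2", "loc2"],
    genotype = [(135, 143), missing, (208, 208), (208, 220)],
)

# Keep only the rows for a single locus.
loc1 = filter(:locus => ==("loc1"), geno)

# Keep only the rows with a non-missing genotype.
complete = dropmissing(geno, :genotype)

# Count non-missing genotypes per locus.
counts = combine(groupby(geno, :locus),
                 :genotype => (g -> count(!ismissing, g)) => :n_genotyped)

println(loc1)
println(complete)
println(counts)
```

Each call returns a new table and leaves the input untouched, which fits the advice above about not rearranging a PopData by hand.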
The functions here help you inspect your PopData and pull information from it easily. The examples below use the included gulfsharks data, assumed here to be loaded as sharks.
View specific information
sample names
samplenames(data::PopData)
View individual/sample names in a PopData.
julia> samplenames(sharks)
212-element Array{String,1}:
"cc_001"
"cc_002"
"cc_003"
"cc_005"
"cc_007"
⋮
"seg_027"
"seg_028"
"seg_029"
"seg_030"
"seg_031"
locus names
loci(data::PopData)
Returns a vector of strings of the locus names in a PopData.
julia> loci(sharks)
2213-element Array{String,1}:
"contig_35208"
"contig_23109"
"contig_4493"
"contig_10742"
"contig_14898"
⋮
"contig_43517"
"contig_27356"
"contig_475"
"contig_19384"
"contig_22368"
"contig_2784"
View genotypes
all genotypes in one locus or sample
genotypes(data::PopData, samplelocus::String)
Returns a vector (view) of genotypes for a locus or a sample, depending on which the function finds in your data. Don't worry too much about the unwieldy type signature of the returned vector.
julia> genotypes(sharks, "contig_2784")
212-element view(::PooledArrays.PooledVector{Union{Missing, Tuple{Int8, Int8}}, UInt8, Vector{UInt8}}, [468097, 468098, 468099, 468100, 468101, 468102, 468103, 468104, 468105, 468106 … 468299, 468300, 468301, 468302, 468303, 468304, 468305, 468306, 468307, 468308]) with eltype Union{Missing, Tuple{Int8, Int8}}:
(1, 1)
(1, 1)
(1, 1)
⋮
(1, 1)
(1, 1)
(1, 1)
julia> genotypes(sharks, "cc_001")
2209-element view(::PooledArrays.PooledVector{Union{Missing, Tuple{Int8, Int8}}, UInt8, Vector{UInt8}}, [1, 213, 425, 637, 849, 1061, 1273, 1485, 1697, 1909 … 466189, 466401, 466613, 466825, 467037, 467249, 467461, 467673, 467885, 468097]) with eltype Union{Missing, Tuple{Int8, Int8}}:
(1, 2)
(1, 1)
(1, 2)
⋮
(2, 2)
(1, 1)
(1, 1)
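A brief aside on what "view" means in Julia, since the return type above mentions it. This is a minimal sketch in Base Julia with a made-up vector; it does not reproduce how PopGen.jl builds its return value.

```julia
# Base Julia only: a plain vector stands in for a genotype column.
column = [(1, 1), (1, 2), (2, 2), (1, 1)]

window = view(column, 2:3)    # a view: shares memory with `column`
snapshot = collect(window)    # an independent Vector holding the same values

column[2] = (2, 2)            # change the parent vector...

println(window[1])            # ...and the view reflects the change
println(snapshot[1])          # the collected copy does not
```

If you need a standalone vector that will not track the underlying data, collect is one way to get it.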
one sample, one locus
genotype(data::PopData, sample::String => locus::String)
Returns the genotype of the sample at the locus. Uses Pair notation.
julia> genotype(sharks, "cc_001" => "contig_2784")
(1, 1)
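If Pair notation is unfamiliar: the => operator in Julia simply builds a Pair object holding a left and a right value. The sketch below is Base Julia only and reuses the strings from the example above; the variable names are made up.

```julia
# Base Julia only: `=>` constructs a Pair.
query = "cc_001" => "contig_2784"

sample_name = first(query)   # left-hand side of the Pair
locus_name = last(query)     # right-hand side of the Pair

println(typeof(query))
println(sample_name, " at ", locus_name)

# Either side can also be a vector, which gives a differently typed Pair.
many = ["cc_001", "cc_002"] => "contig_2784"
println(typeof(many))
```

Because the two Pairs have different types, a function can choose different behavior for "one sample" versus "many samples" based on what sits on each side of the arrow.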
many samples, one locus
genotypes(data::PopData, samples::Vector{String} => locus::String)
Returns a subdataframe of the genotypes of the samples at the locus. Uses Pair notation.
julia> genotypes(sharks, samplenames(sharks)[1:3] => "contig_2784")
3×4 SubDataFrame
 Row │ name     population     locus        genotype
     │ String7  String         String       Tuple…?
─────┼──────────────────────────────────────────────
   1 │ cc_001   CapeCanaveral  contig_2784  (1, 1)
   2 │ cc_002   CapeCanaveral  contig_2784  (1, 1)
   3 │ cc_003   CapeCanaveral  contig_2784  (1, 1)
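The output above is a SubDataFrame rather than a DataFrame. As a rough sketch of what that distinction means in DataFrames.jl (using a hypothetical table with made-up values, not PopGen.jl internals):

```julia
using DataFrames

# Hypothetical table; values are made up for illustration.
df = DataFrame(name = ["s1", "s2", "s3"],
               genotype = [(1, 1), (1, 2), (2, 2)])

sub = view(df, 1:2, :)      # SubDataFrame: a window onto `df`, no data copied
standalone = copy(sub)      # a regular DataFrame with its own columns

println(typeof(sub) <: SubDataFrame)
println(typeof(standalone) <: DataFrame)
println(parent(sub) === df)   # the view still points at the original table
```

Copying is one way to get a table you can modify freely without touching the data the view points at.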
one sample, many loci
genotypes(data::PopData, sample::String => loci::Vector{String})
Returns a subdataframe of the genotypes of the sample at the loci. Uses Pair notation.
julia> genotypes(sharks, "cc_001" => loci(sharks)[1:3])
3×4 SubDataFrame
 Row │ name     population     locus         genotype
     │ String7  String         String        Tuple…?
─────┼───────────────────────────────────────────────
   1 │ cc_001   CapeCanaveral  contig_35208  (1, 2)
   2 │ cc_001   CapeCanaveral  contig_23109  (1, 1)
   3 │ cc_001   CapeCanaveral  contig_4493   (1, 2)
many samples, many loci
genotypes(data::PopData, samples::Vector{String} => loci::Vector{String})
Returns a subdataframe of the genotypes of the samples at the loci. Uses Pair notation.
julia> genotypes(sharks, samplenames(sharks)[1:3] => loci(sharks)[1:3])
9×4 SubDataFrame
 Row │ name     population     locus         genotype
     │ String7  String         String        Tuple…?
─────┼───────────────────────────────────────────────
   1 │ cc_001   CapeCanaveral  contig_35208  (1, 2)
   2 │ cc_002   CapeCanaveral  contig_35208  (1, 2)
   3 │ cc_003   CapeCanaveral  contig_35208  (1, 1)
   4 │ cc_001   CapeCanaveral  contig_23109  (1, 1)
   5 │ cc_002   CapeCanaveral  contig_23109  (1, 2)
   6 │ cc_003   CapeCanaveral  contig_23109  missing
   7 │ cc_001   CapeCanaveral  contig_4493   (1, 2)
   8 │ cc_002   CapeCanaveral  contig_4493   (1, 1)
   9 │ cc_003   CapeCanaveral  contig_4493   (1, 1)