Structure.jl

PopGenCore.jl/src/io/Structure.jl

📦 not exported	🟪 exported by PopGenCore.jl	🔵 exported by PopGen.jl

📦 phase_structure

phase_structure(datatype::DataType, args...)

Takes a DataType (such as Int8) and a series of integers, and returns a sorted Tuple of those integers converted to that DataType. In other words, it takes a series of alleles and returns a genotype. Returns missing if args are missing. Used internally by the PopGen.structure file reader.

Example

phase_structure(Int8, 1,2,3,4,3,4,6,1)
(1, 1, 2, 3, 3, 4, 4, 6)

phase_structure(Int16, missing, missing)
missing
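The behavior described above can be sketched with Base Julia alone. This is an illustration, not the PopGenCore.jl source: the name phase_sketch is hypothetical, and returning missing when any allele is missing is an assumption (the docstring only says "if args are missing").

```julia
# Hypothetical stand-in for phase_structure, for illustration only.
# Assumption: a genotype with any missing allele is treated as missing.
function phase_sketch(::Type{T}, alleles...) where {T<:Integer}
    any(ismissing, alleles) && return missing
    # convert every allele to T, sort, and freeze the result as a Tuple
    return Tuple(sort!(T[a for a in alleles]))
end

phase_sketch(Int8, 1, 2, 3, 4, 3, 4, 6, 1)   # (1, 1, 2, 3, 3, 4, 4, 6)
phase_sketch(Int16, missing, missing)        # missing
```

Sorting before building the Tuple means the same set of alleles always yields the same genotype, regardless of the row order in the file.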

🟪 structure

structure(infile::String; kwargs...)

Load a Structure format file into memory as a PopData object.

infile::String : path to Structure file

Keyword Arguments

extracols::Integer : how many additional optional columns beyond Structure's POPDATA column the reader needs to ignore (default: 0)
- these include POPFLAG, LOCDATA, or anything else you might have added
extrarows::Integer : how many additional optional rows there are beyond the first row of locus names (default: 0)
missingval::String : the value used to identify missing values in the data (default: "-9")
silent::Bool : whether to print file information during import (default: false)
allow_monomorphic::Bool : whether to keep monomorphic loci in the dataset (default: false)
faststructure::Bool : whether the file is fastStructure format (default: false)

File must follow this Structure format:

the file is tab or space delimited but not both
first row is locus names separated by the delimiter
- leading/trailing whitespaces are tolerated
- optional rows allowed after the locus names
number of rows per sample = ploidy
- e.g. if diploid, that sample would have 2 rows
- multi-column variant not supported
first data column is sample name
second data column is population ID
- optional columns allowed after the population ID (2nd) column
remaining columns are the genotype for that individual for that locus

Structure file example:

locus_1 locus_2 locus_3 locus_4 locus_5
walnut_01   1   -9  145 66  0   92
walnut_01   1   -9  -9  64  0   94
walnut_02   1   106 142 68  1   92
walnut_02   1   106 148 64  0   94
walnut_03   2   110 145 -9  0   92
walnut_03   2   110 148 66  1   -9
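To make the layout rules concrete, the sketch below walks the example file by hand using only Base Julia. It is not how structure() is implemented. The variable names are illustrative, and it assumes no extra rows or columns and "-9" as the missing value.

```julia
# The example file from above, held in a string so the sketch is self-contained.
example = """
locus_1 locus_2 locus_3 locus_4 locus_5
walnut_01   1   -9  145 66  0   92
walnut_01   1   -9  -9  64  0   94
walnut_02   1   106 142 68  1   92
walnut_02   1   106 148 64  0   94
walnut_03   2   110 145 -9  0   92
walnut_03   2   110 148 66  1   -9
"""

lines = split(strip(example), '\n')
loci = split(lines[1])          # first row: locus names

# sample name => one allele vector per locus (one entry per row of that sample)
genotypes = Dict{String, Vector{Vector{Union{Missing,Int}}}}()
for ln in lines[2:end]
    fields = split(ln)          # any run of whitespace separates fields
    name = String(fields[1])    # column 1: sample name; column 2 (population) is skipped here
    alleles = [f == "-9" ? missing : parse(Int, f) for f in fields[3:end]]
    per_locus = get!(genotypes, name) do
        [Union{Missing,Int}[] for _ in loci]
    end
    for (i, a) in enumerate(alleles)
        push!(per_locus[i], a)
    end
end

genotypes["walnut_02"][2]       # alleles of walnut_02 at locus_2: [142, 148]
```

Because each diploid sample spans two rows, every locus ends up with two alleles per sample, which is the "number of rows per sample = ploidy" rule in action.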

fastStructure file format:

the file is tab or space delimited but not both
no first row of loci names
number of rows per sample = ploidy
- e.g. if diploid, that sample would have 2 rows
first data column is sample name
second data column is population ID
remaining columns are the genotype for that individual for that locus
usually, the first 6 columns are empty (but this is not required)
no extra rows or columns

fastStructure file example:

chestnut_01 1   -9  145 66  0   92
chestnut_01 1   -9  -9  64  0   94
chestnut_02 1   106 142 68  1   92
chestnut_02 1   106 148 64  0   94
chestnut_03 2   110 145 -9  0   92
chestnut_03 2   110 148 66  1   -9

Example

walnuts = structure("juglans_nigra.str", extracols = 0, extrarows = 0)

structure(data::PopData; filename::String, faststructure::Bool, delim::String)

Write a PopData object to a Structure format file.

data: the PopData object you wish to convert to a Structure file

Keyword Arguments

filename: a String of the output filename
delim : a String of either "tab" or "space" indicating the delimiter (default: "tab")
faststructure: true/false of whether the output should be formatted for fastStructure (default: false)

Example

cats = @nancycats;
fewer_cats = omit(cats, name = samplenames(cats)[1:10]);
structure(fewer_cats, filename = "filtered_nancycats.str", faststructure = true)
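As a rough picture of what the written file looks like, the sketch below turns one sample's genotypes into Structure-style rows, one row per allele copy. The helper structure_rows and its ploidy and missingval keywords are hypothetical and not part of PopGen.jl. Only the "tab"/"space" meaning of delim comes from the documentation above.

```julia
# Hypothetical helper for illustration; the real writer is structure(data; ...).
function structure_rows(name::String, population, genotypes;
                        delim = "tab", ploidy = 2, missingval = "-9")
    sep = delim == "tab"   ? "\t" :
          delim == "space" ? " "  :
          throw(ArgumentError("delim must be \"tab\" or \"space\""))
    rows = String[]
    for k in 1:ploidy
        # k-th allele of every locus; missing genotypes become the missing code
        alleles = [ismissing(g) ? missingval : string(g[k]) for g in genotypes]
        push!(rows, join([name; string(population); alleles], sep))
    end
    return rows
end

rows = structure_rows("walnut_02", 1, [(106, 106), (142, 148), missing]; delim = "space")
# rows[1] == "walnut_02 1 106 142 -9"
# rows[2] == "walnut_02 1 106 148 -9"
```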
