Structure.jl
PopGenCore.jl/src/io/Structure.jl
| 📦 not exported | 🟪 exported by PopGenCore.jl | 🔵 exported by PopGen.jl |
|---|---|---|
📦 phase_structure
phase_structure(datatype::DataType, args...)
Takes a DataType (such as Int8) and a series of integers and returns
a sorted Tuple of those integers converted to that DataType. In other words,
it takes a series of alleles and returns a genotype. Returns missing if args are
missing. Used internally in the PopGen.structure file reader.
Example
phase_structure(Int8, 1,2,3,4,3,4,6,1)
(1, 1, 2, 3, 3, 4, 4, 6)
phase_structure(Int16, missing, missing)
missing
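The behavior described above can be sketched in a few lines. This is a minimal illustration, not the PopGenCore source: the name phase_sketch is hypothetical, and treating any missing argument as a missing genotype is an assumption.

```julia
# Hypothetical re-implementation for illustration only; not the PopGenCore source.
function phase_sketch(datatype::DataType, args...)
    # Assumption: if any allele is missing, the whole genotype is missing.
    any(ismissing, args) && return missing
    # Sort the alleles, convert each to the requested type, and freeze them in a Tuple.
    return Tuple(datatype.(sort(collect(args))))
end

println(phase_sketch(Int8, 1, 2, 3, 4, 3, 4, 6, 1))
println(phase_sketch(Int16, missing, missing))
```

Sorting before building the Tuple means the same set of alleles yields the same genotype regardless of the order in which the rows list them.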
🟪 structure
structure(infile::String; kwargs...)
Load a Structure format file into memory as a PopData object.
infile::String: path to Structure file
Keyword Arguments
extracols::Integer: how many additional optional columns beyond Structure's POPDATA the reader needs to ignore (default: 0). These include POPFLAG, LOCDATA, or anything else you might have added
extrarows::Integer: how many additional optional rows there are beyond the first row of locus names (default: 0)
missingval::String: the value used to identify missing values in the data (default: "-9")
silent::Bool: whether to print file information during import (default: false)
allow_monomorphic::Bool: whether to keep monomorphic loci in the dataset (default: false)
faststructure::Bool: whether the file is fastStructure format (default: false)
File must follow this Structure format:
- the file is tab or space delimited, but not both
- first row is locus names separated by the delimiter
- leading/trailing whitespaces are tolerated
- optional rows allowed after the locus names
- number of rows per sample = ploidy
  - e.g. if diploid, that sample would have 2 rows
- multi-column variant not supported
- first data column is sample name
- second data column is population ID
- optional columns allowed after the population ID (2nd) column
- remaining columns are the genotype for that individual for that locus
Structure file example:
locus_1 locus_2 locus_3 locus_4 locus_5
walnut_01 1 -9 145 66 0 92
walnut_01 1 -9 -9 64 0 94
walnut_02 1 106 142 68 1 92
walnut_02 1 106 148 64 0 94
walnut_03 2 110 145 -9 0 92
walnut_03 2 110 148 66 1 -9
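The layout rules above can be checked mechanically. The following is a rough sketch, assuming a file with no extra rows or columns; check_layout is a hypothetical helper written for this illustration and is not part of PopGenCore.jl.

```julia
# Hypothetical helper for illustration only; not part of PopGenCore.jl.
# Assumes extracols = 0 and extrarows = 0.
function check_layout(text::AbstractString)
    # Splitting on whitespace handles both tab- and space-delimited files.
    rows = [split(line) for line in split(strip(text), '\n')]
    loci, data = rows[1], rows[2:end]
    # Every data row holds: sample name + population ID + one value per locus.
    for row in data
        length(row) == length(loci) + 2 ||
            error("expected $(length(loci) + 2) columns, found $(length(row))")
    end
    # Count rows per sample; in this format that count is the sample's ploidy.
    rows_per_sample = Dict{String,Int}()
    for row in data
        name = String(row[1])
        rows_per_sample[name] = get(rows_per_sample, name, 0) + 1
    end
    return (nloci = length(loci), rows_per_sample = rows_per_sample)
end

example = """
locus_1 locus_2 locus_3 locus_4 locus_5
walnut_01 1 -9 145 66 0 92
walnut_01 1 -9 -9 64 0 94
walnut_02 1 106 142 68 1 92
walnut_02 1 106 148 64 0 94
walnut_03 2 110 145 -9 0 92
walnut_03 2 110 148 66 1 -9
"""

result = check_layout(example)
println(result.nloci)
println(result.rows_per_sample)
```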
fastStructure file format:
- the file is tab or space delimited, but not both
- no first row of loci names
- number of rows per sample = ploidy
  - e.g. if diploid, that sample would have 2 rows
- first data column is sample name
- second data column is population ID
- remaining columns are the genotype for that individual for that locus
- usually, the first 6 columns are empty (but this is not required)
- no extra rows or columns.
fastStructure file example:
chestnut_01 1 -9 145 66 0 92
chestnut_01 1 -9 -9 64 0 94
chestnut_02 1 106 142 68 1 92
chestnut_02 1 106 148 64 0 94
chestnut_03 2 110 145 -9 0 92
chestnut_03 2 110 148 66 1 -9
Example
walnuts = structure("juglans_nigra.str", extracols = 0, extrarows = 0)
structure(data::PopData; filename::String, faststructure::Bool, delim::String)
Write a PopData object to a Structure format file.
data: the PopData object you wish to convert to a Structure file
Keyword Arguments
filename: a String of the output filename
delim: a String of either "tab" or "space" indicating the delimiter (default: "tab")
faststructure: true/false of whether the output should be formatted for fastStructure (default: false)
Example
cats = @nancycats;
fewer_cats = omit(cats, name = samplenames(cats)[1:10]);
structure(fewer_cats, filename = "filtered_nancycats.str", faststructure = true)
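To make the delim option and the one-row-per-allele-copy layout concrete, here is a small sketch of how a single sample's rows could be assembled. It assumes the written file follows the same Structure layout the reader expects, with "-9" marking missing data; toy_structure_rows and its arguments are hypothetical and do not reflect how PopGen.jl implements the writer.

```julia
# Hypothetical helper for illustration only; not part of PopGen.jl.
function toy_structure_rows(name, population, genotypes, ploidy;
                            delim = "tab", missingval = "-9")
    # Map the delimiter keyword to the actual separator character.
    if delim == "tab"
        sep = "\t"
    elseif delim == "space"
        sep = " "
    else
        throw(ArgumentError("delim must be \"tab\" or \"space\""))
    end
    rows = String[]
    # One output row per allele copy, so a diploid sample yields two rows.
    for k in 1:ploidy
        alleles = [ismissing(g) ? missingval : string(g[k]) for g in genotypes]
        push!(rows, join([name; string(population); alleles], sep))
    end
    return rows
end

rows = toy_structure_rows("walnut_02", 1, [(106, 106), (142, 148), missing], 2;
                          delim = "space")
foreach(println, rows)
```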