Structure.jl
PopGenCore.jl/src/io/Structure.jl
📦 not exported | 🟪 exported by PopGenCore.jl | 🔵 exported by PopGen.jl
📦 phase_structure
phase_structure(datatype::DataType, args...)
Takes a DataType (such as Int8) and a series of integers, and returns a sorted Tuple of those integers converted to that DataType. In other words, it takes a series of alleles and returns a genotype. Returns missing if args are missing. Used internally by the PopGen structure file reader.
Example
phase_structure(Int8, 1,2,3,4,3,4,6,1)
(1, 1, 2, 3, 3, 4, 4, 6)
phase_structure(Int16, missing, missing)
missing
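The behavior described above can be approximated in a few lines of plain Julia. This is a minimal sketch and not PopGenCore's source. The name phase_structure_sketch is hypothetical, and treating a genotype as missing when any of its alleles is missing is an assumption, because the documentation only shows the case where every allele is missing.

```julia
# Hypothetical re-creation of the documented behavior of phase_structure;
# not PopGenCore's actual implementation.
function phase_structure_sketch(datatype::DataType, args...)
    # Assumption: a genotype with any missing allele is treated as missing
    any(ismissing, args) && return missing
    # Convert each allele to the requested integer type, then sort into a Tuple
    return Tuple(sort([convert(datatype, a) for a in args]))
end

phase_structure_sketch(Int8, 1, 2, 3, 4, 3, 4, 6, 1)
phase_structure_sketch(Int16, missing, missing)
```

Returning a sorted Tuple means two samples carrying the same alleles in a different row order end up with identical genotypes.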
🟪 structure
structure(infile::String; kwargs...)
Load a Structure format file into memory as a PopData object.
infile::String: path to the Structure file
Keyword Arguments
extracols::Integer: how many additional optional columns beyond Structure's POPDATA the reader needs to ignore (default: 0). These include POPFLAG, LOCDATA, or anything else you might have added.
extrarows::Integer: how many additional optional rows there are beyond the first row of locus names (default: 0)
missingval::String: the value used to identify missing values in the data (default: "-9")
silent::Bool: whether to suppress printing file information during import (default: false)
allow_monomorphic::Bool: whether to keep monomorphic loci in the dataset (default: false)
faststructure::Bool: whether the file is in fastStructure format (default: false)
File must follow this Structure format:
- the file is tab or space delimited, but not both
- first row is locus names separated by the delimiter
- leading/trailing whitespaces are tolerated
- optional rows allowed after the locus names
- number of rows per sample = ploidy
- e.g. if diploid, that sample would have 2 rows
- multi-column variant not supported
- first data column is sample name
- second data column is population ID
- optional columns allowed after the population ID (2nd) column
- remaining columns are the genotype for that individual for that locus
Structure file example:
locus_1 locus_2 locus_3 locus_4 locus_5
walnut_01 1 -9 145 66 0 92
walnut_01 1 -9 -9 64 0 94
walnut_02 1 106 142 68 1 92
walnut_02 1 106 148 64 0 94
walnut_03 2 110 145 -9 0 92
walnut_03 2 110 148 66 1 -9
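As a rough illustration of how the rules above turn rows into genotypes, the sketch below parses Structure-format text held in a string into one record per sample and locus. It is not PopGenCore's reader. read_structure_sketch is a hypothetical name, and the sketch assumes whitespace-delimited fields and integer alleles. It also treats a genotype as missing if any of its alleles is missing. The extracols, extrarows, and missingval keywords mirror the documented options only in spirit.

```julia
# Hypothetical sketch of reading Structure-format text; not PopGenCore's reader.
# Assumes whitespace-delimited fields, one row per allele copy, integer alleles.
function read_structure_sketch(text::AbstractString; extracols::Int = 0,
                               extrarows::Int = 0, missingval::String = "-9")
    rows = filter(!isempty, strip.(split(text, '\n')))
    loci = String.(split(rows[1]))             # first row: locus names
    alleles = Dict{Tuple{String,String},Vector{Union{Missing,Int}}}()
    popid = Dict{String,String}()              # population ID per sample
    order = String[]                           # samples in order of appearance
    for row in rows[(2 + extrarows):end]       # skip optional extra rows
        fields = split(row)
        name, pop = String(fields[1]), String(fields[2])
        if !haskey(popid, name)
            push!(order, name)
            popid[name] = pop
        end
        # Genotype columns start after name, population ID, and any extra columns
        for (locus, raw) in zip(loci, fields[(3 + extracols):end])
            allele = raw == missingval ? missing : parse(Int, raw)
            push!(get!(alleles, (name, locus), Union{Missing,Int}[]), allele)
        end
    end
    # One record per sample and locus; rows sharing a name are pooled into a genotype
    records = NamedTuple[]
    for name in order, locus in loci
        v = alleles[(name, locus)]
        # Assumption: a genotype with any missing allele is treated as missing
        geno = any(ismissing, v) ? missing : Tuple(sort(collect(skipmissing(v))))
        push!(records, (name = name, population = popid[name], locus = locus, genotype = geno))
    end
    return records
end
```

Run on the walnut example, walnut_02 at locus_1 comes back as (106, 106), while walnut_03 at locus_5 comes back as missing because one of its two alleles is -9.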
fastStructure file format:
- the file is tab or space delimited, but not both
- no first row of locus names
- number of rows per sample = ploidy
- e.g. if diploid, that sample would have 2 rows
- first data column is sample name
- second data column is population ID
- remaining columns are the genotype for that individual for that locus
- the first 6 columns are usually empty, but this is not required
- no extra rows or columns.
fastStructure file example:
chestnut_01 1 -9 145 66 0 92
chestnut_01 1 -9 -9 64 0 94
chestnut_02 1 106 142 68 1 92
chestnut_02 1 106 148 64 0 94
chestnut_03 2 110 145 -9 0 92
chestnut_03 2 110 148 66 1 -9
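Because a fastStructure file has no header row, a reader has to infer the number of loci from the column count and the ploidy from how many rows share a sample name. The sketch below does only that. The function name and the locus_1, locus_2, ... naming scheme are hypothetical and not what PopGenCore necessarily assigns. The sketch also assumes there are no empty placeholder columns, since empty cells would collapse under whitespace splitting.

```julia
# Hypothetical sketch: infer the shape of header-less fastStructure text.
# Assumes whitespace-delimited rows with no empty placeholder columns.
function faststructure_shape_sketch(text::AbstractString)
    rows = [split(r) for r in split(text, '\n') if !isempty(strip(r))]
    nloci = length(rows[1]) - 2          # minus sample name and population ID
    rowcounts = Dict{String,Int}()       # rows seen per sample name
    for r in rows
        name = String(r[1])
        rowcounts[name] = get(rowcounts, name, 0) + 1
    end
    ploidies = unique(collect(values(rowcounts)))
    length(ploidies) == 1 || error("samples have differing numbers of rows")
    # Placeholder locus names, since the file provides none
    loci = ["locus_$i" for i in 1:nloci]
    return (nloci = nloci, ploidy = only(ploidies), loci = loci)
end
```

For the chestnut example this reports 5 loci and a ploidy of 2.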
Example
walnuts = structure("juglans_nigra.str", extracols = 0, extrarows = 0)
structure(data::PopData; filename::String, faststructure::Bool, delim::String)
Write a PopData object to a Structure format file
data: the PopData object you wish to convert to a Structure file
Keyword Arguments
filename: a String of the output filename
delim: a String of either "tab" or "space" indicating the delimiter (default: "tab")
faststructure: true/false of whether the output should be formatted for fastStructure (default: false)
Example
cats = @nancycats;
fewer_cats = omit(cats, name = samplenames(cats)[1:10]);
structure(fewer_cats, filename = "filtered_nancycats.str", faststructure = true)
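To illustrate the layout described for the writer, the minimal sketch below writes plain Julia vectors rather than a PopData object. It emits one row per allele copy and omits the locus-name header when faststructure = true. write_structure_sketch, its positional arguments, and the fixed ploidy keyword are hypothetical. Writing missing genotypes as -9 is an assumption based on the reader's default missing value. The real writer may differ in detail.

```julia
# Hypothetical sketch of the Structure output layout; not PopGenCore's writer.
# genotypes[i][j] is sample i's genotype at locus j: a Tuple of alleles or missing.
function write_structure_sketch(io::IO, names, pops, loci, genotypes;
                                ploidy::Int = 2, delim::String = "tab",
                                faststructure::Bool = false)
    d = if delim == "tab"
        "\t"
    elseif delim == "space"
        " "
    else
        throw(ArgumentError("delim must be \"tab\" or \"space\""))
    end
    # Standard Structure files start with a row of locus names; fastStructure files do not
    faststructure || println(io, join(loci, d))
    for (name, pop, geno) in zip(names, pops, genotypes)
        # One output row per allele copy, so each sample spans `ploidy` rows
        for k in 1:ploidy
            # Assumption: missing genotypes are written with Structure's usual -9
            alleles = [ismissing(g) ? -9 : g[k] for g in geno]
            println(io, join([string(name), string(pop), string.(alleles)...], d))
        end
    end
    return nothing
end
```

Writing to an IOBuffer instead of a file makes it easy to inspect the output; passing the result of open(path, "w") works the same way.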