Skip to main content

Structure

Words of wisdom

More often than not, your Structure file was created by a conversion from another format. While PopGen.jl offers a Structure file reader, we generally recommend using whatever previous format it was in because the Structure reader has more specific format requirements than the other readers, which can cause unneeded frustration. Additionally, fewer data conversions mean less chance of conversion errors occuring.

Import a Structure file as PopData

structure(infile::String; kwargs...)

Arguments

  • infile::String : path to Structure file

Keyword Arguments

  • extracols::Integer: how many additional optional columns there are beyond Stucture's POPDATA the reader needs to ignore (default: 0)
    • these include POPFLAG, LOCDATA, or anything else you might have added
  • extrarows::Integer : how many additional optional rows there are beyond the first row of locus names (default: 0)
  • missingval::String : the value used to identify missing values in the data (default: "-9")
  • silent::Bool : whether to print file information during import (default: false)
  • allow_monomorphic::Bool : whether to keep monomorphic loci in the dataset (default: false)
  • faststructure::Bool: whether the file is fastStructure format (default: false)

File formatting

Structure files are not an ideal format because there is a bit too much wiggle room in the specifications that are later cleaned up with a config file when running the software. As such, PopGen.jl requires somewhat more specificity in how the files are formatted for things to work correctly:

  • the file is tab or space delimited but not both
  • first row is locus names separated by the delimiter
    • leading/trailing whitespaces are tolerated
    • optional rows allowed after the locus names
  • number of rows per sample = ploidy
    • e.g. if diploid, that sample would have 2 rows
    • multi-column variant not supported
  • first data column is sample name
  • second data column is population ID
    • optional columns allowed after the population ID (2nd) column
  • remaining columns are the genotype for that individual for that locus

Structure file example:

locus_1 locus_2 locus_3 locus_4 locus_5
walnut_01 1 -9 145 66 0 92
walnut_01 1 -9 -9 64 0 94
walnut_02 1 106 142 68 1 92
walnut_02 1 106 148 64 0 94
walnut_03 2 110 145 -9 0 92
walnut_03 2 110 148 66 1 -9

Example

walnuts = structure("juglans_nigra.str", extracols = 0, extrarows = 0)

Writing to a Structure file

All file writing options can be performed using PopGen.write(), which calls structure when writing to a Structure/fastStructure file.

structure(data::PopData; filename::String, faststructure::Bool, delim::String)

Write a PopData object to a Stucture format file

Arguments

  • data: the PopData object you wish to convert to a Structure file

Keyword Arguments

  • filename::String: the output filename
  • delim::String : either "tab" or "space" indicating the delimiter (default: "tab")
  • faststructure::Bool: if the output should be formatted for fastStructure (default: false)

Example

cats = @nancycats;
fewer_cats = omit(cats, name = samplenames(cats)[1:10]);
structure(fewer_cats, filename = "filtered_nancycats.str", faststructure = true)