
Genepop

Import a Genepop file as a PopData object

genepop(infile; kwargs...)

Arguments

  • infile::String : path to the Genepop file, in quotes

Keyword Arguments

  • digits::Integer: number of digits denoting each allele (default: 3)
  • popsep::String : word that separates populations in infile (default: "POP")
  • diploid::Bool : whether samples are diploid for parsing optimizations (default: true)
  • silent::Bool : whether to suppress printing file information during import (default: false)
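The digits keyword determines how each genotype field is cut into alleles. As a rough illustration only (this is not PopGen.jl's internal parser), a hypothetical helper split_genotype could split a field into fixed-width chunks of digits characters. Treating an all-zero genotype as missing is an assumption based on the 000000 entries in the example file further down.

```julia
# Hypothetical helper, NOT part of PopGen.jl: split one Genepop genotype
# field into alleles that are `digits` characters wide.
function split_genotype(field::AbstractString, digits::Integer)
    length(field) % digits == 0 ||
        throw(ArgumentError("field length $(length(field)) is not a multiple of digits = $digits"))
    # Parse each fixed-width chunk as an integer allele ("090" becomes 90)
    alleles = [parse(Int, field[i:i+digits-1]) for i in 1:digits:length(field)]
    # Assumption: a genotype whose alleles are all zero is treated as missing
    return all(iszero, alleles) ? missing : Tuple(alleles)
end

split_genotype("250230", 3)   # (250, 230)
split_genotype("090100", 3)   # (90, 100)
split_genotype("000000", 3)   # missing
split_genotype("0910", 2)     # (9, 10)
```

Getting digits wrong does not always raise an error: a 6-character field parsed with digits = 2 silently yields three alleles, which is why the value should match how the file was written.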
Population names

By default, the file reader assigns population IDs as numbers (stored as Strings) in order of appearance in the Genepop file. Use the populations! function to replace these with your own population IDs.

Example

julia> wasp_data = genepop("/data/wasp_hive.gen", digits = 3, popsep = "POP")

Format

Files must follow standard Genepop formatting:

  • First line is a comment (and skipped)
  • Loci are listed after the first line, either one per line (without commas) or in a single comma-separated row
  • Populations must be delimited by a line containing a particular, consistent keyword
    • it must be the same word each time, not a unique population name
  • File is tab delimited or space delimited, but not both

An example Genepop file:

Wasp populations in New York
Locus1
Locus2
Locus3
POP
Oneida_01, 250230 564568 110100
Oneida_02, 252238 568558 100120
Oneida_03, 254230 564558 090100
POP
Newcomb_01, 254230 564558 080100
Newcomb_02, 000230 564558 090080
Newcomb_03, 254230 000000 090100
Newcomb_04, 254230 564000 090120
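The format rules above can be sketched as a small line-by-line reader. This is a hypothetical illustration, not PopGen.jl's implementation (which relies on CSV.jl): the names SketchSample and sketch_genepop are made up, the separator is matched exactly (PopGen.jl's own matching may differ), and genotypes are left as raw strings rather than split into alleles.

```julia
# Hypothetical record type for one parsed sample (not a PopGen.jl type)
struct SketchSample
    name::String
    population::String
    genotypes::Vector{String}
end

# Hypothetical sketch, NOT PopGen.jl's parser: group the lines of a
# Genepop file into loci names and samples with numbered populations.
function sketch_genepop(lines::Vector{<:AbstractString}; popsep::AbstractString = "POP")
    loci = String[]
    samples = SketchSample[]
    popcount = 0
    for raw in lines[2:end]                  # the first line is a comment and is skipped
        line = strip(raw)
        isempty(line) && continue
        if line == popsep
            popcount += 1                    # populations are numbered in order of appearance
        elseif popcount == 0
            # before the first separator: loci, one per line or in a comma-separated row
            append!(loci, filter(!isempty, String.(strip.(split(line, ',')))))
        else
            # "name, genotype genotype ..." where genotypes are tab- or space-delimited
            name, genos = split(line, ','; limit = 2)
            push!(samples, SketchSample(String(strip(name)), string(popcount),
                                        String.(split(genos))))
        end
    end
    return loci, samples
end

example = split("""
Wasp populations in New York
Locus1
Locus2
Locus3
POP
Oneida_01, 250230 564568 110100
Oneida_02, 252238 568558 100120
Oneida_03, 254230 564558 090100
POP
Newcomb_01, 254230 564558 080100
Newcomb_02, 000230 564558 090080
Newcomb_03, 254230 000000 090100
Newcomb_04, 254230 564000 090120
""", '\n')

loci, samples = sketch_genepop(example)
# loci == ["Locus1", "Locus2", "Locus3"]; samples[4] is Newcomb_01 in population "2"
```

Because every POP line only increments a counter, the population IDs are "1", "2", and so on, which is why a unique population name in place of the separator keyword would break the grouping.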

Writing to a Genepop file

All file writing can be performed with PopGen.write(), which calls genepop when writing to a Genepop file.

genepop(data::PopData; filename::String = "output.gen", digits::Int = 3, format::String = "vertical", miss::Int = 0)

Writes a PopData object to a Genepop-formatted file.

Arguments

  • data: the PopData object you wish to convert to a Genepop file

Keyword arguments

  • filename::String: the output filename (default: "output.gen")
  • digits::Integer: number of digits used to format each allele (default: 3)
    • e.g. digits = 3 will turn (1, 2) into 001002
  • format::String : how the loci should be formatted (default: "vertical")
    • vertically ("v" or "vertical")
    • horizontally ("h" or "horizontal")
    • isolation-by-distance ("ibd") where each sample is a population with coordinate data prepended
  • miss::Integer : how you would like missing values written
    • 0 : as a genotype represented as a number of zeroes equal to digits × ploidy like 000000 (default)
    • -9 : as a single value -9
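To make the digits, miss, and vertical/horizontal options concrete, here is a hypothetical sketch of how a genotype and the loci header could be rendered. The helpers format_genotype and format_loci are illustrative names, not PopGen.jl functions; the default ploidy of 2 is an assumption, and the "ibd" layout is not sketched.

```julia
# Hypothetical helper, NOT PopGen.jl's writer: render one genotype,
# zero-padding each allele to `digits` characters.
function format_genotype(geno, digits::Integer; miss::Integer = 0, ploidy::Integer = 2)
    if geno === missing
        # miss = 0 writes digits × ploidy zeroes; any other value is written as-is (e.g. -9)
        return miss == 0 ? repeat("0", digits * ploidy) : string(miss)
    end
    return join(lpad(string(a), digits, '0') for a in geno)
end

# Hypothetical helper: lay out locus names for the two non-ibd formats
function format_loci(loci::Vector{String}, format::String)
    if format in ("v", "vertical")
        return join(loci, "\n")        # one locus name per line
    elseif format in ("h", "horizontal")
        return join(loci, ",")         # a single comma-separated row
    else
        throw(ArgumentError("format not covered by this sketch: $format"))
    end
end

format_genotype((1, 2), 3)                # "001002"
format_genotype(missing, 3)               # "000000"
format_genotype(missing, 3; miss = -9)    # "-9"
format_loci(["Locus1", "Locus2"], "h")    # "Locus1,Locus2"
```

Padding with zeroes keeps every allele the same width, so a reader using the same digits value can split the fields back apart unambiguously.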

Example

julia> cats = @nancycats;
julia> fewer_cats = omit(cats, name = samplenames(cats)[1:10]);
julia> genepop(fewer_cats, filename = "filtered_nancycats.gen", digits = 3, format = "h")

Acknowledgements

The original implementations of the importing parser were written using only Base Julia, and while the speed was fantastic, the memory footprint was unusually high (~650 MB of RAM to parse the gulfsharks dataset, which is only 3.2 MB in size). Thanks to CSV.jl, we now leverage that package to preserve the speed while reducing the memory footprint considerably!