
Pairwise F-Statistics

Background

Originating with Wright's fixation index $F$ (the inbreeding coefficient), $F$ has since grown into a family of statistics that describe differentiation or divergence within or between groups. As you may have seen from summary(), a common set of these indices ($F_{IS}$, $F_{IT}$, and $F_{ST}$) compares $F$ at different hierarchical levels. The notation is straightforward: $I$ stands for individuals, $S$ for subpopulations, and $T$ for the total population.

| F-statistic | Compares   | Against       |
|-------------|------------|---------------|
| $F_{IS}$    | Individual | Subpopulation |
| $F_{IT}$    | Individual | Total         |
| $F_{ST}$    | Subpopulation | Total      |
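
For reference, the three indices are commonly written in terms of heterozygosity. The symbols $H_I$ (mean observed heterozygosity of individuals), $H_S$ (mean expected heterozygosity within subpopulations), and $H_T$ (expected heterozygosity in the total population) do not appear elsewhere on this page and are introduced here only as the conventional textbook notation:

```latex
F_{IS} = 1 - \frac{H_I}{H_S}, \qquad
F_{IT} = 1 - \frac{H_I}{H_T}, \qquad
F_{ST} = 1 - \frac{H_S}{H_T}
% The three levels are linked by:
(1 - F_{IT}) = (1 - F_{IS})\,(1 - F_{ST})
```

Each index compares the heterozygosity at the lower level (numerator) against the higher level (denominator), which is the "Compares" versus "Against" relationship in the table above.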

Often, we are interested in pairwise $F_{ST}$, a coefficient that helps us infer how panmictic (fully mixed) two groups of interest are. Put colloquially: "how much genetic mixing is there between these two groups?" The value of $F_{ST}$ (and its derivatives) typically ranges between 0 and 1 and can be interpreted as follows:

| $F_{ST}$ value | Interpretation                         |
|----------------|----------------------------------------|
| 0              | the two groups are completely panmictic |
| 1              | the two groups are completely isolated  |

However, the relationship is not linear, and Wright considered 0.125 the cutoff for treating populations as divergent.
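
To make the two extremes concrete, here is a minimal sketch of a heterozygosity-based $F_{ST}$ for two equally weighted populations at a single biallelic locus. The function names `expected_het` and `simple_fst` are hypothetical and written for this illustration; this is not one of the estimators offered by pairwisefst, which handle sample sizes and multiple loci.

```julia
# Expected heterozygosity at a biallelic locus with allele frequency p.
expected_het(p::Real) = 2p * (1 - p)

# Illustrative F_ST for two equally weighted populations at one locus,
# using F_ST = (H_T - H_S) / H_T. A teaching sketch, not a package estimator.
function simple_fst(p1::Real, p2::Real)
    hs = (expected_het(p1) + expected_het(p2)) / 2   # mean within-population heterozygosity
    ht = expected_het((p1 + p2) / 2)                 # heterozygosity of the pooled population
    ht == 0 ? 0.0 : (ht - hs) / ht                   # treat a locus monomorphic overall as 0
end

simple_fst(0.5, 0.5)   # identical allele frequencies: 0.0
simple_fst(1.0, 0.0)   # fixed for different alleles: 1.0
simple_fst(0.6, 0.4)   # frequencies differ by 0.2: about 0.04
```

The last call shows the non-linearity: allele frequencies of 0.6 and 0.4 are visibly different, yet the resulting value is only about 0.04.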

$F_{ST}$ isn't everything

An important caveat is that $F_{ST}$ is just one tool for understanding trends, not the entire picture. The genetic data we collect are only a snapshot of the present, and populations can be completely isolated yet still have near-zero $F_{ST}$ values for a number of reasons (slow divergence, recent introgression, etc.). Significance testing helps add context to observed $F_{ST}$ values.
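
One common way to test significance is a permutation test: shuffle individuals (here, allele copies) between the two groups many times and ask how often the shuffled data produce a value at least as large as the observed one. The sketch below assumes a single biallelic locus coded as 0/1 and uses a simple heterozygosity-based $F_{ST}$; `toy_fst` and `permutation_pvalue` are hypothetical names for this illustration and do not reflect how any package implements its test.

```julia
using Random

# Frequency of allele `1` in a vector of 0/1 allele copies.
allele_freq(alleles) = sum(alleles) / length(alleles)

# Illustrative single-locus F_ST between two samples of 0/1 allele copies,
# weighting both samples equally: F_ST = (H_T - H_S) / H_T.
function toy_fst(a, b)
    het(p) = 2p * (1 - p)
    p1, p2 = allele_freq(a), allele_freq(b)
    hs = (het(p1) + het(p2)) / 2
    ht = het((p1 + p2) / 2)
    ht == 0 ? 0.0 : (ht - hs) / ht
end

# One-tailed permutation test: shuffle the pooled allele copies across the two
# groups and count how often the permuted F_ST is >= the observed F_ST.
function permutation_pvalue(a, b; iterations::Int = 1000, rng = Random.default_rng())
    observed = toy_fst(a, b)
    pooled = vcat(a, b)
    n = length(a)
    hits = 0
    for _ in 1:iterations
        shuffled = shuffle(rng, pooled)
        hits += toy_fst(shuffled[1:n], shuffled[n+1:end]) >= observed
    end
    # The +1 counts the observed arrangement as one of the permutations.
    (hits + 1) / (iterations + 1)
end

rng = MersenneTwister(42)
pop_a = vcat(ones(Int, 90), zeros(Int, 10))   # allele frequency 0.9
pop_b = vcat(ones(Int, 10), zeros(Int, 90))   # allele frequency 0.1
p_diff = permutation_pvalue(pop_a, pop_b; iterations = 999, rng = rng)   # small P-value
p_same = permutation_pvalue(pop_a, pop_a; iterations = 999, rng = rng)   # 1.0, observed F_ST is 0
```

The test is one-tailed because only permuted values at least as large as the observed value count as evidence against panmixia.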

Pairwise FST

pairwisefst(data::PopData; method::Function, by::String = "global", iterations::Int64 = 0)

Calculate pairwise $F_{ST}$ between populations in a PopData object. Set iterations to a value greater than 0 to perform a one-tailed permutation test and obtain P-values. Use by = "locus" to calculate $F_{ST}$ locus-by-locus for each population pair (iterations and significance testing are ignored). WeirCockerham is not yet implemented for by-locus $F_{ST}$.

custom output type

The returned object is a custom PairwiseFST type with the fields results (the dataframe of $F_{ST}$ values) and method (a string naming the method used to calculate them). This design allows a custom show method that displays the results more cleanly, and it means you never lose track of which method was used. To access the dataframe directly, use varname.results, where varname is whatever you named the output.
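
The pattern described above (a small wrapper type with its own show method) can be sketched as follows. The type name `ToyPairwiseFST` and the use of a plain Matrix in place of a dataframe are assumptions made for this illustration; this is not the package's actual definition of PairwiseFST.

```julia
# Hypothetical stand-in for a result type, for illustration only.
struct ToyPairwiseFST
    results::Matrix{Float64}   # pairwise values (the real type stores a dataframe)
    method::String             # name of the method used for the calculation
end

# Custom display: print the method as a header line, then one line per row.
function Base.show(io::IO, x::ToyPairwiseFST)
    println(io, "Pairwise FST: ", x.method)
    for row in eachrow(x.results)
        println(io, join(row, "  "))
    end
end

fst = ToyPairwiseFST([0.0 0.0; 0.00081 0.0], "WeirCockerham")
fst.results   # the underlying values, accessed by field name
fst.method    # "WeirCockerham"
```

Because the method travels with the values, the display always states which calculation produced them, and the raw values remain one field access away.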

Arguments

  • data::PopData: a PopData object you wish to perform the calculation on

Keyword Arguments

  • method::Function: which $F_{ST}$ calculation method you would like to use
  • by::String: perform a "global" pairwise $F_{ST}$, or use "locus" for locus-by-locus (ignores significance testing)
  • iterations::Int64: the number of iterations for significance testing (default: 0)

Examples

julia> sharks = @gulfsharks ;

julia> pairwisefst(sharks, method = WeirCockerham)
Pairwise FST: WeirCockerham
               CapeCanaveral  Georgia   SouthCarolina  FloridaKeys  MideastGulf  NortheastGulf  SoutheastGulf
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
CapeCanaveral  0.0            0.0       0.0            0.0          0.0          0.0            0.0
Georgia        0.00081        0.0       0.0            0.0          0.0          0.0            0.0
SouthCarolina  -0.0003        -0.00076  0.0            0.0          0.0          0.0            0.0
FloridaKeys    0.00282        0.00202   0.00204        0.0          0.0          0.0            0.0
MideastGulf    0.00423        0.00354   0.00329        0.00042      0.0          0.0            0.0
NortheastGulf  0.00264        0.00147   0.00146        -7.0e-5      -0.00023     0.0            0.0
SoutheastGulf  0.00312        0.00222   0.00191        -3.0e-5      0.00079      0.00118        0.0