Interface

SingleCellProjections.DataMatrix
SingleCellProjections.DataMatrix
SingleCellProjections.DataMatrix
SingleCellProjections.LowRank
SingleCellProjections.NormalizationModel
SingleCellProjections.PseudoBulkModel
SingleCellProjections.SCTransformModel
SingleCellProjections.SVDModel
Base.copy
LinearAlgebra.svd
SCTransform.sctransform
SingleCellProjections.adjacency_distances
SingleCellProjections.covariate
SingleCellProjections.covariate
SingleCellProjections.designmatrix
SingleCellProjections.filter_matrix
SingleCellProjections.filter_obs
SingleCellProjections.filter_var
SingleCellProjections.force_layout
SingleCellProjections.ftest
SingleCellProjections.ftest!
SingleCellProjections.ftest_table
SingleCellProjections.implicitsvd
SingleCellProjections.load10x
SingleCellProjections.load_counts
SingleCellProjections.load_counts
SingleCellProjections.load_counts
SingleCellProjections.loadh5ad
SingleCellProjections.local_outlier_factor
SingleCellProjections.local_outlier_factor!
SingleCellProjections.local_outlier_factor_projection
SingleCellProjections.local_outlier_factor_projection!
SingleCellProjections.local_outlier_factor_projection_table
SingleCellProjections.local_outlier_factor_table
SingleCellProjections.logtransform
SingleCellProjections.mannwhitney
SingleCellProjections.mannwhitney!
SingleCellProjections.mannwhitney_table
SingleCellProjections.merge_counts
SingleCellProjections.normalize_matrix
SingleCellProjections.normalize_matrix
SingleCellProjections.obs_coordinates
SingleCellProjections.project
SingleCellProjections.project
SingleCellProjections.project
SingleCellProjections.pseudobulk
SingleCellProjections.set_obs_id_col!
SingleCellProjections.set_var_id_col!
SingleCellProjections.splitrange
SingleCellProjections.tf_idf_transform
SingleCellProjections.ttest
SingleCellProjections.ttest!
SingleCellProjections.ttest_table
SingleCellProjections.update_matrix
SingleCellProjections.ustatistic_single
SingleCellProjections.var_coordinates
SingleCellProjections.var_counts_fraction
SingleCellProjections.var_counts_fraction!
SingleCellProjections.var_counts_sum
SingleCellProjections.var_counts_sum!
SingleCellProjections.variable_std
SingleCellProjections.variable_var

SingleCellProjections.DataMatrix — Type

struct DataMatrix{T,Tv,To}

A DataMatrix represents a matrix together with annotations for variables and observations.

Fields:

matrix::T - The matrix.
var::Tv - Variable annotations.
obs::To - Observation annotations.
models::Vector{ProjectionModel} - Models used in the creation of this DataMatrix.

The first column of the var and obs tables should contain unique IDs.

source

SingleCellProjections.DataMatrix — Method

DataMatrix(matrix, var, obs; kwargs...)

Create a DataMatrix with the given matrix, var and obs.

The first column of var and of obs is used as IDs.

Kwargs:

duplicate_var - Set to :ignore, :warn or :error to decide what happens if duplicate var IDs are found.
duplicate_obs - Set to :ignore, :warn or :error to decide what happens if duplicate obs IDs are found.

source

SingleCellProjections.DataMatrix — Method

DataMatrix()

Create an empty DataMatrix{Matrix{Float64},DataFrame,DataFrame}.

source

SingleCellProjections.LowRank — Type

LowRank

A matrix decomposition UVᵀ where each row of U represents a variable and each column of Vᵀ represents a sample. Intended for situations where the product is low rank, i.e. size(U,2)==size(Vt,1) is small.
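
As an illustration of why this representation is useful, the following minimal sketch works with the two factors directly and never forms the full product. The names `entry` and `lowrank_mul` are hypothetical helpers written for this example and are not part of the package.

```julia
using LinearAlgebra

U  = [1.0 0.0; 0.0 1.0; 1.0 1.0]          # 3 variables × rank 2
Vt = [1.0 2.0 3.0 4.0; 0.0 1.0 0.0 1.0]   # rank 2 × 4 samples

# Entry (i,j) of U*Vt, computed from one row of U and one column of Vt.
entry(U, Vt, i, j) = dot(view(U, i, :), view(Vt, :, j))

# Matrix-vector product, evaluated right to left so that only
# vectors of length size(U,2) and size(U,1) are created.
lowrank_mul(U, Vt, x) = U * (Vt * x)
```

Both helpers cost time proportional to the rank, rather than to the size of the full matrix.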

source

SingleCellProjections.NormalizationModel — Method

NormalizationModel(data::DataMatrix, design::DesignMatrix;
                   scale=false, min_std=1e-6, annotate=true,
                   rtol=sqrt(eps()), var=:copy, obs=:copy)

Create a NormalizationModel based on data and a design matrix.

scale - Set to true to normalize variables to unit standard deviation. Can also be set to a vector with a scaling factor for each variable.
min_std - If scale==true, the scale vector is set to 1.0 ./ max.(std, min_std). That is, min_std prevents variables with very small standard deviation (where any fluctuations can be assumed to be noise) from being amplified.
annotate - Only used if scale!=false. With annotate=true, the scale vector is added as a var annotation.
rtol - Singular values of the design matrix that are ≤rtol are discarded. Needed for numerical stability.
var - Can be :copy (make a copy of source var) or :keep (share the source var object).
obs - Can be :copy (make a copy of source obs) or :keep (share the source obs object).
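
The scale formula above can be tried out on a plain matrix. This is a minimal sketch using only the Statistics standard library, assuming variables are stored as rows; it is not the package's implementation.

```julia
using Statistics

X = [1.0 2.0 3.0;    # variable 1: standard deviation 1
     5.0 5.0 5.0]    # variable 2: constant, standard deviation 0

min_std = 1e-6
s = vec(std(X; dims=2))             # per-variable standard deviation
scale = 1.0 ./ max.(s, min_std)     # the formula given for scale==true
```

Without the lower bound, the constant variable would produce a division by zero; with it, the scaling factor is capped at 1/min_std.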

source

SingleCellProjections.PseudoBulkModel — Type

PseudoBulkModel <: ProjectionModel

A model used for computing a "pseudo-bulk" representation of a DataMatrix.

See also: designmatrix

source

SingleCellProjections.designmatrix — Method

designmatrix(data::DataMatrix, [covariates...]; center=true, max_categories=100)

Creates a design matrix from data.obs and the given covariates. Covariates can be specified using strings (column name in data.obs), with autodetection of whether the covariate is numerical or categorical, or using the covariate function for more control.

center - If true, an intercept is added to the design matrix. (Should only be set to false in very rare circumstances.)
max_categories - Safety parameter, an error will be thrown if there are too many categories. In this case, it is likely a mistake that the covariate was used as a categorical covariate. Using a very large number of categories is also bad for performance and memory consumption.

Examples

Centering only:

julia> designmatrix(data)

Regression model with intercept (centering) and "fraction_mt" (numerical annotation):

julia> designmatrix(data, "fraction_mt")

As above, but also including "batch" (categorical annotation):

julia> designmatrix(data, "fraction_mt", "batch")

source

SingleCellProjections.filter_matrix — Method

filter_matrix(fvar, fobs, data::DataMatrix)

Return a new DataMatrix, containing only the variables and observations passing the filters.

fvar/fobs can be:

An AbstractVector of indices to keep.
An AbstractVector of booleans (true to keep, false to discard).
: indicating that all variables/observations should be kept.
Anything you can pass on to DataFrames.filter (see DataFrames documentation for details).

Also note that indexing of a DataMatrix supports AbstractVectors of indices/booleans and :, and is otherwise identical to filter_matrix.

Examples

Keep every 10th variable and 3rd observation:

julia> filter_matrix(1:10:size(data,1), 1:3:size(data,2), data)

Or, using indexing syntax:

julia> data[1:10:end, 1:3:end]

For more examples, see filter_var and filter_obs.

source

SingleCellProjections.filter_obs — Method

filter_obs(f, data::DataMatrix)

Return a new DataMatrix, containing only the observations passing the filter.

f can be:

An AbstractVector of indices to keep.
An AbstractVector of booleans (true to keep, false to discard).
: indicating that all observations should be kept.
Anything you can pass on to DataFrames.filter (see DataFrames documentation for details).

Examples

Keep every 10th observation:

julia> filter_obs(1:10:size(data,2), data)

Remove observations where "celltype" equals "other":

julia> filter_obs("celltype"=>!isequal("other"), data)

source

SingleCellProjections.filter_var — Method

filter_var(f, data::DataMatrix; kwargs...)

Return a new DataMatrix, containing only the variables passing the filter.

f can be:

An AbstractVector of indices to keep.
An AbstractVector of booleans (true to keep, false to discard).
: indicating that all variables should be kept.
Anything you can pass on to DataFrames.filter (see DataFrames documentation for details).

Examples

Keep every 10th variable:

julia> filter_var(1:10:size(data,1), data)

Keep only variables of the type "Gene Expression":

julia> filter_var("feature_type"=>isequal("Gene Expression"), data)

source

SingleCellProjections.force_layout — Method

force_layout(data::DataMatrix;
             ndim=3,
             k,
             adj,
             kprojection=10,
             obs=:copy,
             adj_out,
             niter = 100,
             link_distance = 4,
             link_strength = 2,
             charge = 5,
             charge_min_distance = 1,
             theta = 0.9,
             center_strength = 0.05,
             velocity_decay = 0.9,
             initialAlpha = 1.0,
             finalAlpha = 1e-3,
             initialScale = 10,
             seed,
             rng)

Compute the Force Layout (also known as a force directed knn-graph or SPRING plots) for data. Usually, data is a DataMatrix after reduction to 10-100 dimensions by svd.

A Force Layout is computed by running a physics simulation where the observations are connected by springs (such that connected observations are attracted), a general "charge" force repels all observations from each other, and a centering force keeps the observations around the origin. The implementation is based on d3-force: https://github.com/d3/d3-force, also see LICENSE.md.

Exactly one of the kwargs k and adj must be provided. See details below.

General parameters:

k - Number of nearest neighbors to connect each observation to (computes adj below).
adj - A sparse, symmetric adjacency matrix with booleans. true if two observations are connected by a spring and false otherwise.
kprojection - The number of nearest neighbors used when projecting onto the resulting force layout. (Not used in the computation of the layout, only during projection.)
obs - Can be :copy (make a copy of source obs) or :keep (share the source obs object).
adj_out - Optional Ref. If specified, the (computed) adj matrix will be assigned to adj_out.
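
To make the relation between k and adj concrete, here is a minimal brute-force sketch that builds a sparse, symmetric, boolean adjacency matrix from k nearest neighbors. The function `knn_adjacency` is hypothetical, and symmetrizing with a logical OR is an assumption made for this example; the package may construct adj differently.

```julia
using SparseArrays

# X has one column per observation.
function knn_adjacency(X::AbstractMatrix, k::Integer)
    n = size(X, 2)
    I = Int[]
    J = Int[]
    for j in 1:n
        # Squared Euclidean distance from observation j to all observations.
        d = Float64[sum(abs2, view(X, :, i) .- view(X, :, j)) for i in 1:n]
        d[j] = Inf                            # never connect a point to itself
        for i in partialsortperm(d, 1:k)      # indices of the k nearest neighbors
            push!(I, i, j)                    # store both directions so that
            push!(J, j, i)                    # the result is symmetric
        end
    end
    return sparse(I, J, trues(length(I)), n, n, |)   # merge duplicates with OR
end

X = [0.0 1.0 10.0 11.0]    # four observations on a line
adj = knn_adjacency(X, 1)
```

A matrix built this way could then be passed as adj instead of specifying k.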

Parameters controlling the physics simulation:

niter - Number of iterations to run the simulation.
link_distance - The length of each spring.
link_strength - The strength of the spring force.
charge - The strength of the charge force.
charge_min_distance - Used to avoid numerical instabilities by limiting the charge force for observations that are very close.
theta - Parameter controlling accuracy in the Barnes-Hut approximation for charge forces.
center_strength - Strength of the centering force.
velocity_decay - At each iteration, the current velocity for an observation is multiplied by velocity_decay.
initialAlpha - The alpha value decreases over time and allows larger changes to happen early, while being more stable towards the end.
finalAlpha - See initialAlpha
initialScale - The simulation is initialized by randomly drawing each observation from a multivariate Gaussian, and is scaled by initialScale.
seed - Optional random seed used to init rng. NB: This requires the package StableRNGs to be loaded.
rng - Optional RNG object. Useful for reproducibility.

Examples

julia> force_layout(data; ndim=3, k=100)

source

SingleCellProjections.ftest! — Method

ftest!(data::DataMatrix, h1; h0, kwargs...)

Performs an F-Test with the given h1 (alternative hypothesis) and h0 (null hypothesis). Examples of F-Tests are ANOVA and Quadratic Regression, but any linear model can be used.

ftest! adds an F-statistic and a p-value column to data.var.

See ftest_table for usage examples and more details on computations and parameters.

In addition ftest! supports the kwarg:

prefix - Output column names for F-statistics and p-values will be prefixed with this string. If none is given, it will be constructed from h1 and h0.

See also: ftest_table, ftest, ttest!

source

SingleCellProjections.ftest — Method

ftest(data::DataMatrix, h1; h0, var=:copy, obs=:copy, matrix=:keep, kwargs...)

Performs an F-Test with the given h1 (alternative hypothesis) and h0 (null hypothesis). Examples of F-Tests are ANOVA and Quadratic Regression, but any linear model can be used.

ftest creates a copy of data and adds an F-statistic and a p-value column to data.var.

See ftest_table and ftest! for usage examples and more details on computations and parameters.

See also: ftest!, ftest_table, ttest

source

SingleCellProjections.ftest_table — Method

ftest_table(data::DataMatrix, h1; h0, kwargs...)

Performs an F-Test with the given h1 (alternative hypothesis) and h0 (null hypothesis). Examples of F-Tests are ANOVA and Quadratic Regression, but any linear model can be used. (See "Examples" below for concrete examples.)
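
For readers who want to see what comparing h1 against h0 means numerically, the following sketch computes the textbook F-statistic for a single variable from two nested design matrices. The helpers `rss` and `fstatistic` are hypothetical and use dense least squares; this is not the package's implementation, and p-values are left out because they require a distributions library.

```julia
using LinearAlgebra

# Residual sum of squares after a least squares fit of y on the columns of D.
rss(D, y) = sum(abs2, y - D * (D \ y))

# D0 is the null model design, D1 is the null model plus the h1 columns.
function fstatistic(y, D0, D1)
    n = length(y)
    p0, p1 = rank(D0), rank(D1)
    rss0, rss1 = rss(D0, y), rss(D1, y)
    return ((rss0 - rss1) / (p1 - p0)) / (rss1 / (n - p1))
end

y  = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]          # one variable, six observations
D0 = ones(6, 1)                              # intercept only
D1 = hcat(ones(6), [0, 0, 0, 1, 1, 1])       # intercept + two-group indicator
F  = fstatistic(y, D0, D1)
```

Here the null model leaves a residual sum of squares of 58 and the alternative leaves 4, so F = (54/1)/(4/4) = 54.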

F-tests can be performed on any DataMatrix, but it is almost always recommended to do it directly after transforming the data using e.g. sctransform, logtransform or tf_idf_transform.

Normalization

Do not use ftest_table after normalizing the data using normalize_matrix: ftest_table needs to know about the h0 model (regressed out covariates) to perform its computations correctly. Running it on normalized data can give incorrect results. If you want to correct for the same covariates, pass them as h0 to ftest_table.

h1 can be:

A string specifying a column name of data.obs. Auto-detection determines if the column is categorical (ANOVA) or numerical.
A covariate for more control of how to interpret the values in a column.
A tuple or vector of the above for compound models.

ftest_table returns a DataFrame with columns for variable IDs, F-statistics and p-values.

Supported kwargs are:

h0 - Use a non-trivial h0 (null) model. Specified in the same way as h1 above.
center=true - Add an intercept to the h0 (null) model.
statistic_col="F" - Name of the output column containing the F-statistics. (Set to nothing to remove from output.)
pvalue_col="pValue" - Name of the output column containing the p-values. (Set to nothing to remove from output.)
h1_missing=:skip - One of :skip and :error. If skip, missing values in h1 columns are skipped, otherwise an error is thrown.
h0_missing=:error - One of :skip and :error. If skip, missing values in h0 columns are skipped, otherwise an error is thrown.
allow_normalized_matrix=false - Set to true to accept running on a DataMatrix that has been normalized.

Examples

Perform an ANOVA using the "celltype" annotation.

julia> ftest_table(transformed, "celltype")

Perform an ANOVA using the "celltype" annotation, while correcting for fraction_mt (a linear covariate).

julia> ftest_table(transformed, "celltype"; h0="fraction_mt")

Perform an ANOVA using the "celltype" annotation, while correcting for fraction_mt (a linear covariate) and "phase" (a categorical covariate).

julia> ftest_table(transformed, "celltype"; h0=("fraction_mt","phase"))

Perform Quadratic Regression using the covariate x, by first creating an annotation for x squared, and then using a compound model.

julia> transformed.obs.x2 = transformed.obs.x.^2;

julia> ftest_table(transformed, ("x","x2"))

source

SingleCellProjections.implicitsvd — Method

implicitsvd(A; nsv=3, subspacedims=4nsv, niter=2, stabilize_sign=true, seed, rng)

Compute the SVD of A using Random Subspace SVD. [Halko et al. "Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions"]

nsv - Number of singular values/vectors to compute
subspacedims - Number of dimensions used for the subspace approximating the action of A.
niter - Number of iterations. In each iteration, one multiplication of A with a matrix and one multiplication of A' with a matrix will be performed.
stabilize_sign - If true, handles the problem that the SVD is only unique up to the sign of each component (for real matrices), by ensuring that the l1 norm of the positive entries for each column in U is larger than the l1 norm of the negative entries.
seed - Use a random seed to init the rng. NB: This requires the package StableRNGs to be loaded.
rng - Specify a custom RNG.
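
The idea behind the method can be sketched in a few lines of dense linear algebra. The function `randomized_svd` below is a hypothetical illustration of the randomized range finder from Halko et al. with subspace iteration and a sign convention like the one described for stabilize_sign; the package's exact iteration scheme and its handling of implicit matrices may differ.

```julia
using LinearAlgebra, Random

function randomized_svd(A; nsv=3, subspacedims=4nsv, niter=2, rng=Random.default_rng())
    # Orthonormal basis for the range of A applied to a random test matrix.
    Q = Matrix(qr(A * randn(rng, size(A, 2), subspacedims)).Q)
    for _ in 1:niter
        Q = Matrix(qr(A' * Q).Q)     # one multiplication with A'
        Q = Matrix(qr(A * Q).Q)      # one multiplication with A
    end
    F = svd(Q' * A)                  # small SVD inside the subspace
    U = (Q * F.U)[:, 1:nsv]
    S = F.S[1:nsv]
    V = F.V[:, 1:nsv]
    for j in 1:nsv
        # The positive entries outweigh the negative ones exactly when the column sum is >= 0.
        if sum(view(U, :, j)) < 0
            U[:, j] .*= -1
            V[:, j] .*= -1
        end
    end
    return U, S, V
end

rng = MersenneTwister(1)
A = randn(rng, 50, 3) * randn(rng, 3, 40)    # exactly rank 3
U, S, V = randomized_svd(A; nsv=3, rng=rng)
```

For a matrix of exact rank 3, the approximation reproduces the leading singular values of a full SVD up to rounding.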

source

SingleCellProjections.load10x — Method

load10x(filename; lazy=false, var_id=nothing, var_id_delim='_')

Load a CellRanger ".h5" or ".mtx[.gz]" file as a DataMatrix.

lazy - If true, the count matrix itself will not be loaded, only features and barcodes. This is used internally in load_counts to merge samples more efficiently. Use load_counts to later load the count data.
var_id - If a pair var_id_col=>cols, the contents of columns cols will be merged to create new IDs. Useful to ensure that IDs are unique.
var_id_delim - Delimiter used when merging variable columns to create the variable id column.

Examples

Load counts from a CellRanger ".h5" file. (Recommended.)

julia> counts = load10x("filtered_feature_bc_matrix.h5")

Load counts from a CellRanger ".mtx" file. Tries to find barcode and feature annotation files in the same folder.

julia> counts = load10x("matrix.mtx.gz")

Lazy loading followed by loading.

julia> counts = load10x("filtered_feature_bc_matrix.h5"; lazy=true);
julia> counts = load_counts(counts)

See also: load_counts

source

SingleCellProjections.normalize_matrix — Method

normalize_matrix(data::DataMatrix, design::DesignMatrix; scale=false, kwargs...)

Normalize data using the specified design matrix.

source

SingleCellProjections.normalize_matrix — Method

normalize_matrix(data::DataMatrix, [covariates...]; center=true, scale=false, kwargs...)

Normalize data. By default, the matrix is centered. Any covariates specified (using column names of data.obs) will be regressed out.

center - Set to true to center the data matrix.
scale - Set to true to scale the variables in the data matrix to unit standard deviation.

For other kwargs and more detailed descriptions, see NormalizationModel and designmatrix.
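
What "centering" and "regressing out" mean can be illustrated with ordinary least squares on a small dense matrix. This is only a sketch under the assumption that variables are rows and observations are columns; the names `D`, `B` and `residual` are made up for this example, and the package does not necessarily compute the result this way.

```julia
using LinearAlgebra, Statistics

X = [1.0 2.0 4.0 7.0;     # 2 variables × 4 observations
     3.0 1.0 0.0 2.0]
fraction_mt = [0.1, 0.2, 0.3, 0.4]

D = hcat(ones(4), fraction_mt)     # design matrix: intercept + one numerical covariate
B = D \ permutedims(X)             # least squares coefficients, one column per variable
residual = X - permutedims(D * B)  # what remains after removing the fitted part
```

After this step every variable has mean zero and is uncorrelated with fraction_mt.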

Examples

Centering only:

julia> normalize_matrix(data)

Regression model with intercept (centering) and "fraction_mt" (numerical annotation):

julia> normalize_matrix(data, "fraction_mt")

As above, but also including "batch" (categorical annotation):

julia> normalize_matrix(data, "fraction_mt", "batch")

source

SingleCellProjections.obs_coordinates — Function

obs_coordinates(data::DataMatrix)

Returns a matrix with coordinates for the observations. Not available for all types of DataMatrices. Mostly useful for data matrices after dimension reduction such as svd or force_layout has been applied.

In the case of SVD (PCA), obs_coordinates returns the principal components, scaled by the singular values. This is a good starting point for downstream analysis, since it is the optimal linear approximation of the original data for the given number of dimensions.
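
In terms of a plain SVD from the LinearAlgebra standard library, the scaling described above can be sketched as follows. The orientation (dimensions × observations) and the variable names are assumptions made for this example.

```julia
using LinearAlgebra

X = [2.0 0.0 1.0;      # 2 variables × 3 observations
     0.0 1.0 3.0]
F = svd(X)

obs_coords = Diagonal(F.S) * F.Vt   # components scaled by the singular values
var_coords = F.U                    # unit vectors, one column per component
```

With all components kept, multiplying the two coordinate matrices recovers X; keeping only the first few rows of obs_coords gives the best low-dimensional linear approximation.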

source

SingleCellProjections.project — Method

project(data::DataMatrix, models, args...; verbose=true, kwargs...)

Convenience function for projection onto multiple models. Essentially calls foldl and prints some @info messages (if verbose=true). In most cases, it is better to call project(data, base::DataMatrix) instead of using this method directly.

source

SingleCellProjections.project — Method

project(data::DataMatrix, base::DataMatrix, args...; from=nothing, kwargs...)

Project data onto base, by applying ProjectionModels from base one by one.

Since data already might have some models applied, project will try to figure out which models from base to use. See "Examples" below for concrete examples. Here's a more technical overview:

Consider a base data matrix with four models:

base: A -> B -> C -> D

Given some new data (typically counts), we can project that onto base, giving the result proj by applying all four models:

data:
proj: A -> B -> C -> D

If data already has some models applied (e.g. we already projected onto A and B above), project will look for the last model in data (in this case B) in the list of models in base, and only apply models after that (in this case C and D).

data: A -> B
proj: A -> B -> C -> D

It is also possible to use the from kwarg to specify exactly which models to apply. (The models in from must be a prefix of the models in base, or in other words, base was created by applying additional operations to from.)

data: X
base: A -> B -> C -> D
from: A -> B
proj: X -> C -> D

Note that it is necessary to use the from kwarg if the last model in data does not occur in base, because project cannot figure out on its own which models it makes sense to apply.
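
The selection rule described above can be written down as a small function over lists of model names. This is a hypothetical sketch of the logic only (`models_to_apply` is not a package function), with models represented as symbols in order of application.

```julia
function models_to_apply(data_models, base_models; from=nothing)
    if from !== nothing
        # The models in `from` must be a prefix of the models in `base`.
        ok = length(from) <= length(base_models) && base_models[1:length(from)] == from
        ok || error("`from` must be a prefix of the models in base")
        return base_models[length(from)+1:end]
    end
    isempty(data_models) && return base_models        # nothing applied yet: use all
    i = findlast(==(last(data_models)), base_models)   # locate the last model of data
    i === nothing && error("last model of data not found in base; pass `from`")
    return base_models[i+1:end]                        # apply only what comes after it
end

base = [:A, :B, :C, :D]
```

For instance, `models_to_apply([:A, :B], base)` yields `[:C, :D]`, and so does `models_to_apply([:X], base; from=[:A, :B])`.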

Examples

First, we construct a "base" by loading counts, SCTransforming, normalizing, computing the svd and finally computing a force layout:

julia> fp = ["GSE164378_RNA_ADT_3P_P1.h5", "GSE164378_RNA_ADT_3P_P2.h5"];
julia> counts = load_counts(fp; sample_names=["P1","P2"]);
julia> transformed = sctransform(counts);
julia> normalized = normalize_matrix(transformed);
julia> reduced = svd(normalized; nsv=10);
julia> fl = force_layout(reduced; ndim=3, k=100)
DataMatrix (3 variables and 35340 observations)
  Matrix{Float64}
  Variables: id
  Observations: id, sampleName, barcode
  Models: NearestNeighborModel(base="force_layout", k=10), SVD, Normalization, SCTransform

Note how the last line lists all ProjectionModels used in the creation of fl.

Next, let's load some more samples for projection:

julia> fp2 = ["GSE164378_RNA_ADT_3P_P5.h5", "GSE164378_RNA_ADT_3P_P6.h5"];
julia> counts2 = load_counts(fp2; sample_names=["P5","P6"]);

It is easy to project the newly loaded counts2 onto the "base" force layout fl:

julia> project(counts2, fl)
DataMatrix (3 variables and 42553 observations)
  Matrix{Float64}
  Variables: id
  Observations: id, sampleName, barcode
  Models: NearestNeighborModel(base="force_layout", k=10), SVD, Normalization, SCTransform

We can also project in two or more steps, to get access to intermediate results:

julia> reduced2 = project(counts2, reduced)
DataMatrix (20239 variables and 42553 observations)
  SVD (10 dimensions)
  Variables: id, feature_type, name, genome, read, pattern, sequence, logGeneMean, outlier, beta0, ...
  Observations: id, sampleName, barcode
  Models: SVDModel(nsv=10), Normalization, SCTransform

julia> project(reduced2, fl)
DataMatrix (3 variables and 42553 observations)
  Matrix{Float64}
  Variables: id
  Observations: id, sampleName, barcode
  Models: NearestNeighborModel(base="force_layout", k=10), SVD, Normalization, SCTransform

If the DataMatrix we want to project is modified, we need to use the from kwarg to tell project which models to use:

julia> filtered = counts2[:,1:10:end]
DataMatrix (33766 variables and 4256 observations)
  SparseArrays.SparseMatrixCSC{Int64, Int32}
  Variables: id, feature_type, name, genome, read, pattern, sequence
  Observations: id, sampleName, barcode
  Models: FilterModel(:, 1:10:42551)

julia> reduced2b = project(filtered, reduced; from=counts)
DataMatrix (20239 variables and 4256 observations)
  SVD (10 dimensions)
  Variables: id, feature_type, name, genome, read, pattern, sequence, logGeneMean, outlier, beta0, ...
  Observations: id, sampleName, barcode
  Models: SVDModel(nsv=10), Normalization, SCTransform, Filter

After that, it is possible to continue without specifying from:

julia> project(reduced2b, fl)
DataMatrix (3 variables and 4256 observations)
  Matrix{Float64}
  Variables: id
  Observations: id, sampleName, barcode
  Models: NearestNeighborModel(base="force_layout", k=10), SVD, Normalization, SCTransform, Filter

source

SingleCellProjections.project — Method

project(data::DataMatrix, model::ProjectionModel, args...; verbose=true, kwargs...)

Core projection function. Project data based on the single ProjectionModel model. In most cases, it is better to call project(data, base::DataMatrix) instead of using this method directly.

source

SingleCellProjections.pseudobulk — Method

pseudobulk(data::DataMatrix, obs_col, [additional_columns...]; var=:copy)

Create a new DataMatrix by averaging over groups, as specified by the categorical annotation obs_col (and optionally additional columns).

var - Can be :copy (make a copy of source var) or :keep (share the source var object).
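
The averaging itself can be sketched on a plain matrix with one column per observation. The variable names below are made up for this illustration, which does not reflect how the package stores or computes the result.

```julia
using Statistics

X = [1.0 3.0 5.0 7.0;      # 2 variables × 4 observations
     2.0 2.0 4.0 8.0]
sample = ["P1", "P1", "P2", "P2"]    # categorical annotation per observation

groups = unique(sample)
# One column per group, holding the mean over the observations in that group.
bulk = reduce(hcat, [mean(X[:, sample .== g]; dims=2) for g in groups])
```

The result has as many columns as there are groups, here one for "P1" and one for "P2".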

Examples

Create a pseudobulk representation of each sample:

julia> pseudobulk(transformed, "sampleName")

Create a pseudobulk representation for each celltype in each sample:

julia> pseudobulk(transformed, "sampleName", "celltype")

source

SingleCellProjections.set_obs_id_col! — Method

set_obs_id_col!(data::DataMatrix, obs_id_col::String; duplicate_obs=:error)

Set which column to use as observation IDs. It will be moved to the first column of data.obs. The rows of this column in data.obs must be unique.

duplicate_obs - Set to :ignore, :warn or :error to decide what happens if duplicate IDs are found.

source

SingleCellProjections.set_var_id_col! — Method

set_var_id_col!(data::DataMatrix, var_id_col::String; duplicate_var=:error)

Set which column to use as variable IDs. It will be moved to the first column of data.var. The rows of this column in data.var must be unique.

duplicate_var - Set to :ignore, :warn or :error to decide what happens if duplicate IDs are found.

source

SingleCellProjections.splitrange — Method

splitrange(r::UnitRange, nparts::Integer)

Splits a range into nparts parts of equal length.

source

SingleCellProjections.tf_idf_transform — Method

tf_idf_transform([T=Float64], counts::DataMatrix;
                 var_filter = hasproperty(counts.var, "feature_type") ? "feature_type" => isequal("Gene Expression") : nothing,
                 var_filter_cols = hasproperty(counts.var, "feature_type") ? "feature_type" : nothing,
                 scale_factor = 10_000,
                 idf = vec(size(counts,2) ./ max.(1,sum(counts.matrix; dims=2))),
                 annotate = true,
                 var = :copy,
                 obs = :copy)

Compute the TF-IDF (term frequency-inverse document frequency) transform of counts, using the formula log( 1 + scale_factor * tf * idf ) where tf is the term frequency counts.matrix ./ max.(1, sum(counts.matrix; dims=1)).
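
As a worked example of the formula, the following sketch applies it to a small dense count matrix using the default idf expression from the signature above. The package works on a DataMatrix; the plain matrix here is only for illustration.

```julia
counts = [1 0 3;      # 2 variables × 3 observations
          0 2 1]
scale_factor = 10_000

idf = vec(size(counts, 2) ./ max.(1, sum(counts; dims=2)))   # one value per variable
tf  = counts ./ max.(1, sum(counts; dims=1))                 # fraction of counts per observation
transformed = log.(1 .+ scale_factor .* tf .* idf)
```

Zero counts stay zero after the transform, since log(1 + 0) = 0.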

var_filter - Control which variables (features) to use for parameter estimation. Defaults to "feature_type" => isequal("Gene Expression"), if a feature_type column is present in counts.var. Can be set to nothing to disable filtering. See DataFrames.filter for how to specify filters.
var_filter_cols - Additional columns used to ensure features are unique. Defaults to "feature_type" if present in counts.var. Use a Tuple/Vector for specifying multiple columns. Can be set to nothing to not include any additional columns.
annotate - If true, idf will be added as a var annotation.
var - Can be :copy (make a copy of source var) or :keep (share the source var object).
obs - Can be :copy (make a copy of source obs) or :keep (share the source obs object).

source

SingleCellProjections.ttest! — Method

ttest!(data::DataMatrix, h1, [group_a], [group_b]; h0, kwargs...)

Performs a t-Test with the given h1 (alternative hypothesis) and h0 (null hypothesis). Examples of t-Tests are Two-Group tests and Linear Regression.

ttest! adds a t-statistic, a p-value and a difference column to data.var.

See ttest_table for usage examples and more details on computations and parameters.

In addition ttest! supports the kwarg:

prefix - Output column names for t-statistics, p-values and differences will be prefixed with this string. If none is given, it will be constructed from h1, group_a, group_b and h0.

source

SingleCellProjections.ttest — Method

ttest(data::DataMatrix, h1, [group_a], [group_b]; h0, var=:copy, obs=:copy, matrix=:keep, kwargs...)

Performs a t-Test with the given h1 (alternative hypothesis) and h0 (null hypothesis). Examples of t-Tests are Two-Group tests and Linear Regression.

ttest creates a copy of data and adds a t-statistic, a p-value and a difference column to data.var.

See ttest_table and ttest! for usage examples and more details on computations and parameters.

source

SingleCellProjections.ttest_table — Method

ttest_table(data::DataMatrix, h1, [group_a], [group_b]; h0, kwargs...)

Performs a t-Test with the given h1 (alternative hypothesis) and h0 (null hypothesis). Examples of t-Tests are Two-Group tests and Linear Regression.

T-tests can be performed on any DataMatrix, but it is almost always recommended to do it directly after transforming the data using e.g. sctransform, logtransform or tf_idf_transform.

Normalization

Do not use ttest_table after normalizing the data using normalize_matrix: ttest_table needs to know about the h0 model (regressed out covariates) to perform its computations correctly. Running it on normalized data can give incorrect results. If you want to correct for the same covariates, pass them as h0 to ttest_table.

h1 can be:

A string specifying a column name of data.obs. Auto-detection determines if the column is categorical (Two-Group) or numerical (linear regression).
- If group_a and group_b are specified, a Two-Group test between group_a and group_b is performed.
- If group_a is specified, but not group_b, a Two-Group test between group_a and all other observations is performed.
A covariate for more control of how to interpret the values in the column.

ttest_table returns a DataFrame with columns for variable IDs, t-statistics, p-values and differences. For Two-Group tests, difference is the difference in mean between the two groups. For linear regression, the difference corresponds to the rate of change.
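
For a single variable and no extra h0 covariates, the Two-Group case reduces to the classic pooled-variance t-test, sketched below. The helper `two_group_t` is hypothetical, and the sign convention (group_a minus group_b) is an assumption made for this example rather than something stated by the package.

```julia
using Statistics

function two_group_t(a, b)
    na, nb = length(a), length(b)
    difference = mean(a) - mean(b)
    # Pooled variance, as in a linear model with an intercept and a group indicator.
    pooled_var = ((na - 1) * var(a) + (nb - 1) * var(b)) / (na + nb - 2)
    t = difference / sqrt(pooled_var * (1 / na + 1 / nb))
    return (; t, difference)
end

r = two_group_t([7.0, 8.0, 9.0], [1.0, 2.0, 3.0])
```

For these values the difference is 6 and the squared t-statistic equals 54, which is the F-statistic of the corresponding two-group ANOVA.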

Supported kwargs are:

h0 - Use a non-trivial h0 (null) model. Specified in the same way as h1 above.
center=true - Add an intercept to the h0 (null) model.
statistic_col="t" - Name of the output column containing the t-statistics. (Set to nothing to remove from output.)
pvalue_col="pValue" - Name of the output column containing the p-values. (Set to nothing to remove from output.)
difference_col="difference" - Name of the output column containing the differences. (Set to nothing to remove from output.)
h1_missing=:skip - One of :skip and :error. If skip, missing values in h1 columns are skipped, otherwise an error is thrown.
h0_missing=:error - One of :skip and :error. If skip, missing values in h0 columns are skipped, otherwise an error is thrown.
allow_normalized_matrix=false - Set to true to accept running on a DataMatrix that has been normalized.

Examples

Perform a Two-Group t-test between celltypes "Mono" and "DC".

julia> ttest_table(transformed, "celltype", "Mono", "DC")

Perform a Two-Group t-test between celltype "Mono" and all other cells.

julia> ttest_table(transformed, "celltype", "Mono")

Perform a Two-Group t-test between celltypes "Mono" and "DC", while correcting for "fraction_mt" (a linear covariate).

julia> ttest_table(transformed, "celltype", "Mono", "DC"; h0="fraction_mt")

Perform Linear Regression using the covariate "fraction_mt".

julia> ttest_table(transformed, "fraction_mt")

source

SingleCellProjections.update_matrix — Function

update_matrix(data::DataMatrix, matrix, model=nothing;
              var::Union{Symbol,String,DataFrame} = "",
              obs::Union{Symbol,String,DataFrame} = "")

Create a new DataMatrix by replacing parts of data with new values. Mostly useful when implementing new ProjectionModels.

matrix - the new matrix.
model - will be appended to the list of models from data. If set to nothing, the resulting list of models will be empty.

Kwargs:

var - One of:
- :copy - Copy from data.
- :keep - Share var with data.
- ::DataFrame - Replace with a new table with variable annotations.
- prefix::String - Prefix, the new variables will be named prefix1, prefix2, etc.
obs - See var.

source

SingleCellProjections.ustatistic_single — Method

ustatistic_single(X, j, groups, n1, n2)

NB: Assumes all sparse non-zeros are positive.

X - A sparse matrix where each column is a variable.
j - The current variable.
groups - A vector with values: 1 for each sample in group 1, 2 for each sample in group 2 and 0 for samples in neither group.
n1 - Number of elements in group 1 (precomputed from groups).
n2 - Number of elements in group 2 (precomputed from groups).

source

SingleCellProjections.var_coordinates — Function

var_coordinates(data::DataMatrix)

Returns a matrix with coordinates for the variables. Only available for DataMatrices that have a dual representation (e.g. SVD/PCA).

In the case of SVD (PCA), var_coordinates returns the principal components as unit vectors.

source

SingleCellProjections.var_counts_fraction! — Method

var_counts_fraction!(counts::DataMatrix, sub_filter, tot_filter, col; check=true, var=:keep, obs=:keep)

For each observation, compute the fraction of counts that match a specific variable pattern.

sub_filter decides which variables are counted.
tot_filter decides which variables to include in the total.

kwargs:

var - Use this to set var in the ProjectionModel.
obs - Use this to set obs in the ProjectionModel. Note that counts.obs is changed in place, regardless of the value of obs.

If check=true, an error will be thrown if no variables match the patterns.

For more information on filtering syntax, see examples below and the documentation on DataFrames.filter.

Examples

Compute the fraction of reads in MT- genes, considering only "Gene Expression" features (and not e.g. "Antibody Capture").

julia> var_counts_fraction!(counts, "name"=>startswith("MT-"), "feature_type"=>isequal("Gene Expression"), "fraction_mt")

Compute the fraction of reads in MT- genes, when there is no feature_type annotation (i.e. all variables are genes).

julia> var_counts_fraction!(counts, "name"=>startswith("MT-"), Returns(true), "fraction_mt")