Interface
SingleCellProjections.DataMatrix
SingleCellProjections.DataMatrix
SingleCellProjections.DataMatrix
SingleCellProjections.LowRank
SingleCellProjections.NormalizationModel
SingleCellProjections.PseudoBulkModel
SingleCellProjections.SCTransformModel
SingleCellProjections.SVDModel
Base.copy
LinearAlgebra.svd
SCTransform.sctransform
SingleCellProjections.adjacency_distances
SingleCellProjections.covariate
SingleCellProjections.covariate
SingleCellProjections.designmatrix
SingleCellProjections.filter_matrix
SingleCellProjections.filter_obs
SingleCellProjections.filter_var
SingleCellProjections.force_layout
SingleCellProjections.ftest
SingleCellProjections.ftest!
SingleCellProjections.ftest_table
SingleCellProjections.implicitsvd
SingleCellProjections.load10x
SingleCellProjections.load_counts
SingleCellProjections.load_counts
SingleCellProjections.load_counts
SingleCellProjections.loadh5ad
SingleCellProjections.local_outlier_factor
SingleCellProjections.local_outlier_factor!
SingleCellProjections.local_outlier_factor_projection
SingleCellProjections.local_outlier_factor_projection!
SingleCellProjections.local_outlier_factor_projection_table
SingleCellProjections.local_outlier_factor_table
SingleCellProjections.logtransform
SingleCellProjections.mannwhitney
SingleCellProjections.mannwhitney!
SingleCellProjections.mannwhitney_table
SingleCellProjections.merge_counts
SingleCellProjections.normalize_matrix
SingleCellProjections.normalize_matrix
SingleCellProjections.obs_coordinates
SingleCellProjections.project
SingleCellProjections.project
SingleCellProjections.project
SingleCellProjections.pseudobulk
SingleCellProjections.set_obs_id_col!
SingleCellProjections.set_var_id_col!
SingleCellProjections.splitrange
SingleCellProjections.tf_idf_transform
SingleCellProjections.ttest
SingleCellProjections.ttest!
SingleCellProjections.ttest_table
SingleCellProjections.update_matrix
SingleCellProjections.ustatistic_single
SingleCellProjections.var_coordinates
SingleCellProjections.var_counts_fraction
SingleCellProjections.var_counts_fraction!
SingleCellProjections.var_counts_sum
SingleCellProjections.var_counts_sum!
SingleCellProjections.variable_std
SingleCellProjections.variable_var
SingleCellProjections.DataMatrix
— Typestruct DataMatrix{T,Tv,To}
A DataMatrix
represents a matrix together with annotations for variables and observations.
Fields:
matrix::T
- The matrix.var::Tv
- Variable annotations.obs::To
- Observation annotations.models::Vector{ProjectionModel}
- Models used in the creation of thisDataMatrix
.
The first column of the var
and obs
tables should contain unique IDs.
SingleCellProjections.DataMatrix
— MethodDataMatrix(matrix, var, obs; kwargs...)
Create a DataMatrix
with the given matrix
, var
and obs
.
The first column of var
/obs
are used as IDs.
Kwargs:
duplicate_var
- Set to:ignore
,:warn
or:error
to decide what happens if duplicate var IDs are found.duplicate_obs
- Set to:ignore
,:warn
or:error
to decide what happens if duplicate obs IDs are found.
SingleCellProjections.DataMatrix
— MethodDataMatrix()
Create an empty DataMatrix{Matrix{Float64},DataFrame,DataFrame}.
SingleCellProjections.LowRank
— TypeLowRank
A matrix decomposition UVᵀ
where each row of U
represents a variable and each column of Vᵀ
represents a sample. Intended for situations where the product is low rank, i.e. size(U,2)==size(Vt,1)
is small.
SingleCellProjections.NormalizationModel
— MethodNormalizationModel(data::DataMatrix, design::DesignMatrix;
scale=false, min_std=1e-6, annotate=true,
rtol=sqrt(eps()), var=:copy, obs=:copy)
Create a NormalizationModel based on data
and a design
matrix.
scale
- Set to true to normalize variables to unit standard deviation. Can also be set to a vector with a scaling factor for each variable.min_std
- Ifscale==true
, thescale
vector is set to1.0 ./ max.(std, min_std)
. That is,min_std
is used to suppress variables that are very small (and any fluctuations can be assumed to be noise).annotate
- Only used ifscale!=false
. Withannotate=true
, thescale
vector is added as a var annotation.rtol
- Singular values of the design matrix that are≤rtol
are discarded. Needed for numerical stability.var
- Can be:copy
(make a copy of sourcevar
) or:keep
(share the sourcevar
object).obs
- Can be:copy
(make a copy of sourceobs
) or:keep
(share the sourceobs
object).
See also: normalize_matrix
, designmatrix
SingleCellProjections.PseudoBulkModel
— TypePseudoBulkModel <: ProjectionModel
A model used for computing a "pseudo-bulk" representation of a DataMatrix.
See also: pseudobulk
SingleCellProjections.SCTransformModel
— MethodSCTransformModel([T=Float64], counts::DataMatrix;
var_filter = hasproperty(counts.var, :feature_type) ? :feature_type => isequal("Gene Expression") : nothing,
rtol=1e-3, atol=0.0, annotate=true,
post_var_filter=:, post_obs_filter=:,
obs=:copy,
kwargs...)
Computes the SCTransform
parameter estimates for counts
and creates a SCTransformModel that can be applied to the same or another data set. Defaults to only using "Gene Expression" features.
Optionally, T
can be specified to control the eltype
of the sparse transformed matrix. T=Float32
can be used to lower the memory usage, with little impact on the results, since downstream analysis is still done with Float64.
var_filter
- Control which variables (features) to use for parameter estimation. Defaults to"feature_type" => isequal("Gene Expression")
, if afeature_type
column is present incounts.var
. Can be set tonothing
to disable filtering. SeeDataFrames.filter
for how to specify filters.var_filter_cols
- Additional columns used to ensure features are unique. Defaults to "feature_type" if present incounts.var
. Use a Tuple/Vector for specifying multiple columns. Can be set tonothing
to not include any additional columns.rtol
- Relative tolerance when constructing low rank approximation.atol
- Absolute tolerance when constructing low rank approximation.annotate
- Set to true to include SCTransform parameter estimates as feature annotations.post_var_filter
- Equivalent to applying variable (feature) filtering after sctransform, but computationally more efficient.post_obs_filter
- Equivalent to applying observation (cell) filtering after sctransform, but computationally more efficient.obs
- Can be:copy
(make a copy of sourceobs
) or:keep
(share the sourceobs
object).kwargs...
- Additionalkwargs
are passed on toSCTransform.scparams
.
Examples
Setup SCTransformModel
(Gene Expression features):
julia> SCTransformModel(counts)
Setup SCTransformModel
(Antibody Capture features):
julia> SCTransformModel(counts; var_filter = :feature_type => isequal("Antibody Capture"))
See also: sctransform
, SCTransform.scparams
, DataFrames.filter
SingleCellProjections.SVDModel
— TypeSVDModel <: ProjectionModel
A model used for projecting onto an SVD
object. Normally created using svd(::DataMatrix)
.
See also: svd
Base.copy
— Methodcopy(data::DataMatrix; var=:copy, obs=:copy, matrix=:keep)
Copy DataMatrix data
. By default, var
and obs
annotations are copied, but the matrix
is shared. Set kwargs var
, obs
and matrix
to :keep
/:copy
for fine grained control.
LinearAlgebra.svd
— Methodsvd(data::DataMatrix; nsv=3, var=:copy, obs=:copy, kwargs...)
Compute the Singular Value Decomposition (SVD) of data
using the Random Subspace SVD algorithm from [Halko et al. "Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions"]. SVD is often used to perform Principal Component Analysis (PCA), which assumes that the data is centered.
nsv
- The number of singular values.var
- Can be:copy
(make a copy of sourcevar
) or:keep
(share the sourcevar
object).obs
- Can be:copy
(make a copy of sourceobs
) or:keep
(share the sourceobs
object).
Additional kwargs related to numerical precision are passed to SingleCellProjections.implicitsvd
.
See also: SingleCellProjections.implicitsvd
SCTransform.sctransform
— Methodsctransform([T=Float64], counts::DataMatrix; verbose=true, kwargs...)
Compute the SCTransform of the DataMatrix counts
. The result is stored as a Matrix Expression with the sum of a sparse and a low-rank term. I.e. no large dense matrix is created.
Optionally, T
can be specified to control the eltype
of the sparse transformed matrix. T=Float32
can be used to lower the memory usage, with little impact on the results, since downstream analysis is still done with Float64.
See SCTransformModel
for description of kwargs...
.
Examples
Compute SCTransform (Gene Expression features):
julia> sctransform(counts)
Compute SCTransform (Antibody Capture features):
julia> sctransform(counts; var_filter = :feature_type => isequal("Antibody Capture"))
Compute SCTransform (Gene Expression features), using eltype Float32 to lower memory usage:
julia> sctransform(Float32, counts)
See also: SCTransformModel
, SCTransform.scparams
SingleCellProjections.adjacency_distances
— Functionadjacency_distances(adj, X, Y=X)
For each structural non-zero in adj
, compute the Euclidean distance between the point in the DataMatrix Y
and the point in the DataMatrix X
.
Can be useful when adj
is created using e.g. a lower dimensional representation and we want to know the distances in the original, high dimensional space.
At the moment all points in Y
are required to have the same number of neighbors in X
, for computation reasons.
SingleCellProjections.covariate
— Functioncovariate(src, group_a, [group_b])
Create a two-group covariate
referring to src
, comparing group_a
to group_b
.
src
is one of:
String
- referring to a column in the DataMatrixobs
.DataFrame
- with exactly two columns, the first should contain IDs matching IDs inobs
, and the second should be the covariate.Annotations
(experimental) - with ID matching the DataMatrixobs
and a second column for the covariate.
If src
is a String
it will refer to a column in the DataMatrix obs
. src
can also be an Annotations
object, with ID matching the DataMatrix obs
. group_a
and group_b
must be values occuring in the column src
.
If group_b
is not given, group_a
will be compared to all other observations.
See also: designmatrix
SingleCellProjections.covariate
— Functioncovariate(src, type=:auto)
Create a covariate
referring to src
.
src
is one of:
String
- referring to a column in the DataMatrixobs
.DataFrame
- with exactly two columns, the first should contain IDs matching IDs inobs
, and the second should be the covariate.Annotations
(experimental) - with ID matching the DataMatrixobs
and a second column for the covariate.
type
must be one of :auto
, :numerical
, :categorical
, :twogroup
and :intercept
. :auto
means auto-detection by checking if the values in the column are numerical or categorical. type==:intercept
adds an intercept to the model (in which case the src
parameter is ignored).
See also: designmatrix
SingleCellProjections.designmatrix
— Methoddesignmatrix(data::DataMatrix, [covariates...]; center=true, max_categories=100)
Creates a design matrix from data.obs
and the given covariates
. Covariates can be specied using strings (column name in data.obs), with autodetection of whether the covariate is numerical or categorical, or using the covariate
function for more control.
center
- Iftrue
, an intercept is added to the design matrix. (Should only be set tofalse
in very rare circumstances.)max_categories
- Safety parameter, an error will be thrown if there are too many categories. In this case, it is likely a mistake that the covariate was used as a categorical covariate. Using a very large number of categories is also bad for performance and memory consumption.
Examples
Centering only:
julia> designmatrix(data)
Regression model with intercept (centering) and "fraction_mt" (numerical annotation):
julia> designmatrix(data, "fraction_mt")
As above, but also including "batch" (categorical annotation):
julia> designmatrix(data, "fraction_mt", "batch")
See also: normalize_matrix
, NormalizationModel
, covariate
SingleCellProjections.filter_matrix
— Methodfilter_matrix(fvar, fobs, data::DataMatrix)
Return a new DataMatrix, containing only the variables and observations passing the filters.
fvar
/fobs
can be:
- An
AbstractVector
of indices to keep. - A
AbstractVector
of booleans (true to keep, false to discard). :
indicating that all variables/observations should be kept.- Anything you can pass on to
DataFrames.filter
(see DataFrames documentation for details).
Also note that indexing of a DataMatrix supports AbstractVector
s of indices/booleans and :
, and is otherwise identical to filter_matrix
.
Examples
Keep every 10th variable and 3rd observation:
julia> filter_matrix(1:10:size(data,1), 1:3:size(data,2), data)
Or, using indexing syntax:
julia> data[1:10:end, 1:3:end]
For more examples, see filter_var
and filter_obs
.
See also: filter_var
, filter_obs
, DataFrames.filter
SingleCellProjections.filter_obs
— Methodfilter_obs(f, data::DataMatrix)
Return a new DataMatrix, containing only the observations passing the filter.
f
can be:
- An
AbstractVector
of indices to keep. - A
AbstractVector
of booleans (true to keep, false to discard). :
indicating that all observations should be kept.- Anything you can pass on to
DataFrames.filter
(see DataFrames documentation for details).
Examples
Keep every 10th observation:
julia> filter_obs(1:10:size(data,2), data)
Remove observations where "celltype" equals "other":
julia> filter_obs("celltype"=>!isequal("other"), data)
See also: filter_matrix
, filter_var
, DataFrames.filter
SingleCellProjections.filter_var
— Methodfilter_var(f, data::DataMatrix; kwargs...)
Return a new DataMatrix, containing only the variables passing the filter.
f
can be:
- An
AbstractVector
of indices to keep. - A
AbstractVector
of booleans (true to keep, false to discard). :
indicating that all variables should be kept.- Anything you can pass on to
DataFrames.filter
(see DataFrames documentation for details).
Examples
Keep every 10th variable:
julia> filter_var(1:10:size(data,1), data)
Keep only variables of the type "Gene Expression":
julia> filter_var("feature_type"=>isequal("Gene Expression"), data)
See also: filter_matrix
, filter_obs
, DataFrames.filter
SingleCellProjections.force_layout
— Methodforce_layout(data::DataMatrix;
ndim=3,
k,
adj,
kprojection=10,
obs=:copy,
adj_out,
niter = 100,
link_distance = 4,
link_strength = 2,
charge = 5,
charge_min_distance = 1,
theta = 0.9,
center_strength = 0.05,
velocity_decay = 0.9,
initialAlpha = 1.0,
finalAlpha = 1e-3,
initialScale = 10,
seed,
rng)
Compute the Force Layout (also known as a force directed knn-graph or SPRING plots) for data
. Usually, data
is a DataMatrix after reduction to 10-100
dimensions by svd
.
A Force Layout is computed by running a physics simulation were the observations are connected by springs (such that connected observations are attracted), a general "charge" force repelling all observations from each other and a centering force that keep the observations around the origin. The implementation is based on d3-force: https://github.com/d3/d3-force, also see LICENSE.md.
Exactly one of the kwargs k
and adj
must be provided. See details below.
General parameters:
k
- Number of nearest neighbors to connect each observation to (computesadj
below).adj
- An sparse, symmetric, adjacency matrix with booleans.true
if two observations are connected by a spring andfalse
otherwise.kprojection
- The number of nearest neighbors used when projecting onto the resulting force layout. (Not used in the computation of the layout, only during projection.)obs
- Can be:copy
(make a copy of sourceobs
) or:keep
(share the sourceobs
object).adj_out
- OptionalRef
. If specified, the (computed)adj
matrix will be assigned toadj_out
.
Paramters controlling the physics simulation:
niter
- Number of iterations to run the simulation.link_distance
- The length of each spring.link_strength
- The strength of the spring force.charge
- The strength of the charge force.charge_min_distance
- Used to avoid numerical instabilities by limiting the charge force for observations that are very close.theta
- Parameter controlling accuracy in the Barnes-Hut approximation for charge forces.center_strength
- Strength of the centering force.velocity_decay
- At each iteration, the current velocity for an observations is multiplied byvelocity_decay
.initialAlpha
- The alpha value decreases over time and allows larger changes to happen early, while being more stable towards the end.finalAlpha
- SeeinitialAlpha
initialScale
- The simulation is initialized by randomly drawing each observation from a multivariate Gaussian, and is scaled byinitialScale
.seed
- Optional random seed used to initrng
. NB: This requires the packageStableRNGs
to be loaded.rng
- Optional RNG object. Useful for reproducibility.
Examples
julia> force_layout(data; ndim=3, k=100)
SingleCellProjections.ftest!
— Methodftest!(data::DataMatrix, h1; h0, kwargs...)
Performs an F-Test with the given h1
(alternative hypothesis) and h0
(null hypothesis). Examples of F-Tests are ANOVA and Quadratic Regression, but any linear model can be used.
ftest!
adds a F-statistic and a p-value column to data.var
.
See ftest_table
for usage examples and more details on computations and parameters.
In addition ftest!
supports the kwarg
:
prefix
- Output column names for F-statistics and p-values will be prefixed with this string. If none is given, it will be constructed fromh1
andh0
.
See also: ftest_table
, ftest
, ttest!
SingleCellProjections.ftest
— Methodftest(data::DataMatrix, h1; h0, var=:copy, obs=:copy, matrix=:keep, kwargs...)
Performs an F-Test with the given h1
(alternative hypothesis) and h0
(null hypothesis). Examples of F-Tests are ANOVA and Quadratic Regression, but any linear model can be used.
ftest
creates a copy of data
and adds a F-statistic and a p-value column to data.var
.
See ftest_table
and ftest!
for usage examples and more details on computations and parameters.
See also: ftest!
, ftest_table
, ttest
SingleCellProjections.ftest_table
— Methodftest_table(data::DataMatrix, h1; h0, kwargs...)
Performs an F-Test with the given h1
(alternative hypothesis) and h0
(null hypothesis). Examples of F-Tests are ANOVA and Quadratic Regression, but any linear model can be used. (See "Examples" below for concrete examples.)
F-tests can be performed on any DataMatrix
, but it is almost always recommended to do it directly after transforming the data using e.g. sctransform
, logtransform
or tf_idf_transform
.
Do not use ftest_table
after normalizing the data using normalize_matrix
: ftest_table
needs to know about the h0
model (regressed out covariates) for correction computations. Failing to do so can result in incorrect results. If you want to correct for the same covariates, pass them as h0
to ftest_table
.
h1
can be:
- A string specifying a column name of
data.obs
. Auto-detection determines if the column is categorical (ANOVA) or numerical. - A
covariate
for more control of how to interpret the values in a column. - A tuple or vector of the above for compound models.
ftest_table
returns a Dataframe with columns for variable IDs, F-statistics and p-values.
Supported kwargs
are:
h0
- Use a non-trivialh0
(null) model. Specified in the same way ash1
above.center=true
- Add an intercept to theh0
(null) model.statistic_col="F"
- Name of the output column containing the F-statistics. (Set to nothing to remove from output.)pvalue_col="pValue"
- Name of the output column containing the p-values. (Set to nothing to remove from output.)h1_missing=:skip
- One of:skip
and:error
. Ifskip
, missing values inh1
columns are skipped, otherwise an error is thrown.h0_missing=:error
- One of:skip
and:error
. Ifskip
, missing values inh0
columns are skipped, otherwise an error is thrown.allow_normalized_matrix=false
- Set to true to accept running on aDataMatrix
that has been normalized.
Examples
Perform an ANOVA using the "celltype" annotation.
julia> ftest_table(transformed, "celltype")
Perform an ANOVA using the "celltype" annotation, while correcting for fraction_mt
(a linear covariate).
julia> ftest_table(transformed, "celltype"; h0="fraction_mt")
Perform an ANOVA using the "celltype" annotation, while correcting for fraction_mt
(a linear covariate) and "phase" (a categorical covariate).
julia> ftest_table(transformed, "celltype"; h0=("fraction_mt","phase"))
Perform Quadractic Regression using the covariate x
, by first creating an annotation for x
squared, and then using a compound model.
julia> data.obs.x2 = data.obs.x.^2;
julia> ftest_table(transformed, ("x","x2"))
See also: ftest!
, ftest
, ttest_table
, covariate
SingleCellProjections.implicitsvd
— Methodimplicitsvd(A; nsv=3, subspacedims=4nsv, niter=2, stabilize_sign=true, seed, rng)
Compute the SVD of A
using Random Subspace SVD. [Halko et al. "Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions"]
nsv
- Number of singular values/vectors to computesubspacedims
- Number of dimensions used for the subspace approximating the action ofA
.niter
- Number of iterations. In each iteration, one multiplication ofA
with a matrix and one multiplication ofA'
with a matrix will be performed.stabilize_sign
- If true, handles the problem that the SVD is only unique up to the sign of each component (for real matrices), by ensuring that the l1 norm of the positive entires for each column in U is larger than the l1 norm of the negative entries.seed
- Use a random seed to init therng
. NB: This requires the packageStableRNGs
to be loaded.rng
- Specify a custom RNG.
SingleCellProjections.load10x
— Methodload10x(filename; lazy=false, var_id=nothing, var_id_delim='_')
Load a CellRanger ".h5" or ".mtx[.gz]" file as a DataMatrix.
lazy
- Iftrue
, the count matrix itself will not be loaded, only features and barcodes. This is used internally inload_counts
to merge samples more efficiently. Useload_counts
to later load the count data.var_id
- If a pairvar_id_col=>cols
, the contents of columnscols
will be merged to create new IDs. Useful to ensure that IDs are unique.var_id_delim
- Delimiter used to when merging variable columns to create the variable id column.
Examples
Load counts from a CellRanger ".h5" file. (Recommended.)
julia> counts = load10x("filtered_feature_bc_matrix.h5")
Load counts from a CellRanger ".mtx" file. Tries to find barcode and feature annotation files in the same folder.
julia> counts = load10x("matrix.mtx.gz")
Lazy loading followed by loading.
julia> counts = load10x("filtered_feature_bc_matrix.h5");
julia> counts = load_counts(counts)
See also: load_counts
SingleCellProjections.load_counts
— Methodload_counts([loadfun=load10x], filenames;
sample_names,
sample_name_col,
obs_id_col = "cell_id",
lazy,
lazy_merge = false,
obs_id_delim = '_',
obs_id_prefixes = sample_names,
extra_var_id_cols::Union{Nothing,String,Vector{String}},
duplicate_var,
duplicate_obs,
callback=nothing)
Load and merge multiple samples efficiently.
Defaults to loading 10x CellRanger files. The files are first loaded lazily, then the merged count matrix is allocated and finally each sample is loaded directly into the merged count matrix. (This strategy greatly reduces memory usage, since only one copy of data is needed instead of two.)
filenames
specifies which files to load. (It can be a vector of filenames or a single filename string.) For each file, loadfun
is called.
sample_names
- Specify the sample names. Should be a vector of the same length asfilenames
. Set tonothing
to not create a sample name annotation.sample_name_col
- Column for sample names inobs
, defaults to "sampleName".obs_id_col
- Colum for mergedid
s inobs
.lazy
- Enable lazy loading. Defaults to true ifload10x
is used, andfalse
otherwise.lazy_merge
- Enable lazy merging, i.e.var
andobs
are created, but the count matrix merging is postponed until a second call toload_counts
.obs_id_delim
- Delimiter used when creating mergedobs
IDs.obs_id_prefixes
- Prefix (one per sample) used to create new IDs. Set to nothing to keep old IDs. Defaults tosample_names
.extra_var_id_cols
- Additional columns to use to ensure variable IDs are unique during merging. Defaults to "feature_type" if that column is present for all samples. Can be aVector{String}
to include multiple columns. Set to nothing to disable.duplicate_var
- Set to:ignore
,:warn
or:error
to decide what happens if duplicate var IDs are found.duplicate_obs
- Set to:ignore
,:warn
or:error
to decide what happens if duplicate obs IDs are found.callback
- Experimental callback functionality. The callback function is called between samples during merging. Returntrue
to abort loading andfalse
to continue.- Additional kwargs (including
duplicate_var
/duplicate_obs
if specified) are passed toloadfun
.
Examples
Load and name samples:
julia> counts = load_counts(["s1.h5", "s2.h5"]; sample_names=["Sample A", "Sample B"])
See also: load10x
, merge_counts
SingleCellProjections.load_counts
— Methodload_counts(data::DataMatrix{<:Lazy10xMatrix})
Load counts for a lazily loaded 10x DataMatrix.
See also: load10x
SingleCellProjections.load_counts
— Methodload_counts(data::DataMatrix{<:LazyMergedMatrix})
Merge/load counts for a lazily merged DataMatrix.
SingleCellProjections.loadh5ad
— Methodloadh5ad(filename; var_id_column=:id, obs_id_column=:id)
Experimental loading of .h5ad files.
This function is deprecated. Load Muon.jl
and see help for create_datamatrix
.
SingleCellProjections.local_outlier_factor!
— Methodlocal_outlier_factor!(data::DataMatrix, full::DataMatrix; k=10, col="LOF")
Compute the Local Outlier Factor (LOF) for each observation in data
, and add as column to data.obs
with the name col
.
When working with projected DataMatrices
, use local_outlier_factor_projection!
instead.
NB: This function might be very slow for high values of k
.
First, the k
nearest neighbors are found for each observation in data
. Then, the Local Outlier Factor is computed by considering the distance between the neighbors, but this time in the full
DataMatrix. Thus full
must have the same observations as are present in data
.
A LOF value smaller than or close to one is means that the observation is an inlier, but a LOF value much larger than one means that the observation is an inlier.
By specifiying full=data
, this is coincides with the standard definition for Local Outlier Factor. However, it is perhaps more useful to find neighbors in a dimension reduced space (after e.g. svd
(PCA) or umap
), but then compute distances in the high dimensional space (typically after normalization). This way, an observation is concidered an outlier if the reduction to a lower dimensional space didn't properly represent the neighborhood of the observation.
The interface is not yet fully decided and subject to change.
Examples
Compute the Local Outlier Factor, with nearest neighbors based only on reduced
, but later using distances in full
for the actual LOF computation.
julia> reduced = svd(normalized; nsv=10)
julia> local_outlier_factor!(reduced, normalized; k=10)
See also: local_outlier_factor
, local_outlier_factor_table
, local_outlier_factor_projection!
SingleCellProjections.local_outlier_factor
— Methodlocal_outlier_factor(data::DataMatrix, full::DataMatrix; k=10, col="LOF", matrix=:keep, var=:copy)
See local_outlier_factor!
for documentation. This version does not modify data
in place.
See also: local_outlier_factor!
, local_outlier_factor_table
, local_outlier_factor_projection
SingleCellProjections.local_outlier_factor_projection!
— Methodlocal_outlier_factor_projection!(data::DataMatrix, full::DataMatrix, base::DataMatrix, base_full::DataMatrix; k=10, col="LOF_projection")
Compute the Local Outlier Factor (LOF) for each observation in data
, and add as column to data.obs
with the name col
.
Use local_outlier_factor_projection!
if you are working with projected data, and local_outlier_factor!
otherwise.
Parameters:
data
- ADataMatrix
for which we compute LOF for each observation. Expected to be aDataMatrix
projected ontobase
, so that thedata
andbase
use the same coordinate system.full
- ADataMatrix
with the same observations asdata
, used to compute distances in the LOF computation. Expected to be aDataMatrix
projected ontobase_full
, so that thefull
andbase_full
use the same coordinate system.base
- The baseDataMatrix
.base_full
- The baseDataMatrix
.k
- The number of nearest neighbors to use. NB: This function might be very slow for high values ofk
.
First, for each observation in data
, the k
nearest neighbors in base
are found. Then, the distance to each neighbor is computed using full
and base_full
. Thus full
must have the same observations as are present in data
, and base_full
must have the same observations as base
.
A LOF value smaller than or close to one is means that the observation is an inlier, but a LOF value much larger than one means that the observation is an inlier.
By specifiying full=data
and base_full=base
, this is coincides with the standard definition for Local Outlier Factor. However, it is perhaps more useful to find neighbors in a dimension reduced space (after e.g. svd
(PCA) or umap
), but then compute distances in the high dimensional space (typically after normalization). This way, an observation is concidered an outlier if the reduction to a lower dimensional space didn't properly represent the neighborhood of the observation.
The interface is not yet fully decided and subject to change.
Examples
Compute the Local Outlier Factor (LOF) for each observation in a data set reduced
, which has been projected onto base_reduced
.
The nearest neighbors are computed between observations in reduced
and base_reduced
, but the distances in the actual LOF computation are between the same observations in normalized
and base_normalized
.
julia> base_reduced = svd(base_normalized; nsv=10)
julia> normalized = project(counts, base_normalized);
julia> reduced = project(normalized, base_reduced);
julia> local_outlier_factor!(reduced, normalized, base_reduced, base_normalized; k=10)
See also: local_outlier_factor_projection
, local_outlier_factor_projection_table
, local_outlier_factor!
SingleCellProjections.local_outlier_factor_projection
— Methodlocal_outlier_factor_projection(data::DataMatrix, full::DataMatrix, base::DataMatrix, base_full::DataMatrix; k=10, col="LOF_projection", matrix=:keep, var=:copy)
See local_outlier_factor_projection!
for documentation. This version does not modify data
in place.
See also: local_outlier_factor_projection!
, local_outlier_factor_projection_table
, local_outlier_factor
SingleCellProjections.local_outlier_factor_projection_table
— Methodlocal_outlier_factor_projection_table(data::DataMatrix, full::DataMatrix, base::DataMatrix, base_full::DataMatrix; k=10, col="LOF_projection", matrix=:keep, var=:copy)
See local_outlier_factor_projection!
for documentation. This returns a DataFrame with observation IDs and a column col
with LOF values for the projection.
See also: local_outlier_factor_projection!
, local_outlier_factor_projection
, local_outlier_factor_table
SingleCellProjections.local_outlier_factor_table
— Methodlocal_outlier_factor_table(data::DataMatrix, full::DataMatrix; k=10, col="LOF")
See local_outlier_factor!
for documentation. This returns a DataFrame with observation IDs and a column col
with LOF values.
See also: local_outlier_factor!
, local_outlier_factor
, local_outlier_factor_projection_table
SingleCellProjections.logtransform
— Methodlogtransform([T=Float64], counts::DataMatrix;
var_filter = hasproperty(counts.var, "feature_type") ? "feature_type" => isequal("Gene Expression") : nothing,
var_filter_cols = hasproperty(counts.var, "feature_type") ? "feature_type" : nothing,
scale_factor=10_000,
var=:copy,
obs=:copy)
Log₂-transform counts
using the formula:
log₂(1 + cᵢⱼ*scale_factor/(∑ᵢcᵢⱼ))
Optionally, T
can be specified to control the eltype
of the sparse transformed matrix. T=Float32
can be used to lower the memory usage, with little impact on the results, since downstream analysis is still done with Float64.
var_filter
- Control which variables (features) to use for parameter estimation. Defaults to"feature_type" => isequal("Gene Expression")
, if afeature_type
column is present incounts.var
. Can be set tonothing
to disable filtering. SeeDataFrames.filter
for how to specify filters.var_filter_cols
- Additional columns used to ensure features are unique. Defaults to "feature_type" if present incounts.var
. Use a Tuple/Vector for specifying multiple columns. Can be set tonothing
to not include any additional columns.var
- Can be:copy
(make a copy of sourcevar
) or:keep
(share the sourcevar
object).obs
- Can be:copy
(make a copy of sourceobs
) or:keep
(share the sourceobs
object).
Examples
julia> transformed = logtransform(counts)
Use eltype Float32 to lower memory usage:
julia> transformed = logtransform(Float32, counts)
See also: sctransform
SingleCellProjections.mannwhitney!
— Methodmannwhitney!(data::DataMatrix, column, [group_a, group_b]; kwargs...)
Perform a Mann-Whitney U-test (also known as a Wilcoxon rank-sum test) between two groups of observations. The U statistic is corrected for ties, and p-values are computed using a normal approximation.
Note that data
must be a DataMatrix
containing a sparse matrix only. It is recommended to first logtransform
(or tf_idf_transform
) the raw counts before performing the Mann-Whitney U-test.
mannwhitney!
adds a U statistic and a p-value column to data.var
. See mannwhitney_table
for more details on groups and kwargs.
In addition mannwhitney!
supports the kwarg
:
prefix
- Output column names for U statistics and p-values will be prefixed with this string. If none is given, it will be constructed fromcolumn
,group_a
andgroup_b
.
See also: mannwhitney_table
, mannwhitney
SingleCellProjections.mannwhitney
— Methodmannwhitney(data::DataMatrix, column, [group_a, group_b]; var=:copy, obs=:copy, matrix=:keep, kwargs...)
Perform a Mann-Whitney U-test (also known as a Wilcoxon rank-sum test) between two groups of observations. The U statistic is corrected for ties, and p-values are computed using a normal approximation.
Note that data
must be a DataMatrix
containing a sparse matrix only. It is recommended to first logtransform
(or tf_idf_transform
) the raw counts before performing the Mann-Whitney U-test.
mannwhitney
creates a copy of data
and adds a U statistic and a p-value column to data.var
. See mannwhitney!
and mannwhitney_table
for more details on groups and kwargs
.
See also: mannwhitney!
, mannwhitney_table
SingleCellProjections.mannwhitney_table
— Methodmannwhitney_table(data::DataMatrix, column, [group_a, group_b]; kwargs...)
Perform a Mann-Whitney U-test (also known as a Wilcoxon rank-sum test) between two groups of observations. The U statistic is corrected for ties, and p-values are computed using a normal approximation.
Note that data
must be a DataMatrix
containing a sparse matrix only. It is recommended to first logtransform
(or tf_idf_transform
) the raw counts before performing the Mann-Whitney U-test.
column
specifies a column in data.obs
and is used to determine which observations belong in which group.
If group_a
and group_b
are not given, the column
must contain exactly two unique values (except missing
). If group_a
is given, but not group_b
, the observations in group A are compared to all other observations (except missing
). If both group_a
and group_b
are given, the observations in group A are compared the observations in group B.
mannwhitney_table
returns a Dataframe with columns for variable IDs, U statistics and p-values.
Supported kwargs
are:
statistic_col="U"
- Name of the output column containing the U statistics. (Set to nothing to remove from output.)pvalue_col="pValue"
- Name of the output column containing the p-values. (Set to nothing to remove from output.)h1_missing=:skip
- One of:skip
and:error
. Ifskip
, missing values incolumn
are skipped, otherwise an error is thrown.
The following kwargs
determine how the computations are threaded:
nworkers
- Number of worker threads used in the computation. Set to 1 to disable threading.chunk_size
- Number of variables processed in each chunk.channel_size
- Max number of unprocessed chunks in queue.
See also: mannwhitney!
, mannwhitney
SingleCellProjections.merge_counts
— Methodmerge_counts(samples, sample_names;
lazy=false,
sample_name_col = sample_names===nothing ? nothing : "sampleName",
obs_id_col = "cell_id",
obs_id_delim = '_',
obs_id_prefixes = sample_names,
extra_var_id_cols::Union{Nothing,String,Vector{String}},
duplicate_var,
duplicate_obs,
callback=nothing)
Merge samples
to create one large DataMatrix, by concatenating the obs
. The union of the variables from the samples is used, and if a variable is not present in a sample, the count will be set to zero.
The obs
IDs are created by concatenating the current obs
ID columns, together with the sample_names
(if provided).
lazy
- Lazy merging. Useload_counts
to actually perform the merging.sample_name_col
- Column in which thesample_names
are stored.obs_id_col
- Name ofobs
ID column after merging. (Set to nothing to keep old column name.)obs_id_delim
- Delimiter used when mergingobs
IDs.obs_id_prefixes
- Prefix (one per sample) used to create new IDs. Set to nothing to keep old IDs. Defaults tosample_names
.extra_var_id_cols
- Additional columns to use to ensure variable IDs are unique during merging. Defaults to "feature_type" if that column is present for all samples. Can be aVector{String}
to include multiple columns. Set to nothing to disable.duplicate_var
- Set to:ignore
,:warn
or:error
to decide what happens if duplicate var IDs are found.duplicate_obs
- Set to:ignore
,:warn
or:error
to decide what happens if duplicate obs IDs are found.callback
- Experimental callback functionality. The callback function is called between samples during merging. Returntrue
to abort loading andfalse
to continue.
See also: load_counts
SingleCellProjections.normalize_matrix
— Methodnormalize_matrix(data::DataMatrix, design::DesignMatrix; scale=false, kwargs...)
Normalize data
using the specified design
matrix.
See also: NormalizationModel
, designmatrix
SingleCellProjections.normalize_matrix
— Methodnormalize_matrix(data::DataMatrix, [covariates...]; center=true, scale=false, kwargs...)
Normalize data
. By default, the matrix is centered. Any covariates
specified (using column names of data.obs
) will be regressed out.
center
- Set to true to center the data matrix.scale
- Set to true to scale the variables in the data matrix to unit standard deviation.
For other kwargs
and more detailed descriptions, see NormalizationModel
and designmatrix
.
Examples
Centering only:
julia> normalize_matrix(data)
Regression model with intercept (centering) and "fraction_mt" (numerical annotation):
julia> normalize_matrix(data, "fraction_mt")
As above, but also including "batch" (categorical annotation):
julia> normalize_matrix(data, "fraction_mt", "batch")
See also: NormalizationModel
, designmatrix
SingleCellProjections.obs_coordinates
— Functionobs_coordinates(data::DataMatrix)
Returns a matrix with coordinates for the observations. Not available for all types of DataMatrices. Mostly useful for data matrices after dimension reduction such as svd
or force_layout
has been applied.
In the case of SVD
(PCA), obs_coordinates
returns the principal components, scaled by the singular values. This is a a good starting point for downstream analysis, since it is the optimal linear approximation of the original data for the given number of dimensions.
SingleCellProjections.project
— Methodproject(data::DataMatrix, models, args...; verbose=true, kwargs...)
Convenience function for projection onto multiple models
. Essentially calls foldl
and prints some @info
messages (if verbose=true
). In most cases, it is better to call project(data, base::DataMatrix)
instead of using this method directly.
SingleCellProjections.project
— Methodproject(data::DataMatrix, base::DataMatrix, args...; from=nothing, kwargs...)
Project data
onto base
, by applying ProjectionModels from base
one by one.
Since data
already might have some models applied, project
will try to figure out which models from base
to use. See "Examples" below for concrete examples. Here's a more technical overview:
Consider a base
data matrix with four models:
base: A -> B -> C -> D
Given some new data
(typically counts), we can project that onto base
, given the result proj
by applying all four models:
data:
proj: A -> B -> C -> D
If data
already has some models applied (e.g. we already projected onto A and B above), project
will look for the last model in data
(in this case B) in the list of models in base
, and only apply models after that (in this case C and D).
data: A -> B
proj: A -> B -> C -> D
It is also possible to use the from
kwarg to specify exactly which models to apply. (The models in from
must be a prefix of the models in base
, or in other words, base
was created by applying additional operations to from
.)
data: X
base: A -> B -> C -> D
from: A -> B
proj: X -> C -> D
Note that it is necessary to use the from
kwarg if the last model in data
does not occurr in base
, because project
cannot figure out on its own which models it makes sense to apply.
Examples
First, we construct a "base" by loading counts, SCTransforming, normalizing, computing the svd and finally computing a force layout:
julia> fp = ["GSE164378_RNA_ADT_3P_P1.h5", "GSE164378_RNA_ADT_3P_P2.h5"];
julia> counts = load_counts(fp; sample_names=["P1","P2"]);
julia> transformed = sctransform(counts);
julia> normalized = normalize_matrix(transformed);
julia> reduced = svd(normalized; nsv=10);
julia> fl = force_layout(reduced; ndim=3, k=100)
DataMatrix (3 variables and 35340 observations)
Matrix{Float64}
Variables: id
Observations: id, sampleName, barcode
Models: NearestNeighborModel(base="force_layout", k=10), SVD, Normalization, SCTransform
Note how the last line lists all ProjectionModels
used in the creation of fl
.
Next, let's load some more samples for projection:
julia> fp2 = ["GSE164378_RNA_ADT_3P_P5.h5", "GSE164378_RNA_ADT_3P_P6.h5"];
julia> counts2 = load_counts(fp2; sample_names=["P5","P6"]);
It is easy to project the newly loaded counts2
onto the "base" force layout fl
:
julia> project(counts2, fl)
DataMatrix (3 variables and 42553 observations)
Matrix{Float64}
Variables: id
Observations: id, sampleName, barcode
Models: NearestNeighborModel(base="force_layout", k=10), SVD, Normalization, SCTransform
We can also project in two or more steps, to get access to intermediate results:
julia> reduced2 = project(counts2, reduced)
DataMatrix (20239 variables and 42553 observations)
SVD (10 dimensions)
Variables: id, feature_type, name, genome, read, pattern, sequence, logGeneMean, outlier, beta0, ...
Observations: id, sampleName, barcode
Models: SVDModel(nsv=10), Normalization, SCTransform
julia> project(reduced2, fl)
DataMatrix (3 variables and 42553 observations)
Matrix{Float64}
Variables: id
Observations: id, sampleName, barcode
Models: NearestNeighborModel(base="force_layout", k=10), SVD, Normalization, SCTransform
If the DataMatrix we want to project is modified, we need to use the from
kwarg to tell project
which models to use:
julia> filtered = counts2[:,1:10:end]
DataMatrix (33766 variables and 4256 observations)
SparseArrays.SparseMatrixCSC{Int64, Int32}
Variables: id, feature_type, name, genome, read, pattern, sequence
Observations: id, sampleName, barcode
Models: FilterModel(:, 1:10:42551)
julia> reduced2b = project(filtered2, reduced; from=counts)
DataMatrix (20239 variables and 4256 observations)
SVD (10 dimensions)
Variables: id, feature_type, name, genome, read, pattern, sequence, logGeneMean, outlier, beta0, ...
Observations: id, sampleName, barcode
Models: SVDModel(nsv=10), Normalization, SCTransform, Filter
After that, it is possible to continue without specifying from
:
julia> project(reduced2b, fl)
DataMatrix (3 variables and 4256 observations)
Matrix{Float64}
Variables: id
Observations: id, sampleName, barcode
Models: NearestNeighborModel(base="force_layout", k=10), SVD, Normalization, SCTransform, Filter
SingleCellProjections.project
— Methodproject(data::DataMatrix, model::ProjectionModel, args...; verbose=true, kwargs...)
Core projection function. Project data
based on the single ProjectionModel
model
. In most cases, it is better to call project(data, base::DataMatrix)
instead of using this method directly.
SingleCellProjections.pseudobulk
— Methodpseudobulk(data::DataMatrix, obs_col, [additional_columns...]; var=:copy)
Create a new DataMatrix
by averging over groups, as specified by the categorical annotation obs_col
(and optionally additional columns).
var
- Can be:copy
(make a copy of sourcevar
) or:keep
(share the sourcevar
object).
Examples
Create a pseudobulk representation of each sample:
julia> pseudobulk(transformed, "sampleName")
Create a pseudobulk representation for each celltype in each sample:
julia> pseudobulk(transformed, "sampleName", "celltype")
SingleCellProjections.set_obs_id_col!
— Methodset_obs_id_col!(data::DataMatrix, obs_id_col::String; duplicate_obs=:error)
Set which column to use as observation IDs. It will be moved to the first column of data.obs
. The rows of this column in data.obs
must be unique.
duplicate_obs
- Set to :ignore, :warn or :error to decide what happens if duplicate IDs are found.
SingleCellProjections.set_var_id_col!
— Methodset_var_id_col!(data::DataMatrix, var_id_col::String; duplicate_var=:error)
Set which column to use as variable IDs. It will be moved to the first column of data.var
. The rows of this column in data.var
must be unique.
duplicate_var
- Set to :ignore, :warn or :error to decide what happens if duplicate IDs are found.
SingleCellProjections.splitrange
— Methodsplitrange(r::UnitRange, nparts::Integer)
Splits a range in nparts
number of parts of equal length.
SingleCellProjections.tf_idf_transform
— Methodtf_idf_transform([T=Float64], counts::DataMatrix;
var_filter = hasproperty(counts.var, "feature_type") ? "feature_type" => isequal("Gene Expression") : nothing,
var_filter_cols = hasproperty(counts.var, "feature_type") ? "feature_type" : nothing,
scale_factor = 10_000,
idf = vec(size(counts,2) ./ max.(1,sum(counts.matrix; dims=2))),
annotate = true,
var = :copy,
obs = :copy)
Compute the TF-IDF (term frequency-inverse document frequency) transform of counts
, using the formula log( 1 + scale_factor * tf * idf )
where tf
is the term frequency counts.matrix ./ max.(1, sum(counts.matrix; dims=1))
.
Optionally, T
can be specified to control the eltype
of the sparse transformed matrix. T=Float32
can be used to lower the memory usage, with little impact on the results, since downstream analysis is still done with Float64.
var_filter
- Control which variables (features) to use for parameter estimation. Defaults to"feature_type" => isequal("Gene Expression")
, if afeature_type
column is present incounts.var
. Can be set tonothing
to disable filtering. SeeDataFrames.filter
for how to specify filters.var_filter_cols
- Additional columns used to ensure features are unique. Defaults to "feature_type" if present incounts.var
. Use a Tuple/Vector for specifying multiple columns. Can be set tonothing
to not include any additional columns.annotate
- If true,idf
will be added as avar
annotation.var
- Can be:copy
(make a copy of sourcevar
) or:keep
(share the sourcevar
object).obs
- Can be:copy
(make a copy of sourceobs
) or:keep
(share the sourceobs
object).
SingleCellProjections.ttest!
— Methodttest!(data::DataMatrix, h1, [group_a], [group_b]; h0, kwargs...)
Performs a t-Test with the given h1
(alternative hypothesis) and h0
(null hypothesis). Examples of t-Tests are Two-Group tests and Linear Regression.
ttest!
adds a t-statistic, a p-value and a difference column to data.var
.
See ttest_table
for usage examples and more details on computations and parameters.
In addition ttest!
supports the kwarg
:
prefix
- Output column names for t-statistics, p-values and differences will be prefixed with this string. If none is given, it will be constructed fromh1
,group_a
,group_b
andh0
.
See also: ttest_table
, ttest
, ftest!
, mannwhitney!
SingleCellProjections.ttest
— Methodttest(data::DataMatrix, h1, [group_a], [group_b]; h0, var=:copy, obs=:copy, matrix=:keep, kwargs...)
Performs a t-Test with the given h1
(alternative hypothesis) and h0
(null hypothesis). Examples of t-Tests are Two-Group tests and Linear Regression.
ttest
creates a copy of data
and adds a t-statistic, a p-value and a difference column to data.var
.
See ttest_table
and ttest!
for usage examples and more details on computations and parameters.
See also: ttest!
, ttest_table
, ftest
, mannwhitney
SingleCellProjections.ttest_table
— Methodttest_table(data::DataMatrix, h1, [group_a], [group_b]; h0, kwargs...)
Performs a t-Test with the given h1
(alternative hypothesis) and h0
(null hypothesis). Examples of t-Tests are Two-Group tests and Linear Regression.
T-tests can be performed on any DataMatrix
, but it is almost always recommended to do it directly after transforming the data using e.g. sctransform
, logtransform
or tf_idf_transform
.
Do not use ttest_table
after normalizing the data using normalize_matrix
: ttest_table
needs to know about the h0
model (regressed out covariates) for correction computations. Failing to do so can result in incorrect results. If you want to correct for the same covariates, pass them as h0
to ttest_table
.
h1
can be:
- A string specifying a column name of
data.obs
. Auto-detection determines if the column is categorical (Two-Group) or numerical (linear regression).- If
group_a
andgroup_b
are specified, a Two-Group test betweengroup_a
andgroup_b
is performed. - If
group_a
is specified, but notgroup_b
, a Two-Group test betweengroup_a
and all other observations is performed.
- If
- A
covariate
for more control of how to interpret the values in the column.
ttest_table
returns a Dataframe with columns for variable IDs, t-statistics, p-values and differences. For Two-group tests, difference
is the difference in mean between the two groups. For linear regression, the difference corresponds to the rate of change.
Supported kwargs
are:
h0
- Use a non-trivialh0
(null) model. Specified in the same way ash1
above.center=true
- Add an intercept to theh0
(null) model.statistic_col="t"
- Name of the output column containing the t-statistics. (Set to nothing to remove from output.)pvalue_col="pValue"
- Name of the output column containing the p-values. (Set to nothing to remove from output.)difference_col="difference"
- Name of the output column containing the differences. (Set to nothing to remove from output.)h1_missing=:skip
- One of:skip
and:error
. Ifskip
, missing values inh1
columns are skipped, otherwise an error is thrown.h0_missing=:error
- One of:skip
and:error
. Ifskip
, missing values inh0
columns are skipped, otherwise an error is thrown.allow_normalized_matrix=false
- Set to true to accept running on aDataMatrix
that has been normalized.
Examples
Perform a Two-Group t-test between celltypes "Mono" and "DC".
julia> ttest_table(transformed, "celltype", "Mono", "DC")
Perform a Two-Group t-test between celltype "Mono" and all other cells.
julia> ttest_table(transformed, "celltype", "Mono")
Perform a Two-Group t-test between celltypes "Mono" and "DC", while correcting for "fraction_mt" (a linear covariate).
julia> ttest_table(transformed, "celltype", "Mono", "DC")
Perform Linear Regression using the covariate "fraction_mt".
julia> ttest_table(transformed, "fraction_mt")
See also: ttest!
, ttest
, ftest_table
, mannwhitney_table
, covariate
SingleCellProjections.update_matrix
— Functionupdate_matrix(data::DataMatrix, matrix, model=nothing;
var::Union{Symbol,String,DataFrame} = "",
obs::Union{Symbol,String,DataFrame} = "")
Create a new DataMatrix
by replacing parts of data
with new values. Mostly useful when implementing new ProjectionModel
s.
matrix
- the new matrix.model
- will be appended to the list of models fromdata
. If set tonothing
, the resulting list ofmodels
will be empty.
Kwargs:
var
- One of::copy
- Copy fromdata
.:keep
- Sharevar
withdata
.::DataFrame
- Replace with a new table with variable annotations.prefix::String
- Prefix, the new variables will be named prefix1, prefix2, etc.
obs
Seevar
.
SingleCellProjections.ustatistic_single
— Methodustatistic_single(X, j, groups, n1, n2)
NB: Assumes all sparse non-zeros are positive.
X
is a sparse matrix where each column is a variable. j
is the current variable. groups
is a vector with values: 1
for each sample in group 1, 2
for each sample in group 2 and 0
for samples in neither group. n1
number of elements in group 1 (precomputed from groups
) n2
number of elements in group 2 (precomputed from groups
)
SingleCellProjections.var_coordinates
— Functionvar_coordinates(data::DataMatrix)
Returns a matrix with coordinates for the variables. Only available for DataMatrices that have a dual representation (e.g. SVD/PCA).
In the case of SVD
(PCA), var_coordinates
returns the principal components as unit vectors.
SingleCellProjections.var_counts_fraction!
— Methodvar_counts_fraction!(counts::DataMatrix, sub_filter, tot_filter, col; check=true, var=:keep, obs=:keep)
For each observation, compute the fraction of counts that match a specific variable pattern.
sub_filter
decides which variables are counted.tot_filter
decides which variables to include in the total.
kwargs:
var
- Use this to setvar
in theProjectionModel
.obs
- Use this to setobs
in theProjectionModel
. Note thatcounts.obs
is changed in place, regardless of the value ofobs
.
If check=true
, an error will be thrown if no variables match the patterns.
For more information on filtering syntax, see examples below and the documentation on DataFrames.filter
.
Examples
Compute the fraction of reads in MT- genes, considering only "Gene Expression" features (and not e.g. "Antibody Capture").
var_counts_fraction!(counts, "name"=>startswith("MT-"), "feature_type"=>isequal("Gene Expression"), "fraction_mt")
Compute the fraction of reads in MT- genes, when there is no feature_type
annotation (i.e. all variables are genes).
var_counts_fraction!(counts, "name"=>startswith("MT-"), Returns(true), "fraction_mt")
See also: var_counts_fraction
SingleCellProjections.var_counts_fraction
— Methodvar_counts_fraction(counts::DataMatrix, sub_filter, tot_filter, col; check=true, var=:copy, obs=:copy)
For each observation, compute the fraction of counts that match a specific variable pattern.
sub_filter
decides which variables are counted.tot_filter
decides which variables to include in the total.
kwargs:
var
- Can be:copy
(make a copy of sourcevar
) or:keep
(share the sourcevar
object).obs
- Can be:copy
(make a copy of sourceobs
) or:keep
(share the sourceobs
object).
If check=true
, an error will be thrown if no variables match the patterns.
For more information on filtering syntax, see examples below and the documentation on DataFrames.filter
.
Examples
Compute the fraction of reads in MT- genes, considering only "Gene Expression" features (and not e.g. "Antibody Capture").
var_counts_fraction(counts, "name"=>startswith("MT-"), "feature_type"=>isequal("Gene Expression"), "fraction_mt")
Compute the fraction of reads in MT- genes, when there is no feature_type
annotation (i.e. all variables are genes).
var_counts_fraction(counts, "name"=>startswith("MT-"), Returns(true), "fraction_mt")
See also: var_counts_fraction!
SingleCellProjections.var_counts_sum!
— Methodvar_counts_sum!([f=identity], counts::DataMatrix, filter, col; check=true, var=:keep, obs=:keep)
For each observation, compute the sum of counts matching the filter
.
If f
is specified, it is applied to each element before summing. (Similar to sum
.)
kwargs:
var
- Use this to setvar
in theProjectionModel
.obs
- Use this to setobs
in theProjectionModel
. Note thatcounts.obs
is changed in place, regardless of the value ofobs
.
If check=true
, an error will be thrown if no variables match the pattern.
For more information on filtering syntax, see examples below and the documentation on DataFrames.filter
.
Examples
Sum all "Gene Expression" counts:
var_counts_sum!(counts, "feature_type"=>isequal("Gene Expression"), "totalRNACount")
Compute the number of "Gene Expression" variables that are expressed (i.e. nonzero):
var_counts_sum!(!iszero, counts, "feature_type"=>isequal("Gene Expression"), "nonzeroRNACount")
See also: var_counts_sum
SingleCellProjections.var_counts_sum
— Methodvar_counts_sum([f=identity], counts::DataMatrix, filter, col; check=true, var=:keep, obs=:keep)
For each observation, compute the sum of counts matching the filter
.
If f
is specified, it is applied to each element before summing. (Similar to sum
.)
kwargs:
var
- Use this to setvar
in theProjectionModel
.obs
- Use this to setobs
in theProjectionModel
. Note thatcounts.obs
is changed in place, regardless of the value ofobs
.
If check=true
, an error will be thrown if no variables match the pattern.
For more information on filtering syntax, see examples below and the documentation on DataFrames.filter
.
Examples
Sum all "Gene Expression" counts:
var_counts_sum(counts, "feature_type"=>isequal("Gene Expression"), "totalRNACount")
Compute the number of "Gene Expression" variables that are expressed (i.e. nonzero):
var_counts_sum(!iszero, counts, "feature_type"=>isequal("Gene Expression"), "nonzeroRNACount")
See also: var_counts_sum!
SingleCellProjections.variable_std
— Methodvariable_std(data::DataMatrix)
Computes the variance of each variable in data
.
data
must be mean-centered. E.g. by using normalize_matrix
before calling variable_std
.
See also: variable_var
, normalize_matrix
SingleCellProjections.variable_var
— Methodvariable_var(data::DataMatrix)
Computes the variance of each variable in data
.
data
must be mean-centered. E.g. by using normalize_matrix
before calling variable_var
.
See also: variable_std
, normalize_matrix