Interface
SingleCellProjections.DataMatrixSingleCellProjections.DataMatrixSingleCellProjections.DataMatrixSingleCellProjections.LowRankSingleCellProjections.NormalizationModelSingleCellProjections.PseudoBulkModelSingleCellProjections.SCTransformModelSingleCellProjections.SVDModelBase.copyLinearAlgebra.svdSCTransform.sctransformSingleCellProjections.adjacency_distancesSingleCellProjections.covariateSingleCellProjections.covariateSingleCellProjections.designmatrixSingleCellProjections.filter_matrixSingleCellProjections.filter_obsSingleCellProjections.filter_varSingleCellProjections.force_layoutSingleCellProjections.ftestSingleCellProjections.ftest!SingleCellProjections.ftest_tableSingleCellProjections.implicitsvdSingleCellProjections.load10xSingleCellProjections.load_countsSingleCellProjections.load_countsSingleCellProjections.load_countsSingleCellProjections.loadh5adSingleCellProjections.local_outlier_factorSingleCellProjections.local_outlier_factor!SingleCellProjections.local_outlier_factor_projectionSingleCellProjections.local_outlier_factor_projection!SingleCellProjections.local_outlier_factor_projection_tableSingleCellProjections.local_outlier_factor_tableSingleCellProjections.logtransformSingleCellProjections.mannwhitneySingleCellProjections.mannwhitney!SingleCellProjections.mannwhitney_tableSingleCellProjections.merge_countsSingleCellProjections.normalize_matrixSingleCellProjections.normalize_matrixSingleCellProjections.obs_coordinatesSingleCellProjections.projectSingleCellProjections.projectSingleCellProjections.projectSingleCellProjections.pseudobulkSingleCellProjections.set_obs_id_col!SingleCellProjections.set_var_id_col!SingleCellProjections.splitrangeSingleCellProjections.tf_idf_transformSingleCellProjections.ttestSingleCellProjections.ttest!SingleCellProjections.ttest_tableSingleCellProjections.update_matrixSingleCellProjections.ustatistic_singleSingleCellProjections.var_coordinatesSingleCellProjections.var_counts_fractionSingleCellProjections.var_counts_fraction!SingleCellProjections.var_counts_sumSingleCellProjections.var_counts_sum!SingleCellProjections.variable_stdSingleCellProjections.variable_var
SingleCellProjections.DataMatrix — Typestruct DataMatrix{T,Tv,To}A DataMatrix represents a matrix together with annotations for variables and observations.
Fields:
matrix::T- The matrix.var::Tv- Variable annotations.obs::To- Observation annotations.models::Vector{ProjectionModel}- Models used in the creation of thisDataMatrix.
The first column of the var and obs tables should contain unique IDs.
SingleCellProjections.DataMatrix — MethodDataMatrix(matrix, var, obs; kwargs...)Create a DataMatrix with the given matrix, var and obs.
The first column of var/obs are used as IDs.
Kwargs:
duplicate_var- Set to:ignore,:warnor:errorto decide what happens if duplicate var IDs are found.duplicate_obs- Set to:ignore,:warnor:errorto decide what happens if duplicate obs IDs are found.
SingleCellProjections.DataMatrix — MethodDataMatrix()Create an empty DataMatrix{Matrix{Float64},DataFrame,DataFrame}.
SingleCellProjections.LowRank — TypeLowRankA matrix decomposition UVᵀ where each row of U represents a variable and each column of Vᵀ represents a sample. Intended for situations where the product is low rank, i.e. size(U,2)==size(Vt,1) is small.
SingleCellProjections.NormalizationModel — MethodNormalizationModel(data::DataMatrix, design::DesignMatrix;
scale=false, min_std=1e-6, annotate=true,
rtol=sqrt(eps()), var=:copy, obs=:copy)Create a NormalizationModel based on data and a design matrix.
scale- Set to true to normalize variables to unit standard deviation. Can also be set to a vector with a scaling factor for each variable.min_std- Ifscale==true, thescalevector is set to1.0 ./ max.(std, min_std). That is,min_stdis used to suppress variables that are very small (and any fluctuations can be assumed to be noise).annotate- Only used ifscale!=false. Withannotate=true, thescalevector is added as a var annotation.rtol- Singular values of the design matrix that are≤rtolare discarded. Needed for numerical stability.var- Can be:copy(make a copy of sourcevar) or:keep(share the sourcevarobject).obs- Can be:copy(make a copy of sourceobs) or:keep(share the sourceobsobject).
See also: normalize_matrix, designmatrix
SingleCellProjections.PseudoBulkModel — TypePseudoBulkModel <: ProjectionModelA model used for computing a "pseudo-bulk" representation of a DataMatrix.
See also: pseudobulk
SingleCellProjections.SCTransformModel — MethodSCTransformModel([T=Float64], counts::DataMatrix;
var_filter = hasproperty(counts.var, :feature_type) ? :feature_type => isequal("Gene Expression") : nothing,
rtol=1e-3, atol=0.0, annotate=true,
post_var_filter=:, post_obs_filter=:,
obs=:copy,
kwargs...)Computes the SCTransform parameter estimates for counts and creates a SCTransformModel that can be applied to the same or another data set. Defaults to only using "Gene Expression" features.
Optionally, T can be specified to control the eltype of the sparse transformed matrix. T=Float32 can be used to lower the memory usage, with little impact on the results, since downstream analysis is still done with Float64.
var_filter- Control which variables (features) to use for parameter estimation. Defaults to"feature_type" => isequal("Gene Expression"), if afeature_typecolumn is present incounts.var. Can be set tonothingto disable filtering. SeeDataFrames.filterfor how to specify filters.var_filter_cols- Additional columns used to ensure features are unique. Defaults to "feature_type" if present incounts.var. Use a Tuple/Vector for specifying multiple columns. Can be set tonothingto not include any additional columns.rtol- Relative tolerance when constructing low rank approximation.atol- Absolute tolerance when constructing low rank approximation.annotate- Set to true to include SCTransform parameter estimates as feature annotations.post_var_filter- Equivalent to applying variable (feature) filtering after sctransform, but computationally more efficient.post_obs_filter- Equivalent to applying observation (cell) filtering after sctransform, but computationally more efficient.obs- Can be:copy(make a copy of sourceobs) or:keep(share the sourceobsobject).kwargs...- Additionalkwargsare passed on toSCTransform.scparams.
Examples
Setup SCTransformModel (Gene Expression features):
julia> SCTransformModel(counts)Setup SCTransformModel (Antibody Capture features):
julia> SCTransformModel(counts; var_filter = :feature_type => isequal("Antibody Capture"))See also: sctransform, SCTransform.scparams, DataFrames.filter
SingleCellProjections.SVDModel — TypeSVDModel <: ProjectionModelA model used for projecting onto an SVD object. Normally created using svd(::DataMatrix).
See also: svd
Base.copy — Methodcopy(data::DataMatrix; var=:copy, obs=:copy, matrix=:keep)Copy DataMatrix data. By default, var and obs annotations are copied, but the matrix is shared. Set kwargs var, obs and matrix to :keep/:copy for fine grained control.
LinearAlgebra.svd — Methodsvd(data::DataMatrix; nsv=3, var=:copy, obs=:copy, kwargs...)Compute the Singular Value Decomposition (SVD) of data using the Random Subspace SVD algorithm from [Halko et al. "Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions"]. SVD is often used to perform Principal Component Analysis (PCA), which assumes that the data is centered.
nsv- The number of singular values.var- Can be:copy(make a copy of sourcevar) or:keep(share the sourcevarobject).obs- Can be:copy(make a copy of sourceobs) or:keep(share the sourceobsobject).
Additional kwargs related to numerical precision are passed to SingleCellProjections.implicitsvd.
See also: SingleCellProjections.implicitsvd
SCTransform.sctransform — Methodsctransform([T=Float64], counts::DataMatrix; verbose=true, kwargs...)Compute the SCTransform of the DataMatrix counts. The result is stored as a Matrix Expression with the sum of a sparse and a low-rank term. I.e. no large dense matrix is created.
Optionally, T can be specified to control the eltype of the sparse transformed matrix. T=Float32 can be used to lower the memory usage, with little impact on the results, since downstream analysis is still done with Float64.
See SCTransformModel for description of kwargs....
Examples
Compute SCTransform (Gene Expression features):
julia> sctransform(counts)Compute SCTransform (Antibody Capture features):
julia> sctransform(counts; var_filter = :feature_type => isequal("Antibody Capture"))Compute SCTransform (Gene Expression features), using eltype Float32 to lower memory usage:
julia> sctransform(Float32, counts)See also: SCTransformModel, SCTransform.scparams
SingleCellProjections.adjacency_distances — Functionadjacency_distances(adj, X, Y=X)For each structural non-zero in adj, compute the Euclidean distance between the point in the DataMatrix Y and the point in the DataMatrix X.
Can be useful when adj is created using e.g. a lower dimensional representation and we want to know the distances in the original, high dimensional space.
At the moment all points in Y are required to have the same number of neighbors in X, for computation reasons.
SingleCellProjections.covariate — Functioncovariate(src, group_a, [group_b])Create a two-group covariate referring to src, comparing group_a to group_b.
src is one of:
String- referring to a column in the DataMatrixobs.DataFrame- with exactly two columns, the first should contain IDs matching IDs inobs, and the second should be the covariate.Annotations(experimental) - with ID matching the DataMatrixobsand a second column for the covariate.
If src is a String it will refer to a column in the DataMatrix obs. src can also be an Annotations object, with ID matching the DataMatrix obs. group_a and group_b must be values occuring in the column src.
If group_b is not given, group_a will be compared to all other observations.
See also: designmatrix
SingleCellProjections.covariate — Functioncovariate(src, type=:auto)Create a covariate referring to src.
src is one of:
String- referring to a column in the DataMatrixobs.DataFrame- with exactly two columns, the first should contain IDs matching IDs inobs, and the second should be the covariate.Annotations(experimental) - with ID matching the DataMatrixobsand a second column for the covariate.
type must be one of :auto, :numerical, :categorical, :twogroup and :intercept. :auto means auto-detection by checking if the values in the column are numerical or categorical. type==:intercept adds an intercept to the model (in which case the src parameter is ignored).
See also: designmatrix
SingleCellProjections.designmatrix — Methoddesignmatrix(data::DataMatrix, [covariates...]; center=true, max_categories=100)Creates a design matrix from data.obs and the given covariates. Covariates can be specied using strings (column name in data.obs), with autodetection of whether the covariate is numerical or categorical, or using the covariate function for more control.
center- Iftrue, an intercept is added to the design matrix. (Should only be set tofalsein very rare circumstances.)max_categories- Safety parameter, an error will be thrown if there are too many categories. In this case, it is likely a mistake that the covariate was used as a categorical covariate. Using a very large number of categories is also bad for performance and memory consumption.
Examples
Centering only:
julia> designmatrix(data)Regression model with intercept (centering) and "fraction_mt" (numerical annotation):
julia> designmatrix(data, "fraction_mt")As above, but also including "batch" (categorical annotation):
julia> designmatrix(data, "fraction_mt", "batch")See also: normalize_matrix, NormalizationModel, covariate
SingleCellProjections.filter_matrix — Methodfilter_matrix(fvar, fobs, data::DataMatrix)Return a new DataMatrix, containing only the variables and observations passing the filters.
fvar/fobs can be:
- An
AbstractVectorof indices to keep. - A
AbstractVectorof booleans (true to keep, false to discard). :indicating that all variables/observations should be kept.- Anything you can pass on to
DataFrames.filter(see DataFrames documentation for details).
Also note that indexing of a DataMatrix supports AbstractVectors of indices/booleans and :, and is otherwise identical to filter_matrix.
Examples
Keep every 10th variable and 3rd observation:
julia> filter_matrix(1:10:size(data,1), 1:3:size(data,2), data)Or, using indexing syntax:
julia> data[1:10:end, 1:3:end]For more examples, see filter_var and filter_obs.
See also: filter_var, filter_obs, DataFrames.filter
SingleCellProjections.filter_obs — Methodfilter_obs(f, data::DataMatrix)Return a new DataMatrix, containing only the observations passing the filter.
f can be:
- An
AbstractVectorof indices to keep. - A
AbstractVectorof booleans (true to keep, false to discard). :indicating that all observations should be kept.- Anything you can pass on to
DataFrames.filter(see DataFrames documentation for details).
Examples
Keep every 10th observation:
julia> filter_obs(1:10:size(data,2), data)Remove observations where "celltype" equals "other":
julia> filter_obs("celltype"=>!isequal("other"), data)See also: filter_matrix, filter_var, DataFrames.filter
SingleCellProjections.filter_var — Methodfilter_var(f, data::DataMatrix; kwargs...)Return a new DataMatrix, containing only the variables passing the filter.
f can be:
- An
AbstractVectorof indices to keep. - A
AbstractVectorof booleans (true to keep, false to discard). :indicating that all variables should be kept.- Anything you can pass on to
DataFrames.filter(see DataFrames documentation for details).
Examples
Keep every 10th variable:
julia> filter_var(1:10:size(data,1), data)Keep only variables of the type "Gene Expression":
julia> filter_var("feature_type"=>isequal("Gene Expression"), data)See also: filter_matrix, filter_obs, DataFrames.filter
SingleCellProjections.force_layout — Methodforce_layout(data::DataMatrix;
ndim=3,
k,
adj,
kprojection=10,
obs=:copy,
adj_out,
niter = 100,
link_distance = 4,
link_strength = 2,
charge = 5,
charge_min_distance = 1,
theta = 0.9,
center_strength = 0.05,
velocity_decay = 0.9,
initialAlpha = 1.0,
finalAlpha = 1e-3,
initialScale = 10,
seed,
rng)Compute the Force Layout (also known as a force directed knn-graph or SPRING plots) for data. Usually, data is a DataMatrix after reduction to 10-100 dimensions by svd.
A Force Layout is computed by running a physics simulation were the observations are connected by springs (such that connected observations are attracted), a general "charge" force repelling all observations from each other and a centering force that keep the observations around the origin. The implementation is based on d3-force: https://github.com/d3/d3-force, also see LICENSE.md.
Exactly one of the kwargs k and adj must be provided. See details below.
General parameters:
k- Number of nearest neighbors to connect each observation to (computesadjbelow).adj- An sparse, symmetric, adjacency matrix with booleans.trueif two observations are connected by a spring andfalseotherwise.kprojection- The number of nearest neighbors used when projecting onto the resulting force layout. (Not used in the computation of the layout, only during projection.)obs- Can be:copy(make a copy of sourceobs) or:keep(share the sourceobsobject).adj_out- OptionalRef. If specified, the (computed)adjmatrix will be assigned toadj_out.
Paramters controlling the physics simulation:
niter- Number of iterations to run the simulation.link_distance- The length of each spring.link_strength- The strength of the spring force.charge- The strength of the charge force.charge_min_distance- Used to avoid numerical instabilities by limiting the charge force for observations that are very close.theta- Parameter controlling accuracy in the Barnes-Hut approximation for charge forces.center_strength- Strength of the centering force.velocity_decay- At each iteration, the current velocity for an observations is multiplied byvelocity_decay.initialAlpha- The alpha value decreases over time and allows larger changes to happen early, while being more stable towards the end.finalAlpha- SeeinitialAlphainitialScale- The simulation is initialized by randomly drawing each observation from a multivariate Gaussian, and is scaled byinitialScale.seed- Optional random seed used to initrng. NB: This requires the packageStableRNGsto be loaded.rng- Optional RNG object. Useful for reproducibility.
Examples
julia> force_layout(data; ndim=3, k=100)SingleCellProjections.ftest! — Methodftest!(data::DataMatrix, h1; h0, kwargs...)Performs an F-Test with the given h1 (alternative hypothesis) and h0 (null hypothesis). Examples of F-Tests are ANOVA and Quadratic Regression, but any linear model can be used.
ftest! adds a F-statistic and a p-value column to data.var.
See ftest_table for usage examples and more details on computations and parameters.
In addition ftest! supports the kwarg:
prefix- Output column names for F-statistics and p-values will be prefixed with this string. If none is given, it will be constructed fromh1andh0.
See also: ftest_table, ftest, ttest!
SingleCellProjections.ftest — Methodftest(data::DataMatrix, h1; h0, var=:copy, obs=:copy, matrix=:keep, kwargs...)Performs an F-Test with the given h1 (alternative hypothesis) and h0 (null hypothesis). Examples of F-Tests are ANOVA and Quadratic Regression, but any linear model can be used.
ftest creates a copy of data and adds a F-statistic and a p-value column to data.var.
See ftest_table and ftest! for usage examples and more details on computations and parameters.
See also: ftest!, ftest_table, ttest
SingleCellProjections.ftest_table — Methodftest_table(data::DataMatrix, h1; h0, kwargs...)Performs an F-Test with the given h1 (alternative hypothesis) and h0 (null hypothesis). Examples of F-Tests are ANOVA and Quadratic Regression, but any linear model can be used. (See "Examples" below for concrete examples.)
F-tests can be performed on any DataMatrix, but it is almost always recommended to do it directly after transforming the data using e.g. sctransform, logtransform or tf_idf_transform.
Do not use ftest_table after normalizing the data using normalize_matrix: ftest_table needs to know about the h0 model (regressed out covariates) for correction computations. Failing to do so can result in incorrect results. If you want to correct for the same covariates, pass them as h0 to ftest_table.
h1 can be:
- A string specifying a column name of
data.obs. Auto-detection determines if the column is categorical (ANOVA) or numerical. - A
covariatefor more control of how to interpret the values in a column. - A tuple or vector of the above for compound models.
ftest_table returns a Dataframe with columns for variable IDs, F-statistics and p-values.
Supported kwargs are:
h0- Use a non-trivialh0(null) model. Specified in the same way ash1above.center=true- Add an intercept to theh0(null) model.statistic_col="F"- Name of the output column containing the F-statistics. (Set to nothing to remove from output.)pvalue_col="pValue"- Name of the output column containing the p-values. (Set to nothing to remove from output.)h1_missing=:skip- One of:skipand:error. Ifskip, missing values inh1columns are skipped, otherwise an error is thrown.h0_missing=:error- One of:skipand:error. Ifskip, missing values inh0columns are skipped, otherwise an error is thrown.allow_normalized_matrix=false- Set to true to accept running on aDataMatrixthat has been normalized.
Examples
Perform an ANOVA using the "celltype" annotation.
julia> ftest_table(transformed, "celltype")Perform an ANOVA using the "celltype" annotation, while correcting for fraction_mt (a linear covariate).
julia> ftest_table(transformed, "celltype"; h0="fraction_mt")Perform an ANOVA using the "celltype" annotation, while correcting for fraction_mt (a linear covariate) and "phase" (a categorical covariate).
julia> ftest_table(transformed, "celltype"; h0=("fraction_mt","phase"))Perform Quadractic Regression using the covariate x, by first creating an annotation for x squared, and then using a compound model.
julia> data.obs.x2 = data.obs.x.^2;
julia> ftest_table(transformed, ("x","x2"))See also: ftest!, ftest, ttest_table, covariate
SingleCellProjections.implicitsvd — Methodimplicitsvd(A; nsv=3, subspacedims=4nsv, niter=2, stabilize_sign=true, seed, rng)Compute the SVD of A using Random Subspace SVD. [Halko et al. "Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions"]
nsv- Number of singular values/vectors to computesubspacedims- Number of dimensions used for the subspace approximating the action ofA.niter- Number of iterations. In each iteration, one multiplication ofAwith a matrix and one multiplication ofA'with a matrix will be performed.stabilize_sign- If true, handles the problem that the SVD is only unique up to the sign of each component (for real matrices), by ensuring that the l1 norm of the positive entires for each column in U is larger than the l1 norm of the negative entries.seed- Use a random seed to init therng. NB: This requires the packageStableRNGsto be loaded.rng- Specify a custom RNG.
SingleCellProjections.load10x — Methodload10x(filename; lazy=false, var_id=nothing, var_id_delim='_')Load a CellRanger ".h5" or ".mtx[.gz]" file as a DataMatrix.
lazy- Iftrue, the count matrix itself will not be loaded, only features and barcodes. This is used internally inload_countsto merge samples more efficiently. Useload_countsto later load the count data.var_id- If a pairvar_id_col=>cols, the contents of columnscolswill be merged to create new IDs. Useful to ensure that IDs are unique.var_id_delim- Delimiter used to when merging variable columns to create the variable id column.
Examples
Load counts from a CellRanger ".h5" file. (Recommended.)
julia> counts = load10x("filtered_feature_bc_matrix.h5")Load counts from a CellRanger ".mtx" file. Tries to find barcode and feature annotation files in the same folder.
julia> counts = load10x("matrix.mtx.gz")Lazy loading followed by loading.
julia> counts = load10x("filtered_feature_bc_matrix.h5");
julia> counts = load_counts(counts)See also: load_counts
SingleCellProjections.load_counts — Methodload_counts([loadfun=load10x], filenames;
sample_names,
sample_name_col,
obs_id_col = "cell_id",
lazy,
lazy_merge = false,
obs_id_delim = '_',
obs_id_prefixes = sample_names,
extra_var_id_cols::Union{Nothing,String,Vector{String}},
duplicate_var,
duplicate_obs,
callback=nothing)Load and merge multiple samples efficiently.
Defaults to loading 10x CellRanger files. The files are first loaded lazily, then the merged count matrix is allocated and finally each sample is loaded directly into the merged count matrix. (This strategy greatly reduces memory usage, since only one copy of data is needed instead of two.)
filenames specifies which files to load. (It can be a vector of filenames or a single filename string.) For each file, loadfun is called.
sample_names- Specify the sample names. Should be a vector of the same length asfilenames. Set tonothingto not create a sample name annotation.sample_name_col- Column for sample names inobs, defaults to "sampleName".obs_id_col- Colum for mergedids inobs.lazy- Enable lazy loading. Defaults to true ifload10xis used, andfalseotherwise.lazy_merge- Enable lazy merging, i.e.varandobsare created, but the count matrix merging is postponed until a second call toload_counts.obs_id_delim- Delimiter used when creating mergedobsIDs.obs_id_prefixes- Prefix (one per sample) used to create new IDs. Set to nothing to keep old IDs. Defaults tosample_names.extra_var_id_cols- Additional columns to use to ensure variable IDs are unique during merging. Defaults to "feature_type" if that column is present for all samples. Can be aVector{String}to include multiple columns. Set to nothing to disable.duplicate_var- Set to:ignore,:warnor:errorto decide what happens if duplicate var IDs are found.duplicate_obs- Set to:ignore,:warnor:errorto decide what happens if duplicate obs IDs are found.callback- Experimental callback functionality. The callback function is called between samples during merging. Returntrueto abort loading andfalseto continue.- Additional kwargs (including
duplicate_var/duplicate_obsif specified) are passed toloadfun.
Examples
Load and name samples:
julia> counts = load_counts(["s1.h5", "s2.h5"]; sample_names=["Sample A", "Sample B"])See also: load10x, merge_counts
SingleCellProjections.load_counts — Methodload_counts(data::DataMatrix{<:Lazy10xMatrix})Load counts for a lazily loaded 10x DataMatrix.
See also: load10x
SingleCellProjections.load_counts — Methodload_counts(data::DataMatrix{<:LazyMergedMatrix})Merge/load counts for a lazily merged DataMatrix.
SingleCellProjections.loadh5ad — Methodloadh5ad(filename; var_id_column=:id, obs_id_column=:id)Experimental loading of .h5ad files.
This function is deprecated. Load Muon.jl and see help for create_datamatrix.
SingleCellProjections.local_outlier_factor! — Methodlocal_outlier_factor!(data::DataMatrix, full::DataMatrix; k=10, col="LOF")Compute the Local Outlier Factor (LOF) for each observation in data, and add as column to data.obs with the name col.
When working with projected DataMatrices, use local_outlier_factor_projection! instead.
NB: This function might be very slow for high values of k.
First, the k nearest neighbors are found for each observation in data. Then, the Local Outlier Factor is computed by considering the distance between the neighbors, but this time in the full DataMatrix. Thus full must have the same observations as are present in data.
A LOF value smaller than or close to one is means that the observation is an inlier, but a LOF value much larger than one means that the observation is an inlier.
By specifiying full=data, this is coincides with the standard definition for Local Outlier Factor. However, it is perhaps more useful to find neighbors in a dimension reduced space (after e.g. svd (PCA) or umap), but then compute distances in the high dimensional space (typically after normalization). This way, an observation is concidered an outlier if the reduction to a lower dimensional space didn't properly represent the neighborhood of the observation.
The interface is not yet fully decided and subject to change.
Examples
Compute the Local Outlier Factor, with nearest neighbors based only on reduced, but later using distances in full for the actual LOF computation.
julia> reduced = svd(normalized; nsv=10)
julia> local_outlier_factor!(reduced, normalized; k=10)See also: local_outlier_factor, local_outlier_factor_table, local_outlier_factor_projection!
SingleCellProjections.local_outlier_factor — Methodlocal_outlier_factor(data::DataMatrix, full::DataMatrix; k=10, col="LOF", matrix=:keep, var=:copy)See local_outlier_factor! for documentation. This version does not modify data in place.
See also: local_outlier_factor!, local_outlier_factor_table, local_outlier_factor_projection
SingleCellProjections.local_outlier_factor_projection! — Methodlocal_outlier_factor_projection!(data::DataMatrix, full::DataMatrix, base::DataMatrix, base_full::DataMatrix; k=10, col="LOF_projection")Compute the Local Outlier Factor (LOF) for each observation in data, and add as column to data.obs with the name col.
Use local_outlier_factor_projection! if you are working with projected data, and local_outlier_factor! otherwise.
Parameters:
data- ADataMatrixfor which we compute LOF for each observation. Expected to be aDataMatrixprojected ontobase, so that thedataandbaseuse the same coordinate system.full- ADataMatrixwith the same observations asdata, used to compute distances in the LOF computation. Expected to be aDataMatrixprojected ontobase_full, so that thefullandbase_fulluse the same coordinate system.base- The baseDataMatrix.base_full- The baseDataMatrix.k- The number of nearest neighbors to use. NB: This function might be very slow for high values ofk.
First, for each observation in data, the k nearest neighbors in base are found. Then, the distance to each neighbor is computed using full and base_full. Thus full must have the same observations as are present in data, and base_full must have the same observations as base.
A LOF value smaller than or close to one is means that the observation is an inlier, but a LOF value much larger than one means that the observation is an inlier.
By specifiying full=data and base_full=base, this is coincides with the standard definition for Local Outlier Factor. However, it is perhaps more useful to find neighbors in a dimension reduced space (after e.g. svd (PCA) or umap), but then compute distances in the high dimensional space (typically after normalization). This way, an observation is concidered an outlier if the reduction to a lower dimensional space didn't properly represent the neighborhood of the observation.
The interface is not yet fully decided and subject to change.
Examples
Compute the Local Outlier Factor (LOF) for each observation in a data set reduced, which has been projected onto base_reduced.
The nearest neighbors are computed between observations in reduced and base_reduced, but the distances in the actual LOF computation are between the same observations in normalized and base_normalized.
julia> base_reduced = svd(base_normalized; nsv=10)
julia> normalized = project(counts, base_normalized);
julia> reduced = project(normalized, base_reduced);
julia> local_outlier_factor!(reduced, normalized, base_reduced, base_normalized; k=10)See also: local_outlier_factor_projection, local_outlier_factor_projection_table, local_outlier_factor!
SingleCellProjections.local_outlier_factor_projection — Methodlocal_outlier_factor_projection(data::DataMatrix, full::DataMatrix, base::DataMatrix, base_full::DataMatrix; k=10, col="LOF_projection", matrix=:keep, var=:copy)See local_outlier_factor_projection! for documentation. This version does not modify data in place.
See also: local_outlier_factor_projection!, local_outlier_factor_projection_table, local_outlier_factor
SingleCellProjections.local_outlier_factor_projection_table — Methodlocal_outlier_factor_projection_table(data::DataMatrix, full::DataMatrix, base::DataMatrix, base_full::DataMatrix; k=10, col="LOF_projection", matrix=:keep, var=:copy)See local_outlier_factor_projection! for documentation. This returns a DataFrame with observation IDs and a column col with LOF values for the projection.
See also: local_outlier_factor_projection!, local_outlier_factor_projection, local_outlier_factor_table
SingleCellProjections.local_outlier_factor_table — Methodlocal_outlier_factor_table(data::DataMatrix, full::DataMatrix; k=10, col="LOF")See local_outlier_factor! for documentation. This returns a DataFrame with observation IDs and a column col with LOF values.
See also: local_outlier_factor!, local_outlier_factor, local_outlier_factor_projection_table
SingleCellProjections.logtransform — Methodlogtransform([T=Float64], counts::DataMatrix;
var_filter = hasproperty(counts.var, "feature_type") ? "feature_type" => isequal("Gene Expression") : nothing,
var_filter_cols = hasproperty(counts.var, "feature_type") ? "feature_type" : nothing,
scale_factor=10_000,
var=:copy,
obs=:copy)Log₂-transform counts using the formula:
log₂(1 + cᵢⱼ*scale_factor/(∑ᵢcᵢⱼ))Optionally, T can be specified to control the eltype of the sparse transformed matrix. T=Float32 can be used to lower the memory usage, with little impact on the results, since downstream analysis is still done with Float64.
var_filter- Control which variables (features) to use for parameter estimation. Defaults to"feature_type" => isequal("Gene Expression"), if afeature_typecolumn is present incounts.var. Can be set tonothingto disable filtering. SeeDataFrames.filterfor how to specify filters.var_filter_cols- Additional columns used to ensure features are unique. Defaults to "feature_type" if present incounts.var. Use a Tuple/Vector for specifying multiple columns. Can be set tonothingto not include any additional columns.var- Can be:copy(make a copy of sourcevar) or:keep(share the sourcevarobject).obs- Can be:copy(make a copy of sourceobs) or:keep(share the sourceobsobject).
Examples
julia> transformed = logtransform(counts)Use eltype Float32 to lower memory usage:
julia> transformed = logtransform(Float32, counts)See also: sctransform
SingleCellProjections.mannwhitney! — Methodmannwhitney!(data::DataMatrix, column, [group_a, group_b]; kwargs...)Perform a Mann-Whitney U-test (also known as a Wilcoxon rank-sum test) between two groups of observations. The U statistic is corrected for ties, and p-values are computed using a normal approximation.
Note that data must be a DataMatrix containing a sparse matrix only. It is recommended to first logtransform (or tf_idf_transform) the raw counts before performing the Mann-Whitney U-test.
mannwhitney! adds a U statistic and a p-value column to data.var. See mannwhitney_table for more details on groups and kwargs.
In addition mannwhitney! supports the kwarg:
prefix- Output column names for U statistics and p-values will be prefixed with this string. If none is given, it will be constructed fromcolumn,group_aandgroup_b.
See also: mannwhitney_table, mannwhitney
SingleCellProjections.mannwhitney — Methodmannwhitney(data::DataMatrix, column, [group_a, group_b]; var=:copy, obs=:copy, matrix=:keep, kwargs...)Perform a Mann-Whitney U-test (also known as a Wilcoxon rank-sum test) between two groups of observations. The U statistic is corrected for ties, and p-values are computed using a normal approximation.
Note that data must be a DataMatrix containing a sparse matrix only. It is recommended to first logtransform (or tf_idf_transform) the raw counts before performing the Mann-Whitney U-test.
mannwhitney creates a copy of data and adds a U statistic and a p-value column to data.var. See mannwhitney! and mannwhitney_table for more details on groups and kwargs.
See also: mannwhitney!, mannwhitney_table
SingleCellProjections.mannwhitney_table — Methodmannwhitney_table(data::DataMatrix, column, [group_a, group_b]; kwargs...)Perform a Mann-Whitney U-test (also known as a Wilcoxon rank-sum test) between two groups of observations. The U statistic is corrected for ties, and p-values are computed using a normal approximation.
Note that data must be a DataMatrix containing a sparse matrix only. It is recommended to first logtransform (or tf_idf_transform) the raw counts before performing the Mann-Whitney U-test.
column specifies a column in data.obs and is used to determine which observations belong in which group.
If group_a and group_b are not given, the column must contain exactly two unique values (except missing). If group_a is given, but not group_b, the observations in group A are compared to all other observations (except missing). If both group_a and group_b are given, the observations in group A are compared the observations in group B.
mannwhitney_table returns a Dataframe with columns for variable IDs, U statistics and p-values.
Supported kwargs are:
statistic_col="U"- Name of the output column containing the U statistics. (Set to nothing to remove from output.)pvalue_col="pValue"- Name of the output column containing the p-values. (Set to nothing to remove from output.)h1_missing=:skip- One of:skipand:error. Ifskip, missing values incolumnare skipped, otherwise an error is thrown.
The following kwargs determine how the computations are threaded:
nworkers- Number of worker threads used in the computation. Set to 1 to disable threading.chunk_size- Number of variables processed in each chunk.channel_size- Max number of unprocessed chunks in queue.
See also: mannwhitney!, mannwhitney
SingleCellProjections.merge_counts — Methodmerge_counts(samples, sample_names;
lazy=false,
sample_name_col = sample_names===nothing ? nothing : "sampleName",
obs_id_col = "cell_id",
obs_id_delim = '_',
obs_id_prefixes = sample_names,
extra_var_id_cols::Union{Nothing,String,Vector{String}},
duplicate_var,
duplicate_obs,
callback=nothing)Merge samples to create one large DataMatrix, by concatenating the obs. The union of the variables from the samples is used, and if a variable is not present in a sample, the count will be set to zero.
The obs IDs are created by concatenating the current obs ID columns, together with the sample_names (if provided).
lazy- Lazy merging. Useload_countsto actually perform the merging.sample_name_col- Column in which thesample_namesare stored.obs_id_col- Name ofobsID column after merging. (Set to nothing to keep old column name.)obs_id_delim- Delimiter used when mergingobsIDs.obs_id_prefixes- Prefix (one per sample) used to create new IDs. Set to nothing to keep old IDs. Defaults tosample_names.extra_var_id_cols- Additional columns to use to ensure variable IDs are unique during merging. Defaults to "feature_type" if that column is present for all samples. Can be aVector{String}to include multiple columns. Set to nothing to disable.duplicate_var- Set to:ignore,:warnor:errorto decide what happens if duplicate var IDs are found.duplicate_obs- Set to:ignore,:warnor:errorto decide what happens if duplicate obs IDs are found.callback- Experimental callback functionality. The callback function is called between samples during merging. Returntrueto abort loading andfalseto continue.
See also: load_counts
SingleCellProjections.normalize_matrix — Methodnormalize_matrix(data::DataMatrix, design::DesignMatrix; scale=false, kwargs...)Normalize data using the specified design matrix.
See also: NormalizationModel, designmatrix
SingleCellProjections.normalize_matrix — Methodnormalize_matrix(data::DataMatrix, [covariates...]; center=true, scale=false, kwargs...)Normalize data. By default, the matrix is centered. Any covariates specified (using column names of data.obs) will be regressed out.
center- Set to true to center the data matrix.scale- Set to true to scale the variables in the data matrix to unit standard deviation.
For other kwargs and more detailed descriptions, see NormalizationModel and designmatrix.
Examples
Centering only:
julia> normalize_matrix(data)Regression model with intercept (centering) and "fraction_mt" (numerical annotation):
julia> normalize_matrix(data, "fraction_mt")As above, but also including "batch" (categorical annotation):
julia> normalize_matrix(data, "fraction_mt", "batch")See also: NormalizationModel, designmatrix
SingleCellProjections.obs_coordinates — Functionobs_coordinates(data::DataMatrix)Returns a matrix with coordinates for the observations. Not available for all types of DataMatrices. Mostly useful for data matrices after dimension reduction such as svd or force_layout has been applied.
In the case of SVD (PCA), obs_coordinates returns the principal components, scaled by the singular values. This is a a good starting point for downstream analysis, since it is the optimal linear approximation of the original data for the given number of dimensions.
SingleCellProjections.project — Methodproject(data::DataMatrix, models, args...; verbose=true, kwargs...)Convenience function for projection onto multiple models. Essentially calls foldl and prints some @info messages (if verbose=true). In most cases, it is better to call project(data, base::DataMatrix) instead of using this method directly.
SingleCellProjections.project — Methodproject(data::DataMatrix, base::DataMatrix, args...; from=nothing, kwargs...)Project data onto base, by applying ProjectionModels from base one by one.
Since data already might have some models applied, project will try to figure out which models from base to use. See "Examples" below for concrete examples. Here's a more technical overview:
Consider a base data matrix with four models:
base: A -> B -> C -> DGiven some new data (typically counts), we can project that onto base, given the result proj by applying all four models:
data:
proj: A -> B -> C -> DIf data already has some models applied (e.g. we already projected onto A and B above), project will look for the last model in data (in this case B) in the list of models in base, and only apply models after that (in this case C and D).
data: A -> B
proj: A -> B -> C -> DIt is also possible to use the from kwarg to specify exactly which models to apply. (The models in from must be a prefix of the models in base, or in other words, base was created by applying additional operations to from.)
data: X
base: A -> B -> C -> D
from: A -> B
proj: X -> C -> DNote that it is necessary to use the from kwarg if the last model in data does not occurr in base, because project cannot figure out on its own which models it makes sense to apply.
Examples
First, we construct a "base" by loading counts, SCTransforming, normalizing, computing the svd and finally computing a force layout:
julia> fp = ["GSE164378_RNA_ADT_3P_P1.h5", "GSE164378_RNA_ADT_3P_P2.h5"];
julia> counts = load_counts(fp; sample_names=["P1","P2"]);
julia> transformed = sctransform(counts);
julia> normalized = normalize_matrix(transformed);
julia> reduced = svd(normalized; nsv=10);
julia> fl = force_layout(reduced; ndim=3, k=100)
DataMatrix (3 variables and 35340 observations)
Matrix{Float64}
Variables: id
Observations: id, sampleName, barcode
Models: NearestNeighborModel(base="force_layout", k=10), SVD, Normalization, SCTransformNote how the last line lists all ProjectionModels used in the creation of fl.
Next, let's load some more samples for projection:
julia> fp2 = ["GSE164378_RNA_ADT_3P_P5.h5", "GSE164378_RNA_ADT_3P_P6.h5"];
julia> counts2 = load_counts(fp2; sample_names=["P5","P6"]);It is easy to project the newly loaded counts2 onto the "base" force layout fl:
julia> project(counts2, fl)
DataMatrix (3 variables and 42553 observations)
Matrix{Float64}
Variables: id
Observations: id, sampleName, barcode
Models: NearestNeighborModel(base="force_layout", k=10), SVD, Normalization, SCTransformWe can also project in two or more steps, to get access to intermediate results:
julia> reduced2 = project(counts2, reduced)
DataMatrix (20239 variables and 42553 observations)
SVD (10 dimensions)
Variables: id, feature_type, name, genome, read, pattern, sequence, logGeneMean, outlier, beta0, ...
Observations: id, sampleName, barcode
Models: SVDModel(nsv=10), Normalization, SCTransform
julia> project(reduced2, fl)
DataMatrix (3 variables and 42553 observations)
Matrix{Float64}
Variables: id
Observations: id, sampleName, barcode
Models: NearestNeighborModel(base="force_layout", k=10), SVD, Normalization, SCTransformIf the DataMatrix we want to project is modified, we need to use the from kwarg to tell project which models to use:
julia> filtered = counts2[:,1:10:end]
DataMatrix (33766 variables and 4256 observations)
SparseArrays.SparseMatrixCSC{Int64, Int32}
Variables: id, feature_type, name, genome, read, pattern, sequence
Observations: id, sampleName, barcode
Models: FilterModel(:, 1:10:42551)
julia> reduced2b = project(filtered2, reduced; from=counts)
DataMatrix (20239 variables and 4256 observations)
SVD (10 dimensions)
Variables: id, feature_type, name, genome, read, pattern, sequence, logGeneMean, outlier, beta0, ...
Observations: id, sampleName, barcode
Models: SVDModel(nsv=10), Normalization, SCTransform, FilterAfter that, it is possible to continue without specifying from:
julia> project(reduced2b, fl)
DataMatrix (3 variables and 4256 observations)
Matrix{Float64}
Variables: id
Observations: id, sampleName, barcode
Models: NearestNeighborModel(base="force_layout", k=10), SVD, Normalization, SCTransform, FilterSingleCellProjections.project — Methodproject(data::DataMatrix, model::ProjectionModel, args...; verbose=true, kwargs...)Core projection function. Project data based on the single ProjectionModel model. In most cases, it is better to call project(data, base::DataMatrix) instead of using this method directly.
SingleCellProjections.pseudobulk — Methodpseudobulk(data::DataMatrix, obs_col, [additional_columns...]; var=:copy)Create a new DataMatrix by averging over groups, as specified by the categorical annotation obs_col (and optionally additional columns).
var- Can be:copy(make a copy of sourcevar) or:keep(share the sourcevarobject).
Examples
Create a pseudobulk representation of each sample:
julia> pseudobulk(transformed, "sampleName")Create a pseudobulk representation for each celltype in each sample:
julia> pseudobulk(transformed, "sampleName", "celltype")SingleCellProjections.set_obs_id_col! — Methodset_obs_id_col!(data::DataMatrix, obs_id_col::String; duplicate_obs=:error)Set which column to use as observation IDs. It will be moved to the first column of data.obs. The rows of this column in data.obs must be unique.
duplicate_obs- Set to :ignore, :warn or :error to decide what happens if duplicate IDs are found.
SingleCellProjections.set_var_id_col! — Methodset_var_id_col!(data::DataMatrix, var_id_col::String; duplicate_var=:error)Set which column to use as variable IDs. It will be moved to the first column of data.var. The rows of this column in data.var must be unique.
duplicate_var- Set to :ignore, :warn or :error to decide what happens if duplicate IDs are found.
SingleCellProjections.splitrange — Methodsplitrange(r::UnitRange, nparts::Integer)Splits a range in nparts number of parts of equal length.
SingleCellProjections.tf_idf_transform — Methodtf_idf_transform([T=Float64], counts::DataMatrix;
var_filter = hasproperty(counts.var, "feature_type") ? "feature_type" => isequal("Gene Expression") : nothing,
var_filter_cols = hasproperty(counts.var, "feature_type") ? "feature_type" : nothing,
scale_factor = 10_000,
idf = vec(size(counts,2) ./ max.(1,sum(counts.matrix; dims=2))),
annotate = true,
var = :copy,
obs = :copy)Compute the TF-IDF (term frequency-inverse document frequency) transform of counts, using the formula log( 1 + scale_factor * tf * idf ) where tf is the term frequency counts.matrix ./ max.(1, sum(counts.matrix; dims=1)).
Optionally, T can be specified to control the eltype of the sparse transformed matrix. T=Float32 can be used to lower the memory usage, with little impact on the results, since downstream analysis is still done with Float64.
var_filter- Control which variables (features) to use for parameter estimation. Defaults to"feature_type" => isequal("Gene Expression"), if afeature_typecolumn is present incounts.var. Can be set tonothingto disable filtering. SeeDataFrames.filterfor how to specify filters.var_filter_cols- Additional columns used to ensure features are unique. Defaults to "feature_type" if present incounts.var. Use a Tuple/Vector for specifying multiple columns. Can be set tonothingto not include any additional columns.annotate- If true,idfwill be added as avarannotation.var- Can be:copy(make a copy of sourcevar) or:keep(share the sourcevarobject).obs- Can be:copy(make a copy of sourceobs) or:keep(share the sourceobsobject).
SingleCellProjections.ttest! — Methodttest!(data::DataMatrix, h1, [group_a], [group_b]; h0, kwargs...)Performs a t-Test with the given h1 (alternative hypothesis) and h0 (null hypothesis). Examples of t-Tests are Two-Group tests and Linear Regression.
ttest! adds a t-statistic, a p-value and a difference column to data.var.
See ttest_table for usage examples and more details on computations and parameters.
In addition ttest! supports the kwarg:
prefix- Output column names for t-statistics, p-values and differences will be prefixed with this string. If none is given, it will be constructed fromh1,group_a,group_bandh0.
See also: ttest_table, ttest, ftest!, mannwhitney!
SingleCellProjections.ttest — Methodttest(data::DataMatrix, h1, [group_a], [group_b]; h0, var=:copy, obs=:copy, matrix=:keep, kwargs...)Performs a t-Test with the given h1 (alternative hypothesis) and h0 (null hypothesis). Examples of t-Tests are Two-Group tests and Linear Regression.
ttest creates a copy of data and adds a t-statistic, a p-value and a difference column to data.var.
See ttest_table and ttest! for usage examples and more details on computations and parameters.
See also: ttest!, ttest_table, ftest, mannwhitney
SingleCellProjections.ttest_table — Methodttest_table(data::DataMatrix, h1, [group_a], [group_b]; h0, kwargs...)Performs a t-Test with the given h1 (alternative hypothesis) and h0 (null hypothesis). Examples of t-Tests are Two-Group tests and Linear Regression.
T-tests can be performed on any DataMatrix, but it is almost always recommended to do it directly after transforming the data using e.g. sctransform, logtransform or tf_idf_transform.
Do not use ttest_table after normalizing the data using normalize_matrix: ttest_table needs to know about the h0 model (regressed out covariates) for correction computations. Failing to do so can result in incorrect results. If you want to correct for the same covariates, pass them as h0 to ttest_table.
h1 can be:
- A string specifying a column name of
data.obs. Auto-detection determines if the column is categorical (Two-Group) or numerical (linear regression).- If
group_aandgroup_bare specified, a Two-Group test betweengroup_aandgroup_bis performed. - If
group_ais specified, but notgroup_b, a Two-Group test betweengroup_aand all other observations is performed.
- If
- A
covariatefor more control of how to interpret the values in the column.
ttest_table returns a Dataframe with columns for variable IDs, t-statistics, p-values and differences. For Two-group tests, difference is the difference in mean between the two groups. For linear regression, the difference corresponds to the rate of change.
Supported kwargs are:
h0- Use a non-trivialh0(null) model. Specified in the same way ash1above.center=true- Add an intercept to theh0(null) model.statistic_col="t"- Name of the output column containing the t-statistics. (Set to nothing to remove from output.)pvalue_col="pValue"- Name of the output column containing the p-values. (Set to nothing to remove from output.)difference_col="difference"- Name of the output column containing the differences. (Set to nothing to remove from output.)h1_missing=:skip- One of:skipand:error. Ifskip, missing values inh1columns are skipped, otherwise an error is thrown.h0_missing=:error- One of:skipand:error. Ifskip, missing values inh0columns are skipped, otherwise an error is thrown.allow_normalized_matrix=false- Set to true to accept running on aDataMatrixthat has been normalized.
Examples
Perform a Two-Group t-test between celltypes "Mono" and "DC".
julia> ttest_table(transformed, "celltype", "Mono", "DC")Perform a Two-Group t-test between celltype "Mono" and all other cells.
julia> ttest_table(transformed, "celltype", "Mono")Perform a Two-Group t-test between celltypes "Mono" and "DC", while correcting for "fraction_mt" (a linear covariate).
julia> ttest_table(transformed, "celltype", "Mono", "DC")Perform Linear Regression using the covariate "fraction_mt".
julia> ttest_table(transformed, "fraction_mt")See also: ttest!, ttest, ftest_table, mannwhitney_table, covariate
SingleCellProjections.update_matrix — Functionupdate_matrix(data::DataMatrix, matrix, model=nothing;
var::Union{Symbol,String,DataFrame} = "",
obs::Union{Symbol,String,DataFrame} = "")Create a new DataMatrix by replacing parts of data with new values. Mostly useful when implementing new ProjectionModels.
matrix- the new matrix.model- will be appended to the list of models fromdata. If set tonothing, the resulting list ofmodelswill be empty.
Kwargs:
var- One of::copy- Copy fromdata.:keep- Sharevarwithdata.::DataFrame- Replace with a new table with variable annotations.prefix::String- Prefix, the new variables will be named prefix1, prefix2, etc.
obsSeevar.
SingleCellProjections.ustatistic_single — Methodustatistic_single(X, j, groups, n1, n2)NB: Assumes all sparse non-zeros are positive.
X is a sparse matrix where each column is a variable. j is the current variable. groups is a vector with values: 1 for each sample in group 1, 2 for each sample in group 2 and 0 for samples in neither group. n1 number of elements in group 1 (precomputed from groups) n2 number of elements in group 2 (precomputed from groups)
SingleCellProjections.var_coordinates — Functionvar_coordinates(data::DataMatrix)Returns a matrix with coordinates for the variables. Only available for DataMatrices that have a dual representation (e.g. SVD/PCA).
In the case of SVD (PCA), var_coordinates returns the principal components as unit vectors.
SingleCellProjections.var_counts_fraction! — Methodvar_counts_fraction!(counts::DataMatrix, sub_filter, tot_filter, col; check=true, var=:keep, obs=:keep)For each observation, compute the fraction of counts that match a specific variable pattern.
sub_filterdecides which variables are counted.tot_filterdecides which variables to include in the total.
kwargs:
var- Use this to setvarin theProjectionModel.obs- Use this to setobsin theProjectionModel. Note thatcounts.obsis changed in place, regardless of the value ofobs.
If check=true, an error will be thrown if no variables match the patterns.
For more information on filtering syntax, see examples below and the documentation on DataFrames.filter.
Examples
Compute the fraction of reads in MT- genes, considering only "Gene Expression" features (and not e.g. "Antibody Capture").
var_counts_fraction!(counts, "name"=>startswith("MT-"), "feature_type"=>isequal("Gene Expression"), "fraction_mt")Compute the fraction of reads in MT- genes, when there is no feature_type annotation (i.e. all variables are genes).
var_counts_fraction!(counts, "name"=>startswith("MT-"), Returns(true), "fraction_mt")See also: var_counts_fraction
SingleCellProjections.var_counts_fraction — Methodvar_counts_fraction(counts::DataMatrix, sub_filter, tot_filter, col; check=true, var=:copy, obs=:copy)For each observation, compute the fraction of counts that match a specific variable pattern.
sub_filterdecides which variables are counted.tot_filterdecides which variables to include in the total.
kwargs:
var- Can be:copy(make a copy of sourcevar) or:keep(share the sourcevarobject).obs- Can be:copy(make a copy of sourceobs) or:keep(share the sourceobsobject).
If check=true, an error will be thrown if no variables match the patterns.
For more information on filtering syntax, see examples below and the documentation on DataFrames.filter.
Examples
Compute the fraction of reads in MT- genes, considering only "Gene Expression" features (and not e.g. "Antibody Capture").
var_counts_fraction(counts, "name"=>startswith("MT-"), "feature_type"=>isequal("Gene Expression"), "fraction_mt")Compute the fraction of reads in MT- genes, when there is no feature_type annotation (i.e. all variables are genes).
var_counts_fraction(counts, "name"=>startswith("MT-"), Returns(true), "fraction_mt")See also: var_counts_fraction!
SingleCellProjections.var_counts_sum! — Methodvar_counts_sum!([f=identity], counts::DataMatrix, filter, col; check=true, var=:keep, obs=:keep)For each observation, compute the sum of counts matching the filter.
If f is specified, it is applied to each element before summing. (Similar to sum.)
kwargs:
var- Use this to setvarin theProjectionModel.obs- Use this to setobsin theProjectionModel. Note thatcounts.obsis changed in place, regardless of the value ofobs.
If check=true, an error will be thrown if no variables match the pattern.
For more information on filtering syntax, see examples below and the documentation on DataFrames.filter.
Examples
Sum all "Gene Expression" counts:
var_counts_sum!(counts, "feature_type"=>isequal("Gene Expression"), "totalRNACount")Compute the number of "Gene Expression" variables that are expressed (i.e. nonzero):
var_counts_sum!(!iszero, counts, "feature_type"=>isequal("Gene Expression"), "nonzeroRNACount")See also: var_counts_sum
SingleCellProjections.var_counts_sum — Methodvar_counts_sum([f=identity], counts::DataMatrix, filter, col; check=true, var=:keep, obs=:keep)For each observation, compute the sum of counts matching the filter.
If f is specified, it is applied to each element before summing. (Similar to sum.)
kwargs:
var- Use this to setvarin theProjectionModel.obs- Use this to setobsin theProjectionModel. Note thatcounts.obsis changed in place, regardless of the value ofobs.
If check=true, an error will be thrown if no variables match the pattern.
For more information on filtering syntax, see examples below and the documentation on DataFrames.filter.
Examples
Sum all "Gene Expression" counts:
var_counts_sum(counts, "feature_type"=>isequal("Gene Expression"), "totalRNACount")Compute the number of "Gene Expression" variables that are expressed (i.e. nonzero):
var_counts_sum(!iszero, counts, "feature_type"=>isequal("Gene Expression"), "nonzeroRNACount")See also: var_counts_sum!
SingleCellProjections.variable_std — Methodvariable_std(data::DataMatrix)Computes the variance of each variable in data.
data must be mean-centered. E.g. by using normalize_matrix before calling variable_std.
See also: variable_var, normalize_matrix
SingleCellProjections.variable_var — Methodvariable_var(data::DataMatrix)Computes the variance of each variable in data.
data must be mean-centered. E.g. by using normalize_matrix before calling variable_var.
See also: variable_std, normalize_matrix