
Generate subcluster labels within broad cell groups
Source: R/method-L3_subcluster.R
subcluster_labels.Rd
Runs quickSubCluster to subcluster cells within an
existing broad grouping, returning a single vector of subcluster labels.
Usage
subcluster_labels(
  sce,
  group_col = "broad_cluster",
  feature_mode = c("hvg", "all"),
  out_col = NULL,
  subcluster_col_fmt = "%s_sc%s",
  hvg_prop = 0.1,
  min_ncells = 50,
  seed = 1234,
  BPPARAM = BiocParallel::SerialParam()
)
Arguments
- sce
  A SingleCellExperiment containing a logcounts assay.
- group_col
  Character scalar. Name of the column in colData(sce) containing the broad group labels to be subclustered. Defaults to "broad_cluster".
- feature_mode
  Character scalar specifying the feature space used for PCA within each broad subset. One of:
  - "hvg": Recompute gene variance within each subset using modelGeneVar and run PCA on the top highly variable genes defined by hvg_prop.
  - "all": Run PCA on all genes present in the subset without recomputing highly variable genes.
- out_col
  Character scalar or NULL. If NULL (default), the function returns only the subcluster label vector. If supplied, the labels are written to colData(sce)[[out_col]] as a factor and the modified sce is returned.
- subcluster_col_fmt
  Character scalar. Format string passed to quickSubCluster via format. This controls how final subcluster labels are constructed. Defaults to "%s_sc%s".
- hvg_prop
  Numeric scalar in (0, 1]. Proportion of genes to retain as highly variable within each subset when feature_mode = "hvg". Defaults to 0.1.
- min_ncells
  Integer scalar. Minimum number of cells required for a broad group to be subclustered. Groups smaller than this threshold are not subclustered by quickSubCluster. Defaults to 50.
- seed
  Integer scalar. Random seed set before running subclustering. Defaults to 1234.
- BPPARAM
  A BiocParallelParam object used for parallelisation. Defaults to SerialParam().
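The effect of min_ncells can be previewed before running anything, by counting cells per broad group. The snippet below is a minimal base R sketch. The group names and sizes are made up for illustration, and it assumes that a group exactly at the threshold counts as large enough, which is how the description above reads.

```r
# Made-up broad group labels for 130 cells (illustration only).
broad <- rep(c("T_cell", "B_cell", "Mast"), times = c(80, 40, 10))
min_ncells <- 50

# Count cells per broad group and keep the groups that reach the threshold.
sizes <- table(broad)
eligible <- names(sizes)[sizes >= min_ncells]

eligible
```

Here only the group with 80 cells reaches the threshold of 50, so it is the only one that would be subclustered.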
Value
If out_col = NULL, returns a character vector of subcluster labels,
one per cell in sce, as returned by
quickSubCluster with simplify = TRUE.
If out_col is supplied, returns the input
SingleCellExperiment with the subcluster
labels added to colData(sce)[[out_col]] as a factor.
Details
For each broad group in group_col, the function subsets the cells,
performs PCA on either:
- recomputed highly variable genes (feature_mode = "hvg"), or
- all genes present in the subset (feature_mode = "all")
and then builds an SNN graph followed by Leiden clustering.
PCA dimensionality and graph-clustering parameters are estimated separately
for each subset using estimate_cluster_params, allowing the
workflow to adapt to the number of cells in each broad group.
This function assumes that sce has already been normalised and
contains a logcounts assay. No additional normalisation is performed
inside quickSubCluster.
Within each broad subset, PCA is performed using
fixedPCA and clustering is performed using
clusterCells with a
SNNGraphParam configured for Leiden clustering.
The PCA rank and clustering parameters are estimated independently for each
subset using estimate_cluster_params.
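The per-subset workflow described above can be approximated directly with quickSubCluster. This is a rough sketch under stated assumptions, not the function's actual implementation. The toy data and group labels are made up. The PCA rank and k are fixed placeholder values standing in for whatever estimate_cluster_params returns, since its interface is not shown here. The use of getTopHVGs for HVG selection and the modularity objective for Leiden are also assumptions.

```r
library(scran)    # quickSubCluster, modelGeneVar, getTopHVGs, fixedPCA, clusterCells
library(scuttle)  # mockSCE, logNormCounts
library(bluster)  # SNNGraphParam

set.seed(1234)

# Toy data: a mock SingleCellExperiment with a logcounts assay and two
# made-up broad groups of 100 cells each.
sce <- logNormCounts(mockSCE(ncells = 200, ngenes = 1000))
sce$broad_cluster <- rep(c("A", "B"), each = 100)

# Placeholder values (assumption): the documented function estimates these
# separately for each subset instead of fixing them.
pca_rank <- 10
snn_k <- 10

labels <- quickSubCluster(
  sce,
  groups = sce$broad_cluster,
  normalize = FALSE,          # the data is already log-normalised
  assay.type = "logcounts",
  prepFUN = function(x) {
    # "hvg" mode: recompute variance within the subset and keep the top 10%.
    dec <- modelGeneVar(x)
    hvgs <- getTopHVGs(dec, prop = 0.1)
    # For an "all"-style run, subset.row = NULL would use every gene.
    fixedPCA(x, rank = pca_rank, subset.row = hvgs)
  },
  clusterFUN = function(x) {
    clusterCells(
      x,
      use.dimred = "PCA",
      BLUSTERPARAM = SNNGraphParam(
        k = snn_k,
        cluster.fun = "leiden",
        cluster.args = list(objective_function = "modularity")
      )
    )
  },
  min.ncells = 50,
  format = "%s_sc%s",
  simplify = TRUE             # return only the label vector
)

table(labels)
```

The modularity objective is set explicitly because the default objective of igraph's Leiden implementation tends to produce very fine partitions on SNN graphs.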
The output is simplified, so only the final subcluster labels are returned,
as generated by quickSubCluster using the supplied
subcluster_col_fmt.
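To see what the default format string produces, note that quickSubCluster passes format to sprintf with the parent group label and the subcluster identifier. The base R snippet below illustrates this with a made-up parent label and three hypothetical subcluster identifiers.

```r
fmt <- "%s_sc%s"

# Hypothetical parent group "T_cell" with subclusters 1, 2 and 3.
sub_labels <- sprintf(fmt, "T_cell", 1:3)

sub_labels
# "T_cell_sc1" "T_cell_sc2" "T_cell_sc3"
```

A different separator, for example "%s.%s", changes only how the two parts are joined.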