Generate subcluster labels within broad cell groups — subcluster

Runs quickSubCluster to subcluster cells within each group of an existing broad grouping. Returns one subcluster label per cell, either as a vector or stored in colData(sce) (see out_col).

Usage

subcluster_labels(
  sce,
  group_col = "broad_cluster",
  feature_mode = c("hvg", "all"),
  out_col = NULL,
  subcluster_col_fmt = "%s_sc%s",
  hvg_prop = 0.1,
  min_ncells = 50,
  seed = 1234,
  BPPARAM = BiocParallel::SerialParam()
)

Arguments

sce

A SingleCellExperiment containing a logcounts assay.

group_col

Character scalar. Name of the column in colData(sce) containing the broad group labels to be subclustered. Defaults to "broad_cluster".

feature_mode

Character scalar specifying the feature space used for PCA within each broad subset. One of:

"hvg": Recompute per-gene variance within each subset using modelGeneVar, then run PCA on the top highly variable genes, retaining the proportion of genes given by hvg_prop.
"all": Run PCA on all genes present in the subset without recomputing highly variable genes.
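The two feature modes can be sketched as a small per-subset selection step. This is a minimal sketch, not the package's code: select_features() is a hypothetical helper, and the use of scran's getTopHVGs(prop = ...) to apply hvg_prop is an assumption about how the top genes are chosen.

```r
library(scran)  # attaches SingleCellExperiment and scuttle

# Hypothetical helper (not part of the package): choose the PCA feature
# space for one broad subset. Assumes `x` has a "logcounts" assay.
select_features <- function(x, feature_mode = c("hvg", "all"), hvg_prop = 0.1) {
  feature_mode <- match.arg(feature_mode)
  if (feature_mode == "all") {
    return(rownames(x))            # every gene present in the subset
  }
  dec <- modelGeneVar(x)           # variance modelled on this subset only
  getTopHVGs(dec, prop = hvg_prop) # assumed: top proportion by biological variance
}

# Simulated, log-normalised data standing in for one broad subset
set.seed(1)
x <- scuttle::logNormCounts(scuttle::mockSCE(ngenes = 500, ncells = 100))
hvg_features <- select_features(x, "hvg", hvg_prop = 0.1)
all_features <- select_features(x, "all")
```

Because the variance model is refitted per subset, the "hvg" genes can differ between broad groups, whereas "all" uses the same gene set everywhere.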

out_col

Character scalar or NULL. If NULL (default), the function returns only the subcluster label vector. If supplied, the labels are written to colData(sce)[[out_col]] as a factor and the modified sce is returned.
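The two return modes can be sketched as below. finish_labels() is a hypothetical helper, not part of the package; it assumes the subcluster labels have already been computed.

```r
library(SingleCellExperiment)

# Hypothetical sketch of the out_col branch
finish_labels <- function(sce, labels, out_col = NULL) {
  if (is.null(out_col)) {
    return(labels)                           # plain vector, one label per cell
  }
  colData(sce)[[out_col]] <- factor(labels)  # stored as a factor column
  sce                                        # modified object is returned
}

sce <- SingleCellExperiment(assays = list(counts = matrix(0, nrow = 3, ncol = 4)))
labels <- c("A_sc1", "A_sc2", "B_sc1", "A_sc1")
vec  <- finish_labels(sce, labels)
sce2 <- finish_labels(sce, labels, out_col = "broad_subcluster")
```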

subcluster_col_fmt

Character scalar. Format string passed to the format argument of quickSubCluster, which uses it to combine the broad group label with the within-group cluster ID into the final subcluster label. Defaults to "%s_sc%s".
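Assuming quickSubCluster applies the format string with sprintf, putting the parent label in the first %s and the cluster ID in the second, the default produces labels like these (the group names and IDs are illustrative):

```r
# Illustrative parent labels and within-group cluster IDs
parent  <- c("Tcell", "Tcell", "Bcell")
cluster <- c(1, 2, 1)

# Vectorised over both arguments
labels <- sprintf("%s_sc%s", parent, cluster)
labels
# "Tcell_sc1" "Tcell_sc2" "Bcell_sc1"
```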

hvg_prop

Numeric scalar in (0, 1]. Proportion of genes to retain as highly variable within each subset when feature_mode = "hvg". Defaults to 0.1.

min_ncells

Integer scalar. Minimum number of cells required for a broad group to be subclustered. Groups smaller than this threshold are not subclustered by quickSubCluster. Defaults to 50.
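The threshold is applied per broad group. The snippet below shows which groups would qualify under the default. It only illustrates the size check with made-up group sizes and does not reproduce how quickSubCluster labels the groups it skips.

```r
# Illustrative broad labels: 60 cells in "Tcell", 20 in "Bcell"
groups <- c(rep("Tcell", 60), rep("Bcell", 20))
min_ncells <- 50

sizes <- table(groups)
eligible <- names(sizes)[sizes >= min_ncells]  # groups large enough to subcluster
eligible
# "Tcell"
```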

seed

Integer scalar. Random seed set before subclustering so that stochastic steps such as Leiden clustering give reproducible labels. Defaults to 1234.

BPPARAM

A BiocParallelParam object used for parallelisation. Defaults to SerialParam().

Value

If out_col = NULL, returns a character vector of subcluster labels, one per cell in sce, as returned by quickSubCluster with simplify = TRUE.

If out_col is supplied, returns the input SingleCellExperiment with the subcluster labels added to colData(sce)[[out_col]] as a factor.

Details

For each broad group in group_col, the function subsets the cells, performs PCA on either:

recomputed highly variable genes (feature_mode = "hvg"), or
all genes present in the subset (feature_mode = "all")

and then builds a shared nearest-neighbour (SNN) graph from the principal components, which is partitioned with Leiden clustering.

PCA dimensionality and graph-clustering parameters are estimated separately for each subset using estimate_cluster_params, allowing the workflow to adapt to the number of cells in each broad group.

This function assumes that sce has already been normalised and contains a logcounts assay. No additional normalisation is performed inside quickSubCluster.

Within each broad subset, PCA is performed using fixedPCA and clustering is performed using clusterCells with a SNNGraphParam configured for Leiden clustering.
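The per-subset workflow can be approximated by calling quickSubCluster directly with custom prepFUN and clusterFUN functions. This is a minimal sketch under stated assumptions: the fixed rank = 10 and k = 10 are hypothetical stand-ins for the values estimate_cluster_params would supply, normalize = FALSE is assumed to be how renormalisation is skipped, and the modularity objective for Leiden is an assumed choice.

```r
library(scran)    # quickSubCluster, modelGeneVar, getTopHVGs, fixedPCA, clusterCells
library(bluster)  # SNNGraphParam

set.seed(1234)
sce <- scuttle::logNormCounts(scuttle::mockSCE(ngenes = 500, ncells = 200))
sce$broad_cluster <- rep(c("A", "B"), length.out = ncol(sce))  # illustrative groups

labels <- quickSubCluster(
  sce,
  groups = sce$broad_cluster,
  normalize = FALSE,  # assumed: reuse the existing logcounts
  prepFUN = function(x) {
    hvgs <- getTopHVGs(modelGeneVar(x), prop = 0.1)
    fixedPCA(x, rank = 10, subset.row = hvgs)  # hypothetical fixed rank
  },
  clusterFUN = function(x) {
    clusterCells(x, use.dimred = "PCA",
                 BLUSPARAM = SNNGraphParam(k = 10, cluster.fun = "leiden",
                   cluster.args = list(objective_function = "modularity")))
  },
  min.ncells = 50,
  format = "%s_sc%s",
  simplify = TRUE     # return labels only
)
```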

quickSubCluster is called with simplify = TRUE, so only the final subcluster labels, formatted with subcluster_col_fmt, are kept rather than the per-group SingleCellExperiment objects.

Examples

if (FALSE) { # \dontrun{
subclusters <- subcluster_labels(
  sce = sce,
  group_col = "broad_cluster",
  feature_mode = "hvg"
)

sce <- subcluster_labels(
  sce = sce,
  group_col = "broad_cluster",
  feature_mode = "all",
  out_col = "broad_subcluster"
)
} # }