Rank cluster markers and assign broad cluster labels — annotate_broad

Convenience wrapper around rank_cluster_markers and label_broad_clusters. It computes ranked marker tables for an existing clustering, then assigns broad labels using validated broad marker definitions that are either stored in metadata(sce) or supplied directly via broad_config.

Usage

annotate_broad_clusters(
  sce,
  broad_config = NULL,
  ranked_markers = NULL,
  cluster_col = "cluster_broad_hvg",
  marker_config_key = "marker_config",
  ranked_markers_key = "broad_cluster_markers",
  label_col = "broad_cluster",
  assay_type = "logcounts",
  test_type = c("wilcox", "t"),
  direction = c("up", "down", "any"),
  pval_type = c("any", "some", "all"),
  min_prop = 0.25,
  fdr_threshold = 0.05,
  effect_threshold = 0.6,
  unassigned_label = "other",
  BPPARAM = BiocParallel::SerialParam()
)
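
The vector-valued defaults for test_type, direction, and pval_type follow the common R idiom in which the first element acts as the default. The following is a minimal sketch, assuming these arguments are resolved with match.arg() (this page does not confirm that); resolve_choice is a hypothetical stand-in, not a package function.

```r
# Hypothetical stand-in showing how a choice vector is typically resolved.
# Assumption: annotate_broad_clusters handles test_type in a similar way.
resolve_choice <- function(test_type = c("wilcox", "t")) {
  # match.arg() returns the first choice when the argument is left at its
  # default and raises an error for values outside the allowed set.
  match.arg(test_type)
}

default_choice  <- resolve_choice()      # argument left at its default
explicit_choice <- resolve_choice("t")   # explicit, valid choice
invalid_choice  <- tryCatch(
  resolve_choice("anova"),               # not one of the allowed values
  error = function(e) "error"
)
```

Under that assumption, omitting test_type is equivalent to passing "wilcox", which matches the default stated in the Arguments section.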

Arguments

sce: A SingleCellExperiment.
broad_config: Named list or NULL. Validated broad marker definitions as produced by build_broad_marker_config. If NULL, extracted from metadata(sce)[[marker_config_key]][["broad"]].
ranked_markers: List or NULL. Ranked marker result from rank_cluster_markers(return_list = TRUE). If NULL, marker ranking is performed internally and stored under ranked_markers_key.
cluster_col: Character scalar. Column in colData(sce) containing the cluster labels to annotate. Defaults to "cluster_broad_hvg".
marker_config_key: Character scalar. Metadata entry containing the validated marker configuration list with a $broad element. Only used when broad_config = NULL. Defaults to "marker_config".
ranked_markers_key: Character scalar. Metadata entry used to store the ranked marker tables generated by rank_cluster_markers. Defaults to "broad_cluster_markers".
label_col: Character scalar. Name of the output broad label column in colData(sce). Defaults to "broad_cluster".
assay_type: Character scalar. Assay to use for rank_cluster_markers. Defaults to "logcounts".
test_type: Character scalar. Differential expression test passed to rank_cluster_markers. One of "wilcox" or "t". Defaults to "wilcox".
direction: Character scalar. Direction of testing passed to rank_cluster_markers. One of "up", "down", or "any". Defaults to "up".
pval_type: Character scalar. P-value combination mode passed to rank_cluster_markers. One of "any", "some", or "all". Defaults to "any".
min_prop: Numeric scalar in (0, 1]. Passed to rank_cluster_markers. Defaults to 0.25.
fdr_threshold: Numeric scalar. Maximum FDR at which a gene is considered significant in label_broad_clusters. Defaults to 0.05.
effect_threshold: Numeric scalar. Minimum effect size required for a marker to contribute to broad label assignment. Interpreted as AUC for Wilcoxon tests and log-fold change for t-tests. Defaults to 0.6.
unassigned_label: Character scalar. Label used when no broad category passes the assignment criteria. Defaults to "other".
BPPARAM: A BiocParallelParam object. Defaults to SerialParam().
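
Because effect_threshold is read as an AUC for Wilcoxon tests and as a log-fold change for t-tests, the same numeric value means different things depending on test_type. The base R sketch below illustrates the two effect sizes on made-up values for a single gene. It does not call the package, and the variable names are illustrative only.

```r
# Toy log-expression values for one gene (made-up numbers).
in_cluster  <- c(3, 4, 5, 6)
out_cluster <- c(1, 2, 3.5, 2.5)

# Wilcoxon-style effect: the W statistic counts pairs in which the in-cluster
# value exceeds the out-of-cluster value; dividing by the number of pairs
# gives the AUC, which lies in [0, 1] with 0.5 meaning no separation.
w   <- unname(stats::wilcox.test(in_cluster, out_cluster)$statistic)
auc <- w / (length(in_cluster) * length(out_cluster))

# t-test-style effect: difference in mean log-expression (a log-fold change),
# which is unbounded.
lfc <- mean(in_cluster) - mean(out_cluster)

# The same cutoff of 0.6 is applied on two different scales.
passes_auc <- auc >= 0.6
passes_lfc <- lfc >= 0.6
```

Since an AUC is bounded while a log-fold change is not, it is probably worth revisiting effect_threshold whenever test_type is switched from "wilcox" to "t".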

Value

The input SingleCellExperiment with:

metadata(sce)[[ranked_markers_key]]: Ranked marker tables produced by rank_cluster_markers.
colData(sce)[[label_col]]: Assigned broad labels. In the special case where all clusters collapse to a single broad label while cluster_col still contains multiple original clusters, this column will instead contain the original cluster labels.

Details

This function is intended for workflows where:

an unsupervised clustering has already been generated,
a validated marker configuration has already been attached to the object (or is supplied directly via broad_config), and
broad labels are assigned by testing whether curated broad markers are significantly and strongly ranked within each cluster.

Marker ranking is performed with findMarkers via rank_cluster_markers, and broad labels are then assigned by label_broad_clusters using the median rank of markers that pass the chosen FDR and effect-size thresholds.
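
The assignment step can be pictured with a small base R sketch. This is not the package implementation: assign_broad_label, the column names (gene, rank, FDR, effect), and the shape of broad_config as a named list of gene vectors are all assumptions made for illustration, as is the rule that the category with the lowest median rank wins.

```r
# Hypothetical helper, not part of the package: pick one broad label for a
# single cluster from its ranked marker table.
assign_broad_label <- function(ranked, broad_config,
                               fdr_threshold = 0.05,
                               effect_threshold = 0.6,
                               unassigned_label = "other") {
  # Keep only genes that are both significant and strong enough.
  keep <- ranked$FDR <= fdr_threshold & ranked$effect >= effect_threshold
  passing <- ranked[keep, , drop = FALSE]

  # Median rank of the passing markers per broad category (NA if none pass).
  med_rank <- vapply(broad_config, function(genes) {
    r <- passing$rank[passing$gene %in% genes]
    if (length(r) == 0L) NA_real_ else stats::median(r)
  }, numeric(1))

  # No category has a passing marker: fall back to the unassigned label.
  if (all(is.na(med_rank))) return(unassigned_label)

  # Assumed rule: the category whose markers rank best (lowest median) wins.
  names(med_rank)[which.min(med_rank)]
}

# Toy ranked marker table for one cluster (made-up values).
ranked <- data.frame(
  gene   = c("PTPRC", "CD3E", "EPCAM", "KRT8", "COL1A1"),
  rank   = c(1, 2, 40, 55, 300),
  FDR    = c(1e-10, 1e-8, 0.01, 0.20, 0.90),
  effect = c(0.95, 0.90, 0.65, 0.55, 0.30),
  stringsAsFactors = FALSE
)

# Assumed config shape: broad category name -> curated marker genes.
toy_config <- list(
  immune     = c("PTPRC", "CD3E"),
  epithelial = c("EPCAM", "KRT8"),
  stromal    = "COL1A1"
)

label_default <- assign_broad_label(ranked, toy_config)
label_strict  <- assign_broad_label(ranked, toy_config, effect_threshold = 0.99)
```

In this toy table only PTPRC, CD3E, and EPCAM pass both thresholds, so the immune markers have the best median rank. Raising effect_threshold until nothing passes yields the unassigned label instead.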

Edge case

Collapsed broad labels: In some datasets, all original unsupervised clusters may be assigned the same broad label (for example, all clusters are labelled "other" or all are labelled "immune"). In that situation, downstream subclustering should still be able to operate on the original unsupervised clusters rather than on a single collapsed broad label.

To support this, label_broad_clusters checks whether broad annotation has collapsed all clusters to one unique label while the original clustering in cluster_col still contains more than one cluster. If this occurs, the original cluster labels are retained in label_col instead of the collapsed broad annotation. This preserves the original cluster structure for downstream subclustering.
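
The fallback described above amounts to a simple check on the two label vectors. The sketch below is a base R illustration under that reading; resolve_collapsed_labels is a hypothetical name, and the real check inside label_broad_clusters may differ in detail.

```r
# Hypothetical helper, not part of the package: keep the original clusters
# when the broad annotation has collapsed to a single label.
resolve_collapsed_labels <- function(clusters, broad_labels) {
  collapsed <- length(unique(broad_labels)) == 1L &&
    length(unique(clusters)) > 1L
  if (collapsed) as.character(clusters) else as.character(broad_labels)
}

# Per-cell cluster assignments (made-up values).
clusters <- c("1", "1", "2", "3", "3")

# Every cluster received the same broad label: original clusters are kept.
kept_original <- resolve_collapsed_labels(clusters, rep("other", 5))

# At least two distinct broad labels: the broad annotation is kept.
kept_broad <- resolve_collapsed_labels(
  clusters,
  c("immune", "immune", "stromal", "other", "other")
)
```

One consequence worth keeping in mind is that label_col can then hold cluster identifiers rather than broad category names, so downstream code should not assume its values always come from the marker configuration.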

Examples

if (FALSE) { # \dontrun{
sce <- annotate_broad_clusters(
  sce         = sce,
  cluster_col = "cluster_broad_hvg",
  label_col   = "broad_cluster"
)

# Supplying broad_config directly
sce <- annotate_broad_clusters(
  sce          = sce,
  broad_config = my_broad_config,
  cluster_col  = "cluster_broad_hvg",
  label_col    = "broad_cluster"
)
} # }