Assigns broad cluster labels using ranked marker tables produced by rank_cluster_markers.

Usage

label_broad_clusters(
  sce,
  broad_config = NULL,
  marker_config_key = "marker_config",
  ranked_markers = NULL,
  ranked_markers_key = "broad_cluster_markers",
  cluster_col = "cluster_broad_hvg",
  label_col = "broad_cluster",
  fdr_threshold = 0.05,
  effect_threshold = 0.6,
  unassigned_label = "other"
)

Arguments

sce

A SingleCellExperiment.

broad_config

Named list or NULL. Validated broad marker definitions as produced by build_broad_marker_config. If NULL, extracted from metadata(sce)[[marker_config_key]][["broad"]].

marker_config_key

Character scalar. Metadata key containing the marker configuration. Only used when broad_config = NULL. Defaults to "marker_config".

ranked_markers

List or NULL. Ranked marker result from rank_cluster_markers(return_list = TRUE). If NULL, extracted from metadata(sce)[[ranked_markers_key]].

ranked_markers_key

Character scalar. Metadata key containing ranked marker tables. Only used when ranked_markers = NULL. Defaults to "broad_cluster_markers".

cluster_col

Character scalar. colData column containing cluster identifiers. Defaults to "cluster_broad_hvg".

label_col

Character scalar. Name of the output colData column. Defaults to "broad_cluster".

fdr_threshold

Numeric scalar. Maximum FDR for a marker to be considered significant. Defaults to 0.05.

effect_threshold

Numeric scalar. Minimum effect size (AUC for Wilcoxon, logFC for t-test). Defaults to 0.6.

unassigned_label

Character scalar. Label assigned to clusters with no passing markers. Defaults to "other".

Value

The input sce with a new colData column, named by label_col, containing the assigned broad labels.

Details

Ranked marker tables may be supplied either:

  • directly via the ranked_markers argument, or

  • indirectly via metadata(sce)[[ranked_markers_key]].

Broad marker definitions may be supplied either:

  • directly via the broad_config argument, or

  • indirectly via metadata(sce)[[marker_config_key]][["broad"]].
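The direct-or-metadata fallback described above can be sketched in base R. This is only an illustration: a plain list stands in for metadata(sce) so the snippet runs without Bioconductor, resolve_input is a hypothetical helper rather than a function of this package, and raising an error when nothing is found is an assumption.

```r
# A plain list stands in for metadata(sce); the contents are made up.
meta <- list(
  marker_config = list(broad = list(t_cell = c("CD3E", "CD3D"))),
  broad_cluster_markers = list("1" = data.frame(gene = "CD3E"))
)

# Hypothetical helper: prefer the directly supplied value, otherwise
# look it up under the metadata key (and optional sub-key).
resolve_input <- function(value, meta, key, subkey = NULL) {
  if (!is.null(value)) {
    return(value)
  }
  out <- meta[[key]]
  if (!is.null(subkey)) {
    out <- out[[subkey]]
  }
  if (is.null(out)) {
    # Assumed behaviour: fail loudly when neither source provides a value.
    stop("No value supplied and none found under metadata key '", key, "'.")
  }
  out
}

# broad_config = NULL falls back to meta[["marker_config"]][["broad"]].
broad_config <- resolve_input(NULL, meta, "marker_config", "broad")

# ranked_markers = NULL falls back to meta[["broad_cluster_markers"]].
ranked_markers <- resolve_input(NULL, meta, "broad_cluster_markers")
```

A directly supplied value always wins, so the metadata keys matter only when the corresponding argument is NULL.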

For each cluster, genes are ordered by ascending FDR, then by descending effect size (summary.AUC for Wilcoxon tests, summary.logFC for t-tests). Within each broad category, only markers passing both FDR <= fdr_threshold and effect size >= effect_threshold are retained. The category score is the median rank of its passing markers. The category with the lowest median rank is assigned to the cluster, with user-supplied priority breaking ties. Clusters with no passing markers in any category receive unassigned_label.
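The scoring rule for a single cluster can be sketched in base R as follows. This is a minimal illustration, not the package's implementation. It makes several assumptions: the marker table is a data frame with columns gene, FDR and summary.AUC; the broad configuration is a named list of gene vectors; ranks are assigned over all genes before filtering; and list order stands in for the user-supplied priority. score_cluster is a hypothetical name.

```r
# Hypothetical ranked marker table for one cluster (column names assumed).
markers <- data.frame(
  gene = c("CD3E", "CD19", "MS4A1", "LYZ", "CD3D", "CD14"),
  FDR = c(0.001, 0.20, 0.30, 0.01, 0.002, 0.04),
  summary.AUC = c(0.90, 0.55, 0.52, 0.70, 0.85, 0.65),
  stringsAsFactors = FALSE
)

# Hypothetical broad marker definitions; list order stands in for priority.
broad_config <- list(
  t_cell = c("CD3E", "CD3D"),
  b_cell = c("CD19", "MS4A1"),
  myeloid = c("LYZ", "CD14")
)

score_cluster <- function(markers, broad_config,
                          fdr_threshold = 0.05,
                          effect_threshold = 0.6,
                          unassigned_label = "other") {
  # Order genes by ascending FDR, then descending effect size, and rank them.
  ord <- order(markers$FDR, -markers$summary.AUC)
  markers <- markers[ord, , drop = FALSE]
  markers$rank <- seq_len(nrow(markers))

  # Keep only genes passing both thresholds.
  keep <- markers$FDR <= fdr_threshold &
    markers$summary.AUC >= effect_threshold
  passing <- markers[keep, , drop = FALSE]

  # Category score: median rank of its passing markers (NA if none pass).
  scores <- vapply(broad_config, function(genes) {
    r <- passing$rank[passing$gene %in% genes]
    if (length(r) == 0L) NA_real_ else stats::median(r)
  }, numeric(1))

  if (all(is.na(scores))) {
    return(unassigned_label)
  }
  # which.min() skips NA and returns the first minimum,
  # so earlier list entries win ties.
  names(scores)[which.min(scores)]
}

label <- score_cluster(markers, broad_config)
```

In this toy table the two T cell markers hold ranks 1 and 2, so that category has the lowest median rank and is chosen. Tightening fdr_threshold until no gene passes yields unassigned_label instead.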

Collapsed broad label edge case

If every cluster receives the same broad label but more than one unsupervised cluster exists, the original cluster identifiers from cluster_col are kept in label_col instead of a single collapsed label. This preserves cluster structure for downstream subclustering.
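The edge case can be illustrated with a short base R sketch. It assumes per-cell cluster identifiers held in a character vector and per-cluster labels held in a named character vector; resolve_labels is a hypothetical helper, not part of the package.

```r
# Hypothetical per-cell cluster identifiers and per-cluster broad labels.
clusters <- c("1", "1", "2", "3", "3")
cluster_labels <- c("1" = "t_cell", "2" = "t_cell", "3" = "t_cell")

resolve_labels <- function(clusters, cluster_labels) {
  # Map each cell to the broad label of its cluster.
  cell_labels <- unname(cluster_labels[as.character(clusters)])

  # Collapsed case: one distinct broad label across several clusters.
  collapsed <- length(unique(cluster_labels)) == 1L &&
    length(unique(clusters)) > 1L

  # Keep the original cluster identifiers when labels would collapse.
  if (collapsed) as.character(clusters) else cell_labels
}

resolved <- resolve_labels(clusters, cluster_labels)
```

Here all three clusters map to the same label, so the sketch returns the cluster identifiers unchanged. With at least two distinct broad labels it returns the per-cell broad labels.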