
Rank cluster markers and assign broad cluster labels
Source: R/method-L2_cluster_enrichment.R
Convenience wrapper around rank_cluster_markers and
label_broad_clusters that computes ranked marker tables for an
existing clustering and then assigns broad labels using validated broad
marker definitions stored in metadata(sce) or supplied directly.
Usage
annotate_broad_clusters(
  sce,
  broad_config = NULL,
  ranked_markers = NULL,
  cluster_col = "cluster_broad_hvg",
  marker_config_key = "marker_config",
  ranked_markers_key = "broad_cluster_markers",
  label_col = "broad_cluster",
  assay_type = "logcounts",
  test_type = c("wilcox", "t"),
  direction = c("up", "down", "any"),
  pval_type = c("any", "some", "all"),
  min_prop = 0.25,
  fdr_threshold = 0.05,
  effect_threshold = 0.6,
  unassigned_label = "other",
  BPPARAM = BiocParallel::SerialParam()
)
Arguments
- sce
A SingleCellExperiment object.
- broad_config
Named list or NULL. Validated broad marker definitions as produced by build_broad_marker_config. If NULL, extracted from metadata(sce)[[marker_config_key]][["broad"]].
- ranked_markers
List or NULL. Ranked marker result from rank_cluster_markers(return_list = TRUE). If NULL, marker ranking is performed internally and stored under ranked_markers_key.
- cluster_col
Character scalar. Column in colData(sce) containing the cluster labels to annotate. Defaults to "cluster_broad_hvg".
- marker_config_key
Character scalar. Metadata entry containing the validated marker configuration list with a $broad element. Only used when broad_config = NULL. Defaults to "marker_config".
- ranked_markers_key
Character scalar. Metadata entry used to store the ranked marker tables generated by rank_cluster_markers. Defaults to "broad_cluster_markers".
- label_col
Character scalar. Name of the output broad label column in colData(sce). Defaults to "broad_cluster".
- assay_type
Character scalar. Assay to use for rank_cluster_markers. Defaults to "logcounts".
- test_type
Character scalar. Differential expression test passed to rank_cluster_markers. One of "wilcox" or "t". Defaults to "wilcox".
- direction
Character scalar. Direction of testing passed to rank_cluster_markers. One of "up", "down", or "any". Defaults to "up".
- pval_type
Character scalar. P-value combination mode passed to rank_cluster_markers. One of "any", "some", or "all". Defaults to "any".
- min_prop
Numeric scalar in (0, 1]. Passed to rank_cluster_markers. Defaults to 0.25.
- fdr_threshold
Numeric scalar. Maximum FDR to consider a gene significant in label_broad_clusters. Defaults to 0.05.
- effect_threshold
Numeric scalar. Minimum effect size required for a marker to contribute to broad label assignment. Interpreted as AUC for Wilcoxon tests and log-fold change for t-tests. Defaults to 0.6.
- unassigned_label
Character scalar. Label used when no broad category passes the assignment criteria. Defaults to "other".
- BPPARAM
A BiocParallelParam object. Defaults to SerialParam().
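To make the effect_threshold scale concrete for the Wilcoxon case: an AUC is the probability that a randomly chosen cell from the cluster has a higher expression value than a randomly chosen cell outside it, so 0.5 means no separation and 1 means perfect separation. The base-R sketch below uses made-up expression values to derive an AUC from the Wilcoxon rank-sum statistic. It only illustrates the quantity; it is not taken from rank_cluster_markers, whose internals may compute it differently.

```r
# Hypothetical log-expression values for one gene (illustrative only).
in_cluster  <- c(2.1, 3.4, 2.8, 4.0, 3.1)
out_cluster <- c(0.5, 1.2, 2.5, 0.9, 1.8, 3.0)

# W counts the (in, out) pairs where the in-cluster value is larger;
# ties, if any, count as one half.
w <- unname(wilcox.test(in_cluster, out_cluster, exact = FALSE)$statistic)

# Dividing by the total number of pairs gives the AUC, in [0, 1].
auc <- w / (length(in_cluster) * length(out_cluster))

# Compare against the default effect_threshold of 0.6.
passes <- auc > 0.6
```

For a t-test the same threshold would instead be read as a log-fold change, so a value tuned for one test_type should not be carried over to the other unchanged.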
Value
The input SingleCellExperiment with:
- metadata(sce)[[ranked_markers_key]]
Ranked marker tables produced by rank_cluster_markers.
- colData(sce)[[label_col]]
Assigned broad labels. In the special case where all clusters collapse to a single broad label while cluster_col still contains multiple original clusters, this column will instead contain the original cluster labels.
Details
This function is intended for workflows where:
- an unsupervised clustering has already been generated,
- a validated marker configuration has already been attached to the object (or is supplied directly via broad_config), and
- broad labels are assigned by testing whether curated broad markers are significantly and strongly ranked within each cluster.
Marker ranking is performed with findMarkers via
rank_cluster_markers, and broad labels are then assigned by
label_broad_clusters using the median rank of markers that pass
the chosen FDR and effect-size thresholds.
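The assignment rule described above can be sketched in base R for a single cluster. Everything in the sketch is an assumption made for illustration: the column names (gene, rank, fdr, effect), the toy marker lists, the helper name assign_broad_label, and the rule that the category with the lowest median rank wins. The actual logic lives in label_broad_clusters and may differ.

```r
# Hypothetical ranked marker table for one cluster (rank 1 = top marker).
ranked <- data.frame(
  gene   = c("PTPRC", "CD3E", "EPCAM", "KRT8", "COL1A1"),
  rank   = c(1, 3, 40, 55, 12),
  fdr    = c(0.001, 0.01, 0.20, 0.04, 0.03),
  effect = c(0.92, 0.81, 0.55, 0.58, 0.70),
  stringsAsFactors = FALSE
)

# Hypothetical broad marker definitions: category -> marker genes.
broad_markers <- list(
  immune     = c("PTPRC", "CD3E"),
  epithelial = c("EPCAM", "KRT8"),
  stromal    = c("COL1A1")
)

# Illustrative helper, not part of the package.
assign_broad_label <- function(ranked, broad_markers,
                               fdr_threshold = 0.05,
                               effect_threshold = 0.6,
                               unassigned_label = "other") {
  # Keep only genes that pass both the FDR and effect-size thresholds.
  keep <- ranked$fdr <= fdr_threshold & ranked$effect >= effect_threshold
  passing <- ranked[keep, , drop = FALSE]

  # Median rank of each category's passing markers (NA if none pass).
  med_rank <- vapply(broad_markers, function(genes) {
    r <- passing$rank[passing$gene %in% genes]
    if (length(r) == 0) NA_real_ else median(r)
  }, numeric(1))

  # No category has a passing marker: fall back to the unassigned label.
  if (all(is.na(med_rank))) return(unassigned_label)

  # Assumed rule: the category with the lowest median rank wins.
  names(med_rank)[which.min(med_rank)]
}

label <- assign_broad_label(ranked, broad_markers)
```

In this toy table the epithelial markers fail one threshold each, so only the immune and stromal categories compete, and a table with no passing markers falls through to unassigned_label.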
Edge case: collapsed broad labels
In some datasets, all original unsupervised clusters may be assigned the same
broad label (for example, all clusters are labelled "other" or all
are labelled "immune"). In that situation, downstream subclustering
should still be able to operate on the original unsupervised clusters rather
than on a single collapsed broad label.
To support this, label_broad_clusters checks whether broad
annotation has collapsed all clusters to one unique label while the original
clustering in cluster_col still contains more than one cluster. If
this occurs, the original cluster labels are retained in label_col
instead of the collapsed broad annotation. This preserves the original
cluster structure for downstream subclustering.
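The fallback described above amounts to a small check on the number of unique labels. A minimal base-R sketch follows, assuming a per-cell cluster vector and a named cluster-to-label lookup; both inputs and the helper name resolve_labels are hypothetical and are not the package's internal representation.

```r
# Hypothetical per-cell cluster assignments and a cluster -> broad label map.
clusters <- c("1", "1", "2", "3", "2", "3")
cluster_to_broad <- c("1" = "immune", "2" = "immune", "3" = "immune")

# Illustrative helper, not part of the package.
resolve_labels <- function(clusters, cluster_to_broad) {
  clusters <- as.character(clusters)

  # Map every cell to the broad label of its cluster.
  broad <- unname(cluster_to_broad[clusters])

  # Collapsed: a single broad label, but more than one original cluster.
  collapsed <- length(unique(broad)) == 1 && length(unique(clusters)) > 1

  # Retain the original cluster labels when the annotation has collapsed.
  if (collapsed) clusters else broad
}

labels <- resolve_labels(clusters, cluster_to_broad)
```

Here all three clusters map to "immune", so labels holds the original cluster identifiers; if even one cluster mapped to a different broad label, the broad labels would be returned instead.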
Examples
if (FALSE) { # \dontrun{
  sce <- annotate_broad_clusters(
    sce = sce,
    cluster_col = "cluster_broad_hvg",
    label_col = "broad_cluster"
  )

  # Supplying broad_config directly
  sce <- annotate_broad_clusters(
    sce = sce,
    broad_config = my_broad_config,
    cluster_col = "cluster_broad_hvg",
    label_col = "broad_cluster"
  )
} # }