
Estimate clustering parameters using the cell count
Source: R/method-L1_analysis_tracks.R
estimate_cluster_params.Rd
Computes sensible defaults for the number of principal components, SNN graph neighbourhood size (K), and Leiden clustering resolution based on the number of cells. All three parameters scale dynamically with dataset size using bounded transformations and are intended to allow for slight over-clustering.
Usage
estimate_cluster_params(
n_cells,
min_cluster_cells = 50L,
res_min = 0.6,
res_max = 2,
res_saturation_n = 100000L,
k_min = 10L,
k_max = 50L,
k_saturation_n = 200000L,
npc_min = 20L,
npc_max = 50L,
npc_slope = 4
)
Arguments
- n_cells
Integer scalar. Number of cells in the dataset.
- min_cluster_cells
Integer scalar. Minimum number of cells required to attempt clustering. If n_cells is below this threshold, the function returns a parameter set with skip = TRUE. Defaults to 50.
- res_min, res_max
Numeric scalars. Bounds for Leiden resolution. Default to 0.6 and 2.0.
- res_saturation_n
Integer scalar. Cell count at which resolution reaches res_max. Defaults to 100000.
- k_min, k_max
Integer scalars. Bounds for SNN nearest-neighbour K. Default to 10 and 50.
- k_saturation_n
Integer scalar. Cell count at which K reaches k_max. Defaults to 200000.
- npc_min, npc_max
Integer scalars. Bounds for number of principal components. Default to 20 and 50.
- npc_slope
Numeric scalar. Controls the log2-based scaling rate for PCs. Defaults to 4.
Value
A named list with components:
- n_pcs
Integer. Number of principal components.
- dims
Integer vector seq_len(n_pcs).
- k
Integer. Nearest-neighbour count for SNN graph.
- resolution
Numeric. Leiden clustering resolution.
- skip
Logical. TRUE if n_cells is below the minimum threshold.
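As a minimal sketch of how the returned list might be consumed, the block below builds a hand-written stand-in with the documented components and checks skip before using anything else. The numeric values and the describe_params() helper are hypothetical, chosen only for illustration; they are not output of estimate_cluster_params().

```r
# Hand-written stand-in for the documented return value; the numbers are
# illustrative only and were not produced by estimate_cluster_params().
params <- list(
  n_pcs = 30L,
  dims = seq_len(30L),
  k = 20L,
  resolution = 1.0,
  skip = FALSE
)

# Hypothetical helper: summarise a parameter set, honouring the skip flag.
describe_params <- function(p) {
  if (isTRUE(p$skip)) {
    return("too few cells: clustering skipped")
  }
  sprintf("PCs 1-%d, k = %d, resolution = %.2f",
          max(p$dims), p$k, p$resolution)
}

summary_line <- describe_params(params)
print(summary_line)
```

Checking skip first keeps downstream code from acting on the other components when the dataset is too small to cluster.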
Parameter scaling logic
- PCs
Log2-scaled between npc_min and npc_max. Small datasets get fewer components to avoid overfitting noise; large datasets saturate at npc_max.
- K (SNN neighbours)
Square-root-scaled between k_min and k_max. Grows slowly because neighbourhood size affects graph topology: if K is too large, distinct populations merge.
- Resolution (Leiden)
Square-root-scaled between res_min and res_max. Grows faster than K because larger datasets typically contain more distinct communities and require finer partitioning. With the defaults, resolution reaches res_max at 100000 cells, whereas K reaches k_max only at 200000.
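The scaling rules above can be sketched in a few lines of R. This is one plausible reading of "bounded log2 and square-root scaling", not the package's actual implementation: the helpers sqrt_scale() and log2_scale() are hypothetical, and the exact formulas (in particular slope * log2(n) for PCs) are assumptions.

```r
# Hypothetical helper: square-root scaling from `lo` to `hi`, reaching `hi`
# at `n_sat` cells and never exceeding it. Assumed form, not package code.
sqrt_scale <- function(n, lo, hi, n_sat) {
  frac <- pmin(1, sqrt(n / n_sat))
  lo + (hi - lo) * frac
}

# Hypothetical helper: log2 scaling for PCs, clamped to [lo, hi].
# slope * log2(n) is only one possible interpretation of "log2-scaled".
log2_scale <- function(n, lo, hi, slope) {
  raw <- slope * log2(n)
  as.integer(pmin(hi, pmax(lo, round(raw))))
}

n_cells <- c(500, 5000, 50000, 200000)

# Evaluate all three rules with the documented default bounds.
sketch <- data.frame(
  n_cells = n_cells,
  n_pcs = log2_scale(n_cells, lo = 20L, hi = 50L, slope = 4),
  k = as.integer(round(sqrt_scale(n_cells, lo = 10, hi = 50,
                                  n_sat = 200000))),
  resolution = round(sqrt_scale(n_cells, lo = 0.6, hi = 2,
                                n_sat = 100000), 2)
)
print(sketch)
```

Under these assumptions, resolution hits its ceiling at half the cell count at which K does, which is the sense in which it grows faster even though both use a square-root transform.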
Examples
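if (FALSE) { # \dontrun{
# Derive clustering parameters from the number of cells (columns) in `sce`
params <- estimate_cluster_params(n_cells = ncol(sce))
# Only build the context when the dataset is large enough to cluster
if (!params$skip) {
  ctx <- prepare_context(sce, n_pcs = params$n_pcs, k = params$k)
}
} # }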