Clump Finding

clumpfind locates connected over-dense structures and returns a per-clump catalog. It works in two ways:

  • 3D — friends-of-friends on the cells (hydro) or particles above a field threshold.
  • 2D — connected-component labelling of a projection map above a threshold.

Both return a ClumpCatalog sorted most-massive-first.

Try it on known ground truth

The synthetic, data-free example builds a Mera object whose clumps are known exactly and scores every finder and feature against it (downloadable dataset included) — the quickest way to see the behaviour and accuracy of each algorithm.

The 3D finder runs on a pluggable framework: an AbstractFinder value (one of seven — ThresholdFoF, DensityWatershed, Dendrogram, GraphSegFinder, HDBSCANFinder, PhaseSpaceFoF, PersistenceFinder) selects the algorithm, while a shared neighbour index, statistics, boundedness and catalog pipeline serves them all. The keyword form clumpfind(obj, field; …) shown throughout this page is a convenience shim that builds a ThresholdFoF for you, so existing scripts are unchanged; pass a finder explicitly to pick the algorithm:

cat   = clumpfind(gas, ThresholdFoF(:rho; threshold=1e2, threshold_unit=:nH, linking_length=0.2))
cores = clumpfind(gas, DensityWatershed(:rho; threshold=1e2, threshold_unit=:nH, linking_length=0.4))

Choosing a finder

Seven AbstractFinder algorithms plug into the same neighbour-index / statistics / boundedness / catalog pipeline, so they share keywords and outputs and differ only in how cells are grouped. Start with ThresholdFoF; reach for the others when its single threshold isn't enough.

| Finder | Method | Reach for it when |
|---|---|---|
| ThresholdFoF | Friends-of-friends above a field threshold (Davis et al. 1985) | The default: fast, robust; isolated clumps over a clear background |
| DensityWatershed | FoF connectivity, then watershed split at saddles (DENMAX/SUBFIND); persistence prunes shallow basins | Deblending touching/overlapping peaks inside one connected over-dense region |
| Dendrogram | Multi-scale hierarchy (Rosolowsky & Leroy 2008); min_delta peak-to-saddle contrast | You want the nested hierarchy (leaves → branches → roots), not a flat catalog |
| GraphSegFinder | Graph segmentation by internal-vs-boundary contrast (Felzenszwalb & Huttenlocher 2004) | Density varies smoothly and no single threshold separates structures |
| HDBSCANFinder | Density-based hierarchical clustering, stable-cluster extraction | Clumps span a wide density range / variable background; you'd rather not pick a threshold |
| PhaseSpaceFoF | 6-D position+velocity FoF (Rockstar-style; Behroozi et al. 2013) | Kinematically separating spatially-overlapping structures (streams, mergers, substructure) |
| PersistenceFinder | Topological persistence / ToMATo (Chazal et al. 2013) | Crowded fields: rank peaks by prominence, robust to noise |

All take the same shared keywords (field, threshold, linking_length, backend, gravitational boundedness, tidal truncation, …); see the Density-adaptive finders and Phase-space & topology sections below for the algorithm-specific parameters.

3D — cells or particles (friends-of-friends)

Cells/particles with field ≥ threshold are linked into a clump when they lie within linking_length (in pos_unit) of one another:

gas = gethydro(getinfo(output, path))

cat = clumpfind(gas, :rho;
                threshold=1e2, threshold_unit=:nH,   # select cells above 100 cm⁻³
                linking_length=0.2, pos_unit=:kpc,    # link within 0.2 kpc
                mass_unit=:Msol, min_members=5)       # keep clumps with ≥ 5 cells

length(cat)        # number of clumps
cat[1]             # most massive clump (a NamedTuple)
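The linking rule can be illustrated with a brute-force friends-of-friends sketch in plain Julia. It is not Mera's implementation (which uses a shared neighbour index instead of an O(N²) pair loop), and the function name fof_groups and the 3×N position layout are assumptions made for illustration. Points at or above the threshold are joined with a union-find whenever a pair lies within linking_length, and groups smaller than min_members are dropped.

```julia
# Brute-force friends-of-friends sketch (O(N^2) pair loop); illustrative, not Mera's code.
# pos is a 3×N matrix (one column per point); field holds one value per point.

# Union-find root lookup with path halving.
function find_root!(parent::Vector{Int}, i::Int)
    while parent[i] != i
        parent[i] = parent[parent[i]]
        i = parent[i]
    end
    return i
end

function fof_groups(pos::AbstractMatrix{<:Real}, field::AbstractVector{<:Real};
                    threshold::Real, linking_length::Real, min_members::Int=1)
    sel = findall(>=(threshold), field)          # points with field ≥ threshold
    n = length(sel)
    parent = collect(1:n)                         # every selected point starts alone
    ll2 = linking_length^2
    for a in 1:n-1, b in a+1:n
        d2 = sum(abs2, view(pos, :, sel[a]) .- view(pos, :, sel[b]))
        if d2 <= ll2                              # "friends": merge their groups
            ra, rb = find_root!(parent, a), find_root!(parent, b)
            ra != rb && (parent[rb] = ra)
        end
    end
    groups = Dict{Int,Vector{Int}}()
    for k in 1:n                                  # collect original point indices per root
        push!(get!(groups, find_root!(parent, k), Int[]), sel[k])
    end
    clumps = Vector{Int}[g for g in values(groups) if length(g) >= min_members]
    sort!(clumps; by=length, rev=true)            # largest first (stand-in for most-massive-first)
    return clumps
end
```

Shrinking linking_length fragments the same points into singletons and a very large value merges everything, which is the trade-off discussed under Choosing parameters.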

With a Makie backend loaded, clumpplot draws the catalog directly — each clump's centre of mass as a marker sized by mass (and coloured by log mass), optionally over a projection background:

using CairoMakie
bg  = projection(gas, :sd, :Msol_pc2; center=[:bc])
fig = clumpplot(cat; background=bg)        # marker size ∝ mass, colour = log₁₀ mass

Clumps found by clumpfind (drawn with clumpplot: marker size ∝ mass, colour = log₁₀ mass) overlaid on the gas surface density Σ. The markers sit on the over-dense peaks above the density threshold; fainter arm and inter-arm gas below the threshold is intentionally not flagged, because clump finding is always threshold- (and finder-) dependent.

Not every visible peak is a clump — by design

A clump is what the finder + threshold define. Two effects are worth knowing:

  • Threshold selection. Peaks fainter than threshold are not selected at all, and a single friends-of-friends threshold can merge a whole connected over-dense region (e.g. the dense disk) into one clump while leaving fainter arms out. To separate touching peaks, use DensityWatershed (split at saddles, with persistence to prune shallow basins) rather than a higher ThresholdFoF threshold.
  • Boundedness. Detected over-densities are not necessarily self-gravitating. Add boundedness=true to get each clump's virial ratio alpha_vir = 2·e_kin/|e_grav| and a bound flag, and bound_only=true to keep only self-bound clumps. (On a coarse box, many "clumps" are turbulence-supported, alpha_vir ≫ 1, and would be dropped by bound_only.)

Dendrogram name clash with Makie

Makie also exports a Dendrogram type, so when both are loaded (using Mera, CairoMakie) a bare Dendrogram(...) is ambiguous — qualify Mera's finder as Mera.Dendrogram(...) in that case. The other six finders have unique names.

The same call works on particles (e.g. cluster-finding on stars):

stars = getparticles(getinfo(output, path))
cat = clumpfind(stars, :mass; threshold=0.0, linking_length=0.5)

Choosing parameters. linking_length should be a few times the local resolution — comparable to or larger than the finest cell size (3D AMR) or the mean interparticle separation (particles); too small and dense regions fragment, too large and separate clumps merge. threshold sets which material is considered (e.g. a number-density floor for the cold/dense gas). min_members drops noise-sized detections; mask restricts the search to a pre-selected subset.

Gravitational boundedness

boundedness=true adds per-clump energetics (in cgs units); bound_only=true additionally keeps only the self-bound structures:

cat = clumpfind(gas, :rho; threshold=1e2, threshold_unit=:nH, linking_length=0.2,
                boundedness=true, bound_only=true)
cat[1].alpha_vir       # virial parameter 2·E_kin/|E_grav|
cat[1].bound           # E_kin + E_therm < |E_grav|

Each clump gains e_kin (COM-frame kinetic), e_therm (thermal, gas), e_grav (binding energy), alpha_vir, and bound. The potential is chosen with egrav: :approx (⅗·GM²/R, fast but biased) by default, :direct (exact pairwise sum up to direct_max members), or :tree (Barnes–Hut octree, O(N log N), accurate at any N). softening (in pos_unit) softens the kernel as 1/√(r²+ε²).
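The energy terms can be made concrete with a small sketch that evaluates the COM-frame kinetic energy, the ⅗·GM²/R approximation (taking R as the maximum member distance from the COM, like the catalog's radius) and the softened direct pairwise sum, then forms alpha_vir and the bound flag. It assumes G = 1 and arbitrary units rather than cgs, treats e_grav as the (negative) potential energy, takes e_therm as a given number, and omits the Barnes–Hut option; clump_energetics is a hypothetical helper, not a Mera function.

```julia
using LinearAlgebra: norm

# Sketch of per-clump energetics (G = 1, arbitrary units; not Mera's cgs implementation).
# pos, vel: 3×N matrices (one column per member); m: member masses.
# e_grav is returned as the (negative) potential energy; e_therm is supplied by the caller.
function clump_energetics(pos::AbstractMatrix{Float64}, vel::AbstractMatrix{Float64},
                          m::AbstractVector{Float64}; G::Float64=1.0, softening::Float64=0.0,
                          egrav::Symbol=:direct, e_therm::Float64=0.0)
    M = sum(m)
    com  = pos * m ./ M                        # centre of mass
    vcom = vel * m ./ M                        # COM (bulk) velocity
    e_kin = 0.5 * sum(m[i] * sum(abs2, vel[:, i] .- vcom) for i in eachindex(m))
    if egrav === :approx
        R = maximum(norm(pos[:, i] .- com) for i in eachindex(m))  # max member distance
        e_grav = -0.6 * G * M^2 / R            # uniform-sphere estimate -(3/5) G M^2 / R
    elseif egrav === :direct
        e_grav = 0.0
        for i in 1:length(m)-1, j in i+1:length(m)
            r2 = sum(abs2, pos[:, i] .- pos[:, j])
            e_grav -= G * m[i] * m[j] / sqrt(r2 + softening^2)   # softened pair term
        end
    else
        throw(ArgumentError("this sketch supports egrav = :approx or :direct"))
    end
    alpha_vir = 2 * e_kin / abs(e_grav)
    bound = e_kin + e_therm < abs(e_grav)
    return (; e_kin, e_grav, alpha_vir, bound)
end
```

For two unit masses two length units apart the direct sum gives −0.5 while the uniform-sphere formula gives −2.4, a reminder that :approx is a crude estimate for small or non-uniform member sets.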

iterative_unbinding=true adds SUBFIND-style unbinding: members with positive total energy in the bulk-velocity frame are stripped iteratively, so each clump's reported mass/membership is its self-bound subset.

cat = clumpfind(gas, :rho; threshold=1e2, threshold_unit=:nH, linking_length=0.2,
                boundedness=true, egrav=:tree, iterative_unbinding=true)
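A direct-summation sketch of the unbinding loop: each pass recomputes the bulk velocity of the current members and every member's specific energy ½|v − v_bulk|² + φ, removes all members with positive energy, and repeats until nothing changes. It is only an illustration (O(N²) per pass, G = 1, arbitrary units, thermal energy ignored, batch removal per pass), and iterative_unbind is a hypothetical name rather than Mera's routine.

```julia
# Iterative unbinding sketch (direct-sum potential, G = 1, arbitrary units; hypothetical helper).
# pos, vel: 3×N matrices; m: member masses. Returns the indices of the self-bound subset.
function iterative_unbind(pos::AbstractMatrix{Float64}, vel::AbstractMatrix{Float64},
                          m::AbstractVector{Float64}; G::Float64=1.0,
                          softening::Float64=0.0, maxiter::Int=100)
    members = collect(eachindex(m))
    for _ in 1:maxiter
        isempty(members) && break
        vbulk = vel[:, members] * m[members] ./ sum(m[members])   # mass-weighted bulk velocity
        unbound = Int[]
        for i in members
            phi = 0.0
            for j in members                                     # potential of current members only
                j == i && continue
                r = sqrt(sum(abs2, pos[:, i] .- pos[:, j]) + softening^2)
                phi -= G * m[j] / r
            end
            e = 0.5 * sum(abs2, vel[:, i] .- vbulk) + phi        # specific total energy
            e > 0 && push!(unbound, i)
        end
        isempty(unbound) && break                                # converged
        setdiff!(members, unbound)                               # strip, then recompute
    end
    return members
end
```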

For watershed deblending, a DensityWatershed finder additionally accepts persistence (in field units): a basin whose prominence (peak − saddle) is below persistence is merged into the deeper basin it meets. This suppresses over-segmentation at shallow saddles, in the spirit of the min_delta contrast of Rosolowsky & Leroy (2008):

cores = clumpfind(gas, DensityWatershed(:rho; threshold=1e2, threshold_unit=:nH,
                                        linking_length=0.4, persistence=0.3))

Validators — a composable acceptance chain

Instead of the boundedness keywords, you can pass a validators chain of value-typed criteria that a clump must satisfy jointly (a logical AND): MinMembers, Bound (configures the boundedness pass, i.e. the potential egrav and iterative unbinding, and keeps only self-bound clumps), VirialBelow, MassAbove, and Custom (an arbitrary clump -> Bool predicate). Membership-mutating validators (Bound with unbinding) act during the analysis and predicates filter the catalog afterwards, regardless of the order in which they are listed. A non-empty validators overrides the boundedness/bound_only/min_members/egrav/iterative_unbinding keywords.

# ≥20 members, tree-gravity self-bound (iterative unbinding), and virially bound:
cores = clumpfind(gas, DensityWatershed(:rho; threshold=1e2, threshold_unit=:nH, linking_length=0.4);
                  validators=[MinMembers(20), Bound(:tree; iterative=true), VirialBelow(2.0)])
# arbitrary cut via Custom:
big = clumpfind(gas, ThresholdFoF(:rho; threshold=1e2, threshold_unit=:nH, linking_length=0.2);
                validators=[Custom(c -> c.mass > 1e4 && c.radius < 0.1)])
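The two-phase evaluation (membership-mutating criteria first, predicates afterwards, independent of list order) can be sketched with plain Julia types. Everything below is hypothetical and does not mirror Mera's validator types: MinMembersRule, MassAboveRule and CustomRule are predicates, DropLight is a stand-in for a mutating criterion such as unbinding, and a clump is just a NamedTuple of member masses with derived fields.

```julia
# Hypothetical rule types illustrating an AND-chain evaluated in two phases.
abstract type Rule end
struct MinMembersRule <: Rule; n::Int; end
struct MassAboveRule  <: Rule; m::Float64; end
struct CustomRule{F}  <: Rule; f::F; end
struct DropLight      <: Rule; mmin::Float64; end   # stand-in for unbinding: mutates membership

mutates(::Rule) = false
mutates(::DropLight) = true

# A toy clump: its member masses plus derived fields.
make_clump(ms::Vector{Float64}) = (; member_masses = ms, n_members = length(ms),
                                     mass = sum(ms; init = 0.0))

apply(r::DropLight, c) = make_clump(filter(>=(r.mmin), c.member_masses))

accept(r::MinMembersRule, c) = c.n_members >= r.n
accept(r::MassAboveRule, c)  = c.mass > r.m
accept(r::CustomRule, c)     = r.f(c)::Bool

function run_chain(clumps::AbstractVector, rules::AbstractVector{<:Rule})
    muts  = [r for r in rules if mutates(r)]    # phase 1: change membership
    preds = [r for r in rules if !mutates(r)]   # phase 2: filter (logical AND)
    out = eltype(clumps)[]
    for c in clumps
        for r in muts
            c = apply(r, c)
        end
        all(r -> accept(r, c), preds) && push!(out, c)
    end
    return out
end
```

Listing MinMembersRule(3) before DropLight(1.0) does not change the outcome: a three-member clump that loses one light member is trimmed first and only then fails the count, matching the order-independence described for validators.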

Deblending overlapping clumps

A single threshold merges touching structures into one friends-of-friends group. deblend splits each group at its density peaks (peaks separated by peak_min_distance in pos_unit):

cat = clumpfind(gas, :rho; threshold=1e2, threshold_unit=:nH, linking_length=0.4,
                deblend=:peak, peak_min_distance=0.3)        # nearest-peak (also `deblend=true`)
cat = clumpfind(gas, :rho; threshold=1e2, threshold_unit=:nH, linking_length=0.4,
                deblend=:watershed)                          # density-descending basins (respects saddles)

:peak assigns each member to the nearest peak; :watershed floods the density field from each peak downhill (DENMAX/SUBFIND-style for points, Meyer flooding for 2-D maps), which follows saddles better. Both are mass-conserving (every member/pixel lands in exactly one clump).
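A DENMAX-style watershed for points can be sketched in a few lines: visit points from densest to least dense, start a new basin at any point with no denser labelled neighbour within linking_length, and otherwise copy the label of the densest such neighbour (steepest ascent). This is a hypothetical O(N²) illustration (watershed_labels is not a Mera function, and equal-density neighbours simply start separate basins); it shows why the assignment is mass-conserving, since every point receives exactly one label.

```julia
# DENMAX-style watershed sketch on points (O(N^2)); illustrative only.
# pos: d×N matrix, rho: density per point. Returns one basin label (1, 2, …) per point.
function watershed_labels(pos::AbstractMatrix{<:Real}, rho::AbstractVector{<:Real},
                          linking_length::Real)
    n = length(rho)
    label = zeros(Int, n)
    npeaks = 0
    ll2 = linking_length^2
    for i in sortperm(rho; rev=true)             # densest first
        best, bestrho = 0, -Inf
        for j in 1:n                             # densest labelled neighbour denser than i
            if label[j] != 0 && rho[j] > rho[i] && rho[j] > bestrho &&
               sum(abs2, view(pos, :, i) .- view(pos, :, j)) <= ll2
                best, bestrho = j, rho[j]
            end
        end
        if best == 0
            npeaks += 1
            label[i] = npeaks                    # local maximum: start a new basin
        else
            label[i] = label[best]               # flow uphill into that neighbour's basin
        end
    end
    return label
end
```

On a 1-D profile with peaks of 6 and 5, the saddle point joins the side with the denser neighbour, which is how watershed follows saddles rather than distances.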

Bound-substructure trees

substructure=true builds a two-level tree: each top-level clump is split into density basins (watershed), and the gravitationally self-bound ones (≥ sub_min_members) are attached as nested subclumps. Top-level clumps gain the boundedness fields too. tidal=true additionally strips each subclump's members beyond its Jacobi radius r_t = D·(m_sub / (3·M_host(<D)))^{1/3} relative to the host (parent) clump, where D is the subclump's distance from the host centre and M_host(<D) the host mass enclosed within D (King 1962; Binney & Tremaine 2008).

cat = clumpfind(gas, :rho; threshold=1e2, threshold_unit=:nH, linking_length=0.4, substructure=true)
cat[1].n_subclumps          # number of self-bound subclumps inside the most massive clump
cat[1].subclumps[1].mass    # the largest bound subclump's mass
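The truncation step can be written out directly. The sketch below assumes D is the subclump's distance from the host centre and M_enc the host mass enclosed within D, and keeps members by their distance from the subclump centre; the helper names are hypothetical, and how Mera locates the centres is not modelled.

```julia
# Jacobi (tidal) radius of a subclump: r_t = D * (m_sub / (3 M_enc))^(1/3).
# D: subclump–host centre distance; M_enc: host mass within D (same mass units as m_sub).
jacobi_radius(D::Real, m_sub::Real, M_enc::Real) = D * cbrt(m_sub / (3 * M_enc))

# Indices of members whose distance from the subclump centre is within r_t (hypothetical helper).
tidal_keep(dist::AbstractVector{<:Real}, r_t::Real) = findall(<=(r_t), dist)
```

Because r_t scales linearly with D, the same subclump is truncated twice as far out when it sits twice as far from the host centre (at fixed enclosed mass).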

tidal=:tensor uses the tidal-tensor / Hill radius instead of the Jacobi form: it fits the local gravity acceleration field a(x) (from a gravity object, getgravity) around each subclump to the tidal tensor T_ij = −∂²Φ/∂x_i∂x_j and truncates at r_t³ = G·m_sub / λ_max(T) — exactly the Hill radius R·(m_sub/2M)^{1/3} for a point-mass host. tidal_sample (default 3) sets the fit radius in units of the subclump radius.

grav = getgravity(getinfo(output, path))
cat  = clumpfind(gas, :rho; threshold=1e2, threshold_unit=:nH, linking_length=0.4,
                 substructure=true, tidal=:tensor, gravity=grav)
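The point-mass limit quoted above can be checked numerically. This sketch builds T_ij = ∂a_i/∂x_j (equal to −∂²Φ/∂x_i∂x_j because a = −∇Φ) by central finite differences of an analytic point-mass acceleration, in place of the least-squares fit to a gravity object that Mera performs, and then evaluates r_t = (G·m_sub / λ_max)^{1/3}. Units are arbitrary with G = 1, and the function names are hypothetical.

```julia
using LinearAlgebra: Symmetric, eigmax, norm

# Acceleration of a point mass M at the origin (G = 1 by default).
point_mass_accel(x::AbstractVector{Float64}; G::Float64=1.0, M::Float64=1.0) =
    -G * M .* x ./ norm(x)^3

# Tidal tensor T_ij = ∂a_i/∂x_j = -∂²Φ/∂x_i∂x_j by central differences with step h.
function tidal_tensor(accel, x0::Vector{Float64}; h::Float64=1e-4)
    T = zeros(3, 3)
    for j in 1:3
        dx = zeros(3)
        dx[j] = h
        T[:, j] = (accel(x0 .+ dx) .- accel(x0 .- dx)) ./ (2 * h)
    end
    return T
end

# Truncation radius from the largest eigenvalue: r_t^3 = G m_sub / λ_max(T).
tidal_radius(G::Real, m_sub::Real, T::AbstractMatrix) = cbrt(G * m_sub / eigmax(Symmetric(T)))
```

At R = 2 from a unit host mass, λ_max = 2GM/R³ = 0.25 (radial stretching), and the resulting r_t reproduces R·(m_sub/2M)^{1/3}.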

Multi-field — gas + stars + dark matter together

Pass a vector of components to find over-densities across several mass species in one pass. Each component pre-selects its points (with its own field/threshold and an optional mask); the catalog reports a per-component mass/count breakdown per clump:

cat = clumpfind([
    (obj=gas,   field=:rho,  threshold=1e2, threshold_unit=:nH, name=:gas),
    (obj=parts, field=:mass, threshold=0.0, name=:stars, mask = o -> getvar(o,:birth) .> 0),
    (obj=parts, field=:mass, threshold=0.0, name=:dm,    mask = o -> getvar(o,:birth) .<= 0),
]; linking_length=0.5)

cat[1].mass                  # total mass of the largest structure
cat[1].components.gas.mass   # …split by component
cat[1].components.dm.n       # dark-matter particle count

Pass boundedness=true to get the combined-cloud energetics (e_kin, e_therm, e_mag, e_grav, alpha_vir, bound) summed over all species — the self-gravity test uses gas + stars + DM together while the components breakdown stays the per-species mass budget (egrav, softening, iterative_unbinding, bound_only work as in the single-object form).
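The per-component breakdown is bookkeeping over one labelled cloud. In this sketch every point carries a clump label (0 for unassigned points, an assumption of the illustration), a component tag such as :gas or :dm, and a mass; component_breakdown is a hypothetical helper that accumulates (mass, n) per component for each clump.

```julia
# Per-clump, per-component (mass, n) accumulation; hypothetical helper for illustration.
const CompStat = NamedTuple{(:mass, :n), Tuple{Float64, Int}}

function component_breakdown(labels::AbstractVector{Int}, tags::AbstractVector{Symbol},
                             mass::AbstractVector{Float64})
    out = Dict{Int, Dict{Symbol, CompStat}}()
    for (lab, tag, m) in zip(labels, tags, mass)
        lab == 0 && continue                              # point not in any clump
        comp = get!(out, lab) do
            Dict{Symbol, CompStat}()
        end
        old = get(comp, tag, (mass = 0.0, n = 0))
        comp[tag] = (mass = old.mass + m, n = old.n + 1)  # accumulate per species
    end
    return out
end
```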

Mass function & report integration

m, N   = clump_massfunction(cat; nbins=20, scale=:log)   # differential dN per mass bin
m, Ngt = clump_massfunction(cat; cumulative=true)        # cumulative N(≥M)

using CairoMakie
fig = massfunctionplot(cat; cumulative=true)             # plot it directly (log–log)

Cumulative clump mass function (massfunctionplot / clump_massfunction): the number of clumps with mass ≥ M, on log–log axes.

A ClumpCard runs clumpfind inside a First-Look Report (the full catalog is kept in the card's data.catalog):

report(output; path, cards=[ ClumpCard(:hydro, :rho; threshold=1e2, threshold_unit=:nH,
                                       linking_length=0.2) ])

2D — a projection map (connected components)

Run it on any projection result to segment a map above a threshold:

sd  = projection(gas, :sd, :Msol_pc2; res=512, center=[:bc])
cat = clumpfind(sd, :sd; threshold=50.0, connectivity=8)   # regions ≥ 50 M⊙/pc²

connectivity is 8 (diagonals count) or 4. For a surface-density map, each region's mass is the exact area integral, i.e. the sum of value · pixel_area over its pixels; positions are in the map's extent units.
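The 2D labelling is a standard flood fill. The sketch below (label_components and region_masses are hypothetical names, and pixel_area is an input you supply) labels pixels at or above the threshold with 4- or 8-connectivity using an explicit stack; a region's mass is then the sum of value · pixel_area over its pixels.

```julia
# Connected-component labelling by flood fill (sketch, not Mera's implementation).
function label_components(img::AbstractMatrix{<:Real}, threshold::Real; connectivity::Int=8)
    connectivity in (4, 8) || throw(ArgumentError("connectivity must be 4 or 8"))
    nx, ny = size(img)
    labels = zeros(Int, nx, ny)
    offs = if connectivity == 4
        [(1, 0), (-1, 0), (0, 1), (0, -1)]
    else
        [(dx, dy) for dx in -1:1 for dy in -1:1 if (dx, dy) != (0, 0)]   # diagonals count
    end
    nlab = 0
    stack = Tuple{Int,Int}[]
    for j in 1:ny, i in 1:nx
        (img[i, j] >= threshold && labels[i, j] == 0) || continue
        nlab += 1                                    # seed a new region
        labels[i, j] = nlab
        push!(stack, (i, j))
        while !isempty(stack)                        # flood fill from the seed
            ci, cj = pop!(stack)
            for (dx, dy) in offs
                ni, nj = ci + dx, cj + dy
                if 1 <= ni <= nx && 1 <= nj <= ny && labels[ni, nj] == 0 &&
                   img[ni, nj] >= threshold
                    labels[ni, nj] = nlab
                    push!(stack, (ni, nj))
                end
            end
        end
    end
    return labels, nlab
end

# Region mass = sum over its pixels of value · pixel_area.
region_masses(img, labels, nlab, pixel_area) =
    [sum(img[labels .== k]) * pixel_area for k in 1:nlab]
```

A diagonal line of bright pixels is one region with connectivity=8 but three separate regions with connectivity=4.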

The catalog

Each entry is a NamedTuple; the fields differ slightly between 3D and 2D:

| field | meaning |
|---|---|
| id | rank (1 = most massive) |
| n_members | cells / particles (3D) or pixels (2D) |
| mass | clump mass (3D) or area integral (2D) |
| com | centre of mass: (x,y,z) (3D) or (x,y) (2D) |
| peak, peak_pos | maximum field value and its position |
| radius | maximum member distance from the COM |

cat = clumpfind(gas, :rho; threshold=1e2, threshold_unit=:nH, linking_length=0.2)
[c.mass for c in cat]               # mass function input
cat[1].com                          # centre of mass of the most massive clump
cat.meta                            # the search parameters used

ClumpCatalog behaves like a vector (length, cat[i], iteration). For analysis/export, get a columnar table (a NamedTuple of vectors — including boundedness and per-component columns when present), ready for DataFrame / CSV.write:

tbl = clumptable(cat)         # (; id, n_members, mass, com_x, com_y, com_z, radius, …)
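The row-to-column conversion behind such a table is short in plain Julia. This sketch is not clumptable itself: it assumes every row is a NamedTuple with the same keys (a real catalog also splits com into com_x/com_y/com_z and adds optional columns), and to_columns is a hypothetical name.

```julia
# Turn a vector of same-keyed NamedTuples (rows) into a NamedTuple of vectors (columns).
function to_columns(rows::AbstractVector{<:NamedTuple})
    ks = keys(first(rows))
    return NamedTuple{ks}(Tuple([r[k] for r in rows] for k in ks))
end
```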

See also getclumps to load a RAMSES-produced clump catalog instead of finding clumps yourself, and Off-axis Projection for tilted maps to segment in 2D.

Multi-scale hierarchy (dendrogram)

A Dendrogram finder returns the finest density peaks (local maxima with prominence ≥ min_delta) as the catalog's leaf clumps; passing hierarchy=true additionally attaches the full merge StructureTree — the level at which leaves join into branches and ultimately roots (Rosolowsky & Leroy 2008):

cat  = clumpfind(gas, Dendrogram(:rho; threshold=1e2, threshold_unit=:nH,
                                 linking_length=0.5, min_delta=0.3); hierarchy=true)
tree = cat.tree
length(Mera.leaves(tree))               # finest structures (= the catalog clumps)
r = Mera.roots(tree)[1]                  # a top-level structure
Mera.children(tree, r)                   # its immediate sub-structures
r.n_subtree                              # members in the whole subtree

Density-adaptive finders

Two further finders handle variable-density fields without a single hard threshold:

  • HDBSCANFinder: a self-contained HDBSCAN* (Campello+2013; McInnes+2017). Core distances define a mutual-reachability metric whose MST is condensed into a cluster hierarchy, and the most stable clusters (≥ min_cluster_size) are extracted. Near parameter-free; points outside any stable cluster are labelled noise (dropped).
  • GraphSegFinder: Felzenszwalb & Huttenlocher (2004) graph segmentation, which keeps within-region density variation below the between-region contrast, with a single scale knob. Near-linear in cost; a fast multi-scale deblender.

cat = clumpfind(gas, HDBSCANFinder(:rho; threshold=1e2, threshold_unit=:nH,
                                   linking_length=2.0, min_cluster_size=20))
cat = clumpfind(gas, GraphSegFinder(:rho; threshold=1e2, threshold_unit=:nH,
                                    linking_length=1.0, scale=5.0))
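The first HDBSCAN* stages, core distances and the minimum spanning tree of the mutual-reachability metric d_mreach(a, b) = max(core(a), core(b), d(a, b)), fit in a short sketch. It uses dense O(N²) matrices and Prim's algorithm, takes the core distance as the distance to the k-th nearest other point (conventions differ on whether the point itself counts), and stops before the condensed-tree and stability-extraction steps; mreach_mst is a hypothetical name.

```julia
# HDBSCAN* front end sketch: core distances and the mutual-reachability MST (dense O(N^2)).
# pos: d×N matrix; k: neighbour count for the core distance (1 ≤ k ≤ N-1).
function mreach_mst(pos::AbstractMatrix{<:Real}, k::Int)
    n = size(pos, 2)
    D = [sqrt(sum(abs2, view(pos, :, i) .- view(pos, :, j))) for i in 1:n, j in 1:n]
    core = [sort(D[:, i])[k + 1] for i in 1:n]          # entry 1 is the point itself (0)
    M = [i == j ? 0.0 : max(core[i], core[j], D[i, j]) for i in 1:n, j in 1:n]
    intree = falses(n)
    intree[1] = true
    best = M[:, 1]                                        # cheapest link to the tree so far
    from = fill(1, n)
    edges = Tuple{Int,Int,Float64}[]
    for _ in 1:n-1                                        # Prim's algorithm
        j, w = 0, Inf
        for v in 1:n
            if !intree[v] && best[v] < w
                j, w = v, best[v]
            end
        end
        push!(edges, (from[j], j, w))
        intree[j] = true
        for v in 1:n
            if !intree[v] && M[v, j] < best[v]
                best[v] = M[v, j]
                from[v] = j
            end
        end
    end
    return core, edges
end
```

Cutting the MST at decreasing edge weights yields the hierarchy that HDBSCAN* then condenses; for two well-separated groups a single heavy edge carries the split.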

Finder composition

deblend can be any finder: a cheap finder establishes connectivity, then the deblend finder splits each group — e.g. friends-of-friends connectivity refined per-group by HDBSCAN (something yt cannot do):

cat = clumpfind(gas, ThresholdFoF(:rho; threshold=1e2, threshold_unit=:nH, linking_length=1.0);
                deblend=HDBSCANFinder(:rho; threshold=1e2, threshold_unit=:nH, linking_length=0.5, min_cluster_size=30))

Threading

The per-clump statistics/boundedness pass is threaded; max_threads (default Threads.nthreads()) caps it, and the result is identical to the serial output regardless of thread count.
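One way to get thread-count-independent results, shown as a sketch rather than Mera's code, is to give each clump its own output slot and keep the summation order inside a clump fixed, so no thread ever combines partial sums from another. clump_masses_threaded is a hypothetical name.

```julia
using Base.Threads: @threads

# Per-clump reduction where each task writes only its own slot: the result is
# bitwise identical for any thread count because each clump's sum runs serially.
function clump_masses_threaded(groups::Vector{Vector{Int}}, mass::Vector{Float64})
    out = Vector{Float64}(undef, length(groups))
    @threads for c in eachindex(groups)
        s = 0.0
        for i in groups[c]          # fixed member order within a clump
            s += mass[i]
        end
        out[c] = s
    end
    return out
end
```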

Neighbour backend

Every finder takes a backend for its spatial neighbour search: CellLinkedList (default), HashGrid, or MortonGrid — which visits points along a Z-order (Morton) curve so spatially-near points are near in memory, improving cache locality on large selections (the same ordering an out-of-core path needs). All three return identical results; only speed differs.

cat = clumpfind(gas, ThresholdFoF(:rho; threshold=1e2, threshold_unit=:nH,
                                  linking_length=0.5, backend=MortonGrid))
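A Z-order key is obtained by interleaving the bits of quantised x, y, z coordinates. The sketch below assumes 10 bits per axis (a 1024³ grid and 30-bit keys) and a cubic box [lo, hi); morton3 and morton_order are hypothetical names using the widely used magic-number bit spreading. Sorting points by the key gives the kind of memory order a Morton-ordered backend exploits.

```julia
# Spread the low 10 bits of x so that bit k moves to bit 3k.
function spread_bits(x::UInt32)
    x &= 0x000003ff
    x = (x | (x << 16)) & 0x030000ff
    x = (x | (x << 8))  & 0x0300f00f
    x = (x | (x << 4))  & 0x030c30c3
    x = (x | (x << 2))  & 0x09249249
    return x
end

# Interleave three 10-bit cell indices (0…1023) into one 30-bit Morton key.
morton3(i::Integer, j::Integer, k::Integer) =
    spread_bits(UInt32(i)) | (spread_bits(UInt32(j)) << 1) | (spread_bits(UInt32(k)) << 2)

# Permutation that visits the columns of a 3×N position matrix in Z-order.
function morton_order(pos::AbstractMatrix{<:Real}, lo::Real, hi::Real)
    q(v) = clamp(floor(Int, (v - lo) / (hi - lo) * 1024), 0, 1023)   # 10-bit cell index
    codes = [morton3(q(pos[1, p]), q(pos[2, p]), q(pos[3, p])) for p in axes(pos, 2)]
    return sortperm(codes)
end
```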

Phase-space & topology

  • PhaseSpaceFoF: 6-D friends-of-friends (Rockstar-style; Behroozi+2013). Points link only when they are within linking_length_pos in space and within linking_length_vel (km/s) in velocity, so kinematically distinct populations that overlap spatially (streams, subhaloes, tidal debris) separate. Velocities are loaded automatically.
  • PersistenceFinder: 0-dim persistent homology / ToMATo (Chazal+2013). A peak is kept as a separate cluster only if its prominence (peak − merge saddle) reaches persistence. Principled, parameter-light deblending, robust in crowded fields.

# kinematically separate two overlapping stellar streams
cat = clumpfind(stars, PhaseSpaceFoF(:mass; threshold=0.0,
                                     linking_length_pos=0.2, linking_length_vel=50.0))
# topological extraction by prominence
cat = clumpfind(gas, PersistenceFinder(:rho; threshold=1e2, threshold_unit=:nH,
                                       linking_length=0.5, persistence=0.3))
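The prominence rule can be sketched as a ToMATo-style union-find pass: process points from densest down, attach each point to the cluster of its densest processed neighbour, and whenever it touches another cluster whose peak rises less than persistence above the current (saddle) density, merge the lower peak into the higher one. This is an O(N²) illustration with distance-based neighbours and hypothetical names; it returns, for each point, the index of its cluster's peak point.

```julia
# ToMATo-style persistence clustering sketch (O(N^2) neighbour search).
function uf_root!(parent::Vector{Int}, i::Int)
    while parent[i] != i
        parent[i] = parent[parent[i]]            # path halving
        i = parent[i]
    end
    return i
end

# pos: d×N matrix, rho: density per point. Returns the peak index of each point's cluster.
function tomato_labels(pos::AbstractMatrix{<:Real}, rho::AbstractVector{<:Real},
                       linking_length::Real, persistence::Real)
    n = length(rho)
    parent = zeros(Int, n)                       # 0 = not processed yet
    ll2 = linking_length^2
    for i in sortperm(rho; rev=true)             # densest first
        g, grho = 0, -Inf
        roots = Int[]
        for j in 1:n
            parent[j] == 0 && continue
            sum(abs2, view(pos, :, i) .- view(pos, :, j)) <= ll2 || continue
            push!(roots, uf_root!(parent, j))
            if rho[j] > grho
                g, grho = j, rho[j]
            end
        end
        if g == 0
            parent[i] = i                        # no processed neighbour: a new peak (root)
            continue
        end
        parent[i] = uf_root!(parent, g)          # join the densest neighbour's cluster
        for r0 in unique(roots)
            r, rg = uf_root!(parent, r0), uf_root!(parent, i)
            r == rg && continue
            # roots are peaks, so rho[root] is the peak height; rho[i] is the saddle
            if min(rho[r], rho[rg]) < rho[i] + persistence
                low, high = rho[r] < rho[rg] ? (r, rg) : (rg, r)
                parent[low] = high               # the less prominent peak dies here
            end
        end
    end
    return [uf_root!(parent, i) for i in 1:n]
end
```

With peaks of 6 and 5 separated by a saddle at 2, the lower peak has prominence 3: it survives persistence = 1 and is absorbed at persistence = 4.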

Saving & validation

Persist a catalog (full fidelity — boundedness, nested subclumps, the tree) and reload it:

save_clumps("clumps_out100", cat)        # → clumps_out100.jld2
cat = load_clumps("clumps_out100.jld2")

clump_recovery scores a found segmentation against a known ground truth (per-point labels), returning the Adjusted Rand Index, completeness, purity and bijective merit — the basis of the validation harness:

m = clump_recovery(found_labels, true_labels)
m.ari            # ≈ 1 when the finder recovers the input clumps
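The Adjusted Rand Index itself is a short computation over the contingency table of the two labelings (Hubert & Arabie 1985). The sketch below is an independent reference implementation, not clump_recovery: it scores the full partition with no background handling, and it returns 1.0 when both labelings are trivial, a common convention for that degenerate case.

```julia
# Adjusted Rand Index from the contingency table n_ij of two labelings.
function adjusted_rand_index(a::AbstractVector{<:Integer}, b::AbstractVector{<:Integer})
    length(a) == length(b) || throw(DimensionMismatch("label vectors differ in length"))
    n = length(a)
    pairs2(x) = x * (x - 1) ÷ 2                     # unordered pairs, C(x, 2)
    nij = Dict{Tuple{Int,Int},Int}()
    na  = Dict{Int,Int}()
    nb  = Dict{Int,Int}()
    for (x, y) in zip(a, b)
        nij[(x, y)] = get(nij, (x, y), 0) + 1
        na[x] = get(na, x, 0) + 1
        nb[y] = get(nb, y, 0) + 1
    end
    index    = sum(pairs2, values(nij))
    sa, sb   = sum(pairs2, values(na)), sum(pairs2, values(nb))
    expected = sa * sb / pairs2(n)
    maxindex = (sa + sb) / 2
    maxindex == expected && return 1.0              # degenerate: both partitions trivial
    return (index - expected) / (maxindex - expected)
end
```

Relabelling clumps leaves the score at 1, while a labeling that splits every true pair scores below zero, which is why ari can be slightly negative.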

API

The finder/hierarchy types (AbstractFinder, ThresholdFoF, DensityWatershed, Dendrogram, GraphSegFinder, HDBSCANFinder, PhaseSpaceFoF, PersistenceFinder, StructureTree, StructureNode) are documented in the API reference.

Mera.clumpfind (Function)
clumpfind(obj::HydroPartType, finder::AbstractFinder; pos_unit=:kpc, mass_unit=:Msol,
          min_members=1, mask=[false], boundedness=false, bound_only=false,
          egrav=:approx, direct_max=2000, softening=0.0, iterative_unbinding=false,
          deblend=false, peak_min_distance=…, substructure=false, sub_min_members=min_members,
          tidal=false, gravity=nothing, tidal_sample=3.0, hierarchy=false,
          max_threads=Threads.nthreads()) -> ClumpCatalog

3D structure finder driven by an AbstractFinder value (one of the seven: ThresholdFoF, DensityWatershed, Dendrogram, GraphSegFinder, HDBSCANFinder, PhaseSpaceFoF, PersistenceFinder), which carries the field/threshold/linking length and selects the algorithm. Per clump it returns member count, mass, centre of mass com, peak field value and peak_pos, and radius (max member distance from the COM), with positions in pos_unit and mass in mass_unit.

  • boundedness=true adds per-clump energetics (cgs): e_kin (COM-frame kinetic), e_therm (thermal, gas), e_grav (binding energy), alpha_vir = 2·e_kin/|e_grav|, and a bound flag (e_kin + e_therm < |e_grav|). bound_only=true keeps only self-bound clumps. The potential is set by egrav: :approx ⇒ 3/5·GM²/R (biased, fast); :direct ⇒ exact pairwise sum up to direct_max members; :tree ⇒ Barnes–Hut octree, O(N log N), accurate at any N (Barnes & Hut 1986). softening (in pos_unit) softens the kernel 1/√(r²+ε²).
  • iterative_unbinding=true runs SUBFIND-style unbinding (Springel+2001): members with positive total energy in the bulk-velocity frame are stripped iteratively until convergence, so each clump's reported membership/mass is its self-bound subset. Implies the boundedness analysis.
  • deblend=true/:peak splits merged clumps at their density peaks (members assigned to the nearest peak); deblend=:watershed instead assigns by density-descending basins. Peaks are separated by peak_min_distance (in pos_unit). (Equivalent to using a DensityWatershed finder.)
  • substructure=true builds a bound-substructure tree: each top-level clump is split into density basins (watershed) and the gravitationally self-bound ones (≥ sub_min_members) are attached as nested subclumps (with n_subclumps). Implies the boundedness analysis.
  • tidal (needs substructure=true) truncates each sub-clump at its tidal/Hill radius in the host: tidal=true uses the analytic Jacobi radius, tidal=:tensor the least-squares tidal-tensor radius from a gravity object (getgravity); tidal_sample scales the host sampling region.
  • hierarchy=true (for a Dendrogram finder) also returns the multi-scale merger tree in cat.tree (a StructureTree).
  • max_threads caps the threads used for the per-clump analysis (deterministic regardless of count).
  • validators — a composable chain of value-typed acceptance criteria (MinMembers, Bound, VirialBelow, MassAbove, Custom) that a clump must all satisfy (an AND). It is a clearer alternative to the boundedness kwargs: a Bound in the chain configures the boundedness pass (potential, iterative unbinding) and keeps only self-bound clumps; the predicate validators filter the catalog. A non-empty validators overrides boundedness/bound_only/min_members/egrav/iterative_unbinding. Membership-mutating validators run during the analysis, predicates after — independent of the order listed.

gas = gethydro(getinfo(output, path))
cat = clumpfind(gas, ThresholdFoF(:rho; threshold=1e2, threshold_unit=:nH, linking_length=0.2))
# contrast-controlled watershed + tree-gravity boundedness with iterative unbinding:
cores = clumpfind(gas, DensityWatershed(:rho; threshold=1e2, threshold_unit=:nH,
                                        linking_length=0.4, persistence=0.3);
                  boundedness=true, egrav=:tree, iterative_unbinding=true)
# a validator-chain variant (without persistence): ≥ 20 members, tree-gravity self-bound with iterative unbinding, alpha_vir below 2:
cores = clumpfind(gas, DensityWatershed(:rho; threshold=1e2, threshold_unit=:nH, linking_length=0.4);
                  validators=[MinMembers(20), Bound(:tree; iterative=true), VirialBelow(2.0)])

clumpfind(obj::HydroPartType, field=:rho; threshold, linking_length,
          threshold_unit=:standard, pos_unit=:kpc, mass_unit=:Msol,
          min_members=1, mask=[false], boundedness=false, bound_only=false,
          egrav=:approx, direct_max=2000, deblend=false, peak_min_distance=2·linking_length,
          substructure=false, sub_min_members=min_members) -> ClumpCatalog

Convenience form of the AbstractFinder method: builds a ThresholdFoF from field/threshold/linking_length and forwards every other keyword. Existing scripts keep working unchanged; see the finder method above for the full keyword reference.

gas = gethydro(getinfo(output, path))
cat = clumpfind(gas, :rho; threshold=1e2, threshold_unit=:nH, linking_length=0.2)   # 0.2 kpc
bound = clumpfind(gas, :rho; threshold=1e2, threshold_unit=:nH, linking_length=0.2,
                  boundedness=true, bound_only=true, deblend=true)

clumpfind(components::AbstractVector; linking_length, pos_unit=:kpc, mass_unit=:Msol,
          min_members=1, boundedness=false, bound_only=false, egrav=:approx,
          direct_max=2000, softening=0.0, iterative_unbinding=false) -> ClumpCatalog

Multi-field structure finder: pre-select points from several components and link them with a single friends-of-friends pass, so over-densities in gas + stars + dark matter are found together. Each component is a NamedTuple (obj, field, threshold, name [, threshold_unit, mask]); its points with field ≥ threshold (and optional mask(obj)) join the common cloud tagged by name. Per clump the catalog reports total mass, com, radius, member count, and a components breakdown (name=(mass=…, n=…), …) per source.

boundedness=true adds the combined-cloud energetics (e_kin, e_therm, e_mag, e_grav, alpha_vir, bound) computed over all species together (each contributing its own mass and velocity; gas also its thermal/magnetic support), so the bound test uses the full self-gravity of gas + stars + DM while the components breakdown remains the per-species mass budget. egrav, direct_max, softening, iterative_unbinding and bound_only behave as in the single-object form.

cat = clumpfind([
    (obj=gas,   field=:rho,  threshold=1e2, threshold_unit=:nH, name=:gas),
    (obj=parts, field=:mass, threshold=0.0, name=:stars, mask = o->getvar(o,:birth).>0),
    (obj=parts, field=:mass, threshold=0.0, name=:dm,    mask = o->getvar(o,:birth).<=0),
]; linking_length=0.5, boundedness=true)
cat[1].components.gas.mass        # gas mass in the most massive structure
cat[1].bound                      # self-bound across all three species?

clumpfind(map::DataMapsType, field; threshold, connectivity=8, min_pixels=1) -> ClumpCatalog

2D connected-component finder on a projection map. Pixels with map[field] ≥ threshold are grouped by connectivity (4 or 8). Per region it returns the pixel count n_members, mass (the sum of value · pixel_area over the region's pixels, exact for a surface-density map), com (value-weighted centroid), peak & peak_pos, and radius, with positions in the map's extent units.

Mera.clump_massfunction (Function)
clump_massfunction(cat::ClumpCatalog; nbins=20, scale=:log, cumulative=false)
    -> (mass, N)

The clump mass function. Differential (default): histogram of clump masses into nbins (scale=:log ⇒ log-spaced bins) — returns (bin_centres, counts). Cumulative (cumulative=true): returns (sorted_mass, N(≥M)).
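Both forms are easy to reproduce for cross-checking. This sketch assumes edges spanning the minimum to the maximum mass, geometric bin centres for log bins, and the maximum counted in the last bin; any scale other than :log (here :linear) gives linear bins. Those binning details are choices of the illustration and may differ from clump_massfunction, and massfunction_sketch is a hypothetical name. The cumulative form returns ascending masses with N(≥M), so tied masses share a count.

```julia
# Clump mass function sketch: differential (binned) or cumulative N(≥M).
function massfunction_sketch(masses::AbstractVector{<:Real}; nbins::Int=20,
                             scale::Symbol=:log, cumulative::Bool=false)
    if cumulative
        m = sort(Float64.(masses))
        N = [length(m) - searchsortedfirst(m, x) + 1 for x in m]   # number of masses ≥ x
        return m, N
    end
    lo, hi = extrema(masses)
    edges = if scale === :log
        exp10.(range(log10(lo), log10(hi); length = nbins + 1))    # log-spaced edges
    else
        collect(range(float(lo), float(hi); length = nbins + 1))
    end
    counts = zeros(Int, nbins)
    for x in masses
        k = clamp(searchsortedlast(edges, x), 1, nbins)            # maximum goes into last bin
        counts[k] += 1
    end
    centres = if scale === :log
        sqrt.(edges[1:end-1] .* edges[2:end])                      # geometric bin centres
    else
        (edges[1:end-1] .+ edges[2:end]) ./ 2
    end
    return centres, counts
end
```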

Mera.clump_recovery (Function)
clump_recovery(found_labels, true_labels; background=0) -> NamedTuple

Compare a found clump segmentation against a known ground truth, label-for-label over the same points. Returns (; ari, completeness, purity, merit, n_found, n_true, n_points):

  • ari: Adjusted Rand Index (Hubert & Arabie 1985); 1 = perfect agreement, 0 = chance level, and it can be slightly negative. The standard clustering-quality metric.
  • completeness: mass/count-weighted fraction of each true clump captured by its best-matching found clump, averaged over true clumps (1 = every true clump is fully contained in one found clump).
  • purity: the same from the found side (1 = no found clump mixes two true clumps).
  • merit: mean bijective merit Σ max_i n_ij²/(|found_i|·|true_j|) (Srisawat+2013 "SUSSING"), rewarding one-to-one matches.

background (default 0) is the label for unassigned points; those points are excluded from completeness/purity/merit (but kept in ari, which scores the full partition). Both label vectors must be the same length and indexed by the same points.

m = clump_recovery(found_labels, true_labels)
m.ari            # ≈ 1 when the finder recovers the input clumps

Mera.clumptable (Function)
clumptable(cat::ClumpCatalog) -> NamedTuple

A columnar view of the catalog: a NamedTuple of equal-length vectors — id, n_members, mass, com_x, com_y(, com_z), radius, and (when present) peak, the boundedness columns (e_kin, e_therm, e_grav, alpha_vir, bound), and per-component masses/counts (mass_gas, n_gas, …). Drop straight into DataFrame(clumptable(cat)) or CSV.write.

Mera.save_clumps (Function)
save_clumps(filename, cat::ClumpCatalog) -> String

Write a ClumpCatalog to filename as a JLD2 file (full fidelity — per-clump fields, boundedness, nested subclumps, the hierarchy tree, and meta are all preserved). A .jld2 extension is appended if missing. Reload with load_clumps. For a flat tabular export (CSV/DataFrame) use clumptable instead.

save_clumps("clumps_out100", cat)
cat2 = load_clumps("clumps_out100.jld2")

Mera.ClumpCard (Type)
ClumpCard(kind, field=:rho; threshold, linking_length, threshold_unit=:standard, pos_unit=:kpc, mass_unit=:Msol, min_members=1, label="")

A report card that runs clumpfind and reports the clump count + total clump mass (the full ClumpCatalog is kept in the result card's data.catalog).

Mera.synthetic_clumps (Function)
synthetic_clumps(; seed=1, lmax=7, background=:floor, noise=0.0,
                   disk_amp=14.0, disk_hr=0.22, disk_hz=0.10) -> NamedTuple

Build a reproducible, data-free 3-D Mera test bench with a known clump population — no simulation files required. Returns (; gas, particles, truth, info, true_label):

  • gas: a HydroDataType with eight Gaussian density clumps spread through the (2^lmax)³-cell volume (box = 1 kpc), with per-cell velocities and pressure.
  • particles: a PartDataType with one particle bag per clump plus a two-component kinematic stream (for PhaseSpaceFoF).
  • truth: the ground-truth catalog (id, name, kind, pos, mass, width, vsig, bound).
  • true_label: a function true_label(x,y,z)::Int returning the dominant clump id at a position (0 = background).

The data and all finders are fully three-dimensional; the clumps sit at different z, including a touching pair and a kinematically-hot unbound clump. The same eight clumps can be embedded in different environments to test how well a finder separates them from the floor:

  • background=:floor (default) — a flat low background; clumps are isolated islands.
  • background=:galaxy — clumps embedded in a smooth exponential ISM disk (disk_amp, radial/vertical scales disk_hr/disk_hz) whose inner region is itself elevated, so a fixed low threshold captures the diffuse disk as a spurious structure.
  • noise>0 — multiplicative log-normal per-cell fluctuations (turbulent ISM); noise is the dispersion of ln ρ (e.g. 0.2).

Either non-default option fills the whole grid (use lmax=6 to keep it fast). The clump ground truth is unchanged — the background is labelled 0, so a finder that absorbs the floor into its clumps is penalised by clump_recovery.

F = synthetic_clumps()
cat = clumpfind(F.gas, ThresholdFoF(:rho; threshold=5.0, linking_length=2.0/2^7))

See also save_synthetic_clumps, load_synthetic_clumps, and the "Clump Finding — Synthetic Example" documentation page.

Mera.load_synthetic_clumps (Function)
load_synthetic_clumps(file_or_dir="."; download=false, url=Mera.SYNTH_URL) -> NamedTuple

Load the synthetic clump field, returning (; gas, particles, truth). file_or_dir is either the .jld2 file itself or a directory containing mera_synthetic_clumps.jld2.

With download=true the file is fetched from url (the GitHub release asset) when it is not already present locally. The stored objects are standard Mera data types, so every Mera verb (getvar, projection, clumpfind, …) works on them — using Mera is all that is needed.

D = load_synthetic_clumps(tempdir(); download=true)   # fetch once, then load
clumpfind(D.gas, ThresholdFoF(:rho; threshold=5.0, linking_length=2.0/2^7))