Clump Data: First Inspection
This notebook provides a comprehensive introduction to loading and analyzing clump structure data using Mera.jl. You'll learn the fundamentals of working with RAMSES clump data and understanding clump hierarchies and properties.
Learning Objectives
- Load and inspect clump structure simulation data
- Understand clump properties, hierarchies, and organization
- Analyze clump distributions and characteristics across the simulation
- Handle different clump variable types and unit conversions
- Work with IndexedTables data structures for clump analysis
- Apply memory management best practices for clump datasets
Quick Reference: Essential Clump Functions
This section provides a comprehensive reference of key Mera.jl functions for clump data analysis.
Data Loading Functions
# Load simulation metadata with clump information
info = getinfo(output_number, "path/to/simulation")
info = getinfo(400, "/path/to/sim") # Specific output
info = getinfo("/path/to/sim") # Latest output
# Load clump data - basic usage
clumps = getclumps(info) # Load all clumps, all variables
Data Exploration Functions
# Analyze clump data structure and properties
data_overview = dataoverview(clumps) # Statistical overview of clump variables
usedmemory(clumps) # Memory usage analysis
# Explore object structure
viewfields(clumps) # View ClumpDataType structure
viewfields(info.clumps_info) # View clump file information
propertynames(clumps) # List all available fields
Variable and Data Management
# Access clump information
info.clumps_info # Clump file information
clumps.data # Raw clump data table
clumps.info.scale # Scaling factors
# Common clump variables (depend on clump finder output)
# :index, :peak_x, :peak_y, :peak_z, :mass_cl, :rho_max, :rho_saddle
# Variable names are read from clump file headers automatically
IndexedTables Operations
# Work with clump data tables
using Mera.IndexedTables
# Select specific columns
select(clumps.data, (:index, :peak_x, :peak_y, :peak_z, :mass_cl)) # View positions + mass
select(data_overview, (:extrema, :index, :mass_cl)) # Statistical summary
# Extract column data
column(data_overview, :mass_cl) # Extract mass column as array
column(data_overview, :mass_cl) * info.scale.Msol # Convert to solar masses
# Transform data in-place
transform(data_overview, :mass_cl => :mass_cl => value->value * info.scale.Msol)
Unit Conversion
# Access scaling factors
scale = clumps.scale # Shortcut to scaling factors
constants = clumps.info.constants # Physical constants
# Common unit conversions for clump data
mass_msol = clumps.data.mass_cl * scale.Msol # Mass to solar masses
position_kpc = clumps.data.peak_x * scale.kpc # Position to kpc
density_gcm3 = clumps.data.rho_max * scale.g_cm3 # Density to g/cm³
size_pc = clumps.data.size * scale.pc # Size to parsecs (if available)
Memory Management
# Monitor and optimize memory usage
usedmemory(clumps) # Check current memory usage
clumps = nothing; GC.gc() # Clear variable and garbage collect
Common Analysis Workflow
# Standard clump data analysis workflow
info = getinfo(400, "/path/to/simulation") # Load simulation metadata
clumps = getclumps(info) # Load clump data
usedmemory(clumps) # Check memory usage
# Analyze structure and properties
data_overview = dataoverview(clumps) # Statistical overview
viewfields(clumps) # Explore data structure
# Convert units and extract specific data
scale = clumps.scale # Create scaling shortcut
mass_msol = select(clumps.data, :mass_cl) * scale.Msol # Physical masses
position_kpc = select(clumps.data, (:peak_x, :peak_y, :peak_z)) * scale.kpc
Clump Analysis Functions
# Work with clump properties
# Filter clumps by mass
massive_clumps = filter(c -> c.mass_cl > 1e5, clumps.data)
Package Import and Initial Setup
Let's start by importing Mera.jl and loading simulation information for output 300:
using Mera
info = getinfo(400, "/Volumes/FASTStorage/Simulations/Mera-Tests/manu_sim_sf_L14");
[Mera]: 2025-08-14T14:04:17.072
Code: RAMSES
output [400] summary:
mtime: 2018-09-05T09:51:55
ctime: 2025-06-29T20:06:45.267
=======================================================
simulation time: 594.98 [Myr]
boxlen: 48.0 [kpc]
ncpu: 2048
ndim: 3
-------------------------------------------------------
amr: true
level(s): 6 - 14 --> cellsize(s): 750.0 [pc] - 2.93 [pc]
-------------------------------------------------------
hydro: true
hydro-variables: 7 --> (:rho, :vx, :vy, :vz, :p, :var6, :var7)
hydro-descriptor: (:density, :velocity_x, :velocity_y, :velocity_z, :thermal_pressure, :passive_scalar_1, :passive_scalar_2)
γ: 1.6667
-------------------------------------------------------
gravity: true
gravity-variables: (:epot, :ax, :ay, :az)
-------------------------------------------------------
particles: true
- Npart: 5.091500e+05
- Nstars: 5.066030e+05
- Ndm: 2.547000e+03
particle-variables: 5 --> (:vx, :vy, :vz, :mass, :birth)
-------------------------------------------------------
rt: false
-------------------------------------------------------
clumps: true
clump-variables: (:index, :lev, :parent, :ncell, :peak_x, :peak_y, :peak_z, Symbol("rho-"), Symbol("rho+"), :rho_av, :mass_cl, :relevance)
-------------------------------------------------------
namelist-file: false
timer-file: false
compilation-file: true
makefile: true
patchfile: true
=======================================================
Understanding Clump Properties
The output above provides a comprehensive overview of the loaded clump data properties:
- Clump files status - Confirms existence and accessibility of clump data files
- Variable names - Lists the clump variable names from the file headers
- Data structure - Reveals how the clump data is organized and stored
- File format - Shows the automatic parsing of clump file structure
Loading Clump Data
Now that we understand our simulation's structure and clump organization, let's load the actual clump data. We'll use Mera's powerful data loading capabilities to read clump structures and their properties.
Data Loading Overview
The getclumps()
function is the primary tool for loading clump data from RAMSES simulations. It provides extensive options for:
- Variable selection - Choose specific clump properties
- Mass filtering - Focus on clumps within specific mass ranges
- Spatial filtering - Focus on regions of interest
- Automatic parsing - Column names are automatically read from clump file headers
Note: Mera automatically checks the first line of each clump file to determine column names and data structure.
Loading Complete Clump Dataset
Now let's load all clump data from the simulation. This will read:
- All clump structures - Complete catalog of identified clumps
- All available variables - All clump properties present in the files
- Automatic column parsing - Variable names determined from file headers
- Efficient organization - Data structured for analysis and manipulation
clumps = getclumps(info);
[Mera]: Get clump data: 2025-08-14T14:04:19.532
domain:
xmin::xmax: 0.0 :: 1.0 ==> 0.0 [kpc] :: 48.0 [kpc]
ymin::ymax: 0.0 :: 1.0 ==> 0.0 [kpc] :: 48.0 [kpc]
zmin::zmax: 0.0 :: 1.0 ==> 0.0 [kpc] :: 48.0 [kpc]
Read 12 colums:
[:index, :lev, :parent, :ncell, :peak_x, :peak_y, :peak_z, Symbol("rho-"), Symbol("rho+"), :rho_av, :mass_cl, :relevance]
Memory used for data table :61.58203125 KB
-------------------------------------------------------
Memory Usage Analysis
The memory consumption of the loaded data is displayed automatically. For detailed memory analysis of any object, Mera.jl provides the usedmemory()
function:
usedmemory(clumps);
Memory used: 363.003 KB
Understanding Data Types
The loaded data object is now of type ClumpDataType
, which is specifically designed for clump structure data:
typeof(clumps)
ClumpDataType
Type Hierarchy
ClumpDataType
is part of a well-organized type hierarchy. It's a sub-type of ContainMassDataSetType
:
# Which in turn is a subtype of the general `DataSetType`.
supertype( ContainMassDataSetType )
DataSetType
Which in turn is a sub-type of the general DataSetType
:
supertype( ClumpDataType )
ContainMassDataSetType
Data Organization and Structure
The clump data is stored in an IndexedTables table format, with clump variables and parameters organized into accessible fields. Let's explore the structure:
viewfields(clumps)
data ==> IndexedTables: (:index, :lev, :parent, :ncell, :peak_x, :peak_y, :peak_z, Symbol("rho-"), Symbol("rho+"), :rho_av, :mass_cl, :relevance)
info ==> subfields: (:output, :path, :fnames, :simcode, :mtime, :ctime, :ncpu, :ndim, :levelmin, :levelmax, :boxlen, :time, :aexp, :H0, :omega_m, :omega_l, :omega_k, :omega_b, :unit_l, :unit_d, :unit_m, :unit_v, :unit_t, :gamma, :hydro, :nvarh, :nvarp, :nvarrt, :variable_list, :gravity_variable_list, :particles_variable_list, :rt_variable_list, :clumps_variable_list, :sinks_variable_list, :descriptor, :amr, :gravity, :particles, :rt, :clumps, :sinks, :namelist, :namelist_content, :headerfile, :makefile, :files_content, :timerfile, :compilationfile, :patchfile, :Narraysize, :scale, :grid_info, :part_info, :compilation, :constants)
boxlen = 48.0
ranges = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
selected_clumpvars = [:index, :lev, :parent, :ncell, :peak_x, :peak_y, :peak_z, Symbol("rho-"), Symbol("rho+"), :rho_av, :mass_cl, :relevance]
scale ==> subfields: (:Mpc, :kpc, :pc, :mpc, :ly, :Au, :km, :m, :cm, :mm, :μm, :Mpc3, :kpc3, :pc3, :mpc3, :ly3, :Au3, :km3, :m3, :cm3, :mm3, :μm3, :Msol_pc3, :Msun_pc3, :g_cm3, :Msol_pc2, :Msun_pc2, :g_cm2, :Gyr, :Myr, :yr, :s, :ms, :Msol, :Msun, :Mearth, :Mjupiter, :g, :km_s, :m_s, :cm_s, :nH, :erg, :g_cms2, :T_mu, :K_mu, :T, :K, :Ba, :g_cm_s2, :p_kB, :K_cm3, :erg_g_K, :keV_cm2, :erg_K, :J_K, :erg_cm3_K, :J_m3_K, :kB_per_particle, :J_s, :g_cm2_s, :kg_m2_s, :Gauss, :muG, :microG, :Tesla, :eV, :keV, :MeV, :erg_s, :Lsol, :Lsun, :cm_3, :pc_3, :n_e, :erg_g_s, :erg_cm3_s, :erg_cm2_s, :Jy, :mJy, :microJy, :atoms_cm2, :NH_cm2, :cm_s2, :m_s2, :km_s2, :pc_Myr2, :erg_g, :J_kg, :km2_s2, :u_grav, :erg_cell, :dyne, :s_2, :lambda_J, :M_J, :t_ff, :alpha_vir, :delta_rho, :a_mag, :v_esc, :ax, :ay, :az, :epot, :a_magnitude, :escape_speed, :gravitational_redshift, :gravitational_energy_density, :gravitational_binding_energy, :total_binding_energy, :specific_gravitational_energy, :gravitational_work, :jeans_length_gravity, :jeans_mass_gravity, :jeansmass, :freefall_time_gravity, :ekin, :etherm, :virial_parameter_local, :Fg, :poisson_source, :ar_cylinder, :aϕ_cylinder, :ar_sphere, :aθ_sphere, :aϕ_sphere, :r_cylinder, :r_sphere, :ϕ, :dimensionless, :rad, :deg)
Convenient Data Access
For convenience, all fields from the original InfoType
object are now accessible through:
clumps.info
- All simulation metadata and parametersclumps.scale
- Scaling relations for converting from code units to physical units
The data object also retains important structural information:
- Box dimensions and coordinate ranges
- Selected spatial regions and filtering parameters
- Number and properties of loaded clumps
Quick Field Reference
For a simple list of all available fields in the clump data object:
propertynames(clumps)
(:data, :info, :boxlen, :ranges, :selected_clumpvars, :used_descriptors, :scale)
Data Analysis and Exploration
Now that we have loaded our clump data, let's explore its structure and properties in detail. This section demonstrates the key analysis functions available in Mera.jl.
Analysis Overview
We'll focus on statistical analysis of clump properties:
- Statistical Data Overview - Computing basic statistical properties of clump variables, understanding mass distributions, spatial distributions of clumps, peak positions and other key parameters, and assessing data quality and identifying potential issues
The following analysis will help us understand the overall structure and properties of our clump population.
Statistical Data Analysis
The dataoverview()
function computes comprehensive statistics for all clump variables in our dataset. This analysis provides:
- Variable ranges - Minimum and maximum values for each clump property
The calculated information is stored in code units and can be accessed for further analysis:
data_overview = dataoverview(clumps)
Table with 2 rows, 13 columns:
Columns:
# colname type
───────────────────
1 extrema Any
2 index Any
3 lev Any
4 parent Any
5 ncell Any
6 peak_x Any
7 peak_y Any
8 peak_z Any
9 rho- Any
10 rho+ Any
11 rho_av Any
12 mass_cl Any
13 relevance Any
Working with IndexedTables
When dealing with tables containing many columns, only a summary view is typically displayed. To access specific columns, use the select()
function.
Important Notes:
- Column names are specified as quoted Symbols (
:column_name
) - For more details, see the Julia documentation on Symbols
- The
select()
function maintains data order and relationships
Let's select specific columns to examine clump properties:
using Mera.IndexedTables
select(data_overview, (:extrema, :index, :peak_x, :peak_y, :peak_z, :mass_cl) )
Table with 2 rows, 6 columns:
extrema index peak_x peak_y peak_z mass_cl
──────────────────────────────────────────────────────
"min" 4.0 10.292 9.93604 22.1294 0.00031216
"max" 2147.0 38.1738 35.7056 25.4634 0.860755
Unit Conversion Example
Extract mass data from a specific column and convert it to solar masses. The select()
function retrieves data from specific table columns, maintaining the order consistent with the table structure:
select(data_overview, :mass_cl) * info.scale.Msol
2-element Vector{Float64}:
312073.3187055649
8.605166312657958e8
In-Place Unit Conversion
Alternatively, you can directly convert the data within the table using the transform()
function. This modifies the table in-place, converting the :mass_cl
column to solar mass units:
data_overview = transform(data_overview, :mass_cl => :mass_cl => value->value * info.scale.Msol);
select(data_overview, (:extrema, :index, :peak_x, :peak_y, :peak_z, :mass_cl) )
Table with 2 rows, 6 columns:
extrema index peak_x peak_y peak_z mass_cl
─────────────────────────────────────────────────────
"min" 4.0 10.292 9.93604 22.1294 3.12073e5
"max" 2147.0 38.1738 35.7056 25.4634 8.60517e8
Data Structure Deep Dive
Now let's examine the detailed structure of our clump data. Understanding this organization is crucial for effective data manipulation and analysis.
IndexedTables Storage Format
The clump data is stored in clumps.data
as an IndexedTables table (in code units), which provides several key advantages:
- Row-based organization: Each row represents a unique clump in the simulation
- Column-based access: Each column represents a specific clump property
- Efficient operations: Built-in support for filtering, mapping, and aggregation
- Memory efficiency: Optimized storage and access patterns for clump catalogs
- Functional interface: Clean, composable operations for data manipulation
For comprehensive information on working with this data structure:
- Mera.jl documentation and tutorials
- JuliaDB API Reference
- IndexedTables.jl documentation
Understanding the Data Layout
The table structure reflects the clump catalog organization:
Clump Identification and Properties
- Peak positions (peakx, peaky, peak_z) represent clump centers in code units
- Index values provide unique identifiers for each clump
- Mass and density properties characterize clump physical properties
- Spatial relationships maintain clump hierarchies and associations
Critical Data Integrity Notes
- Position preservation: The peak positions are essential for spatial analysis
- Do not modify: These coordinates are used by many Mera functions
- Unique identifiers: Each clump index uniquely identifies a structure
Let's examine the complete data table:
clumps.data
Table with 644 rows, 12 columns:
Columns:
# colname type
──────────────────────
1 index Float64
2 lev Float64
3 parent Float64
4 ncell Float64
5 peak_x Float64
6 peak_y Float64
7 peak_z Float64
8 rho- Float64
9 rho+ Float64
10 rho_av Float64
11 mass_cl Float64
12 relevance Float64
Focused Data Examination
For a more detailed view of specific columns, we can select key fields to understand the clump organization better:
select(clumps.data, (:index, :peak_x, :peak_y, :peak_z, :mass_cl) )
Table with 644 rows, 5 columns:
index peak_x peak_y peak_z mass_cl
─────────────────────────────────────────────
4.0 20.1094 11.5005 23.9604 0.0213767
5.0 20.1592 11.5122 23.9253 0.0131504
9.0 21.7852 17.855 23.814 0.00358253
12.0 21.8232 17.8608 23.855 0.00509792
13.0 21.8906 17.2837 23.5415 0.0319414
18.0 21.7822 16.8823 23.7817 0.00848828
19.0 21.75 16.8589 23.7993 0.00587003
20.0 21.6006 17.5679 23.7935 0.0324672
25.0 21.5801 17.6177 23.9341 0.0245806
26.0 21.5859 17.5796 23.9165 0.0183601
29.0 21.5625 17.5854 23.8726 0.0303356
46.0 21.5215 17.6235 23.9458 0.343594
⋮
2115.0 27.7705 13.2788 23.8081 0.0340939
2116.0 27.7617 13.3081 23.8081 0.0145199
2117.0 27.7793 13.2993 23.6851 0.00855992
2120.0 27.7559 13.1792 23.8638 0.00508007
2125.0 27.7939 13.0298 23.9194 0.00128829
2128.0 27.791 13.0649 23.9019 0.00183979
2131.0 28.3037 12.8188 23.9487 0.00128627
2132.0 28.626 12.8188 23.8755 0.00434
2137.0 29.9736 15.0571 23.7202 0.00195464
2140.0 27.1436 15.6401 23.9048 0.0160477
2147.0 25.1953 9.93604 23.9897 0.0294943
Summary and Next Steps
What You've Learned
In this tutorial, you've mastered the fundamentals of working with clump data in Mera.jl:
- Data Loading: How to load clump data using
getclumps()
with various options - Structure Understanding: The organization of clump catalogs and their properties
- Variable Management: Working with automatically parsed clump variable names
- Data Analysis: Using
dataoverview()
for comprehensive clump statistical analysis - Unit Handling: Converting between code units and physical units for clump properties
- Memory Management: Monitoring and optimizing memory usage for clump datasets
- Data Manipulation: Using IndexedTables operations for efficient clump data processing
Key Takeaways
- Clump data is stored in IndexedTables format for efficient access and manipulation
- Peak positions (peakx, peaky, peak_z) are critical for spatial relationships and should not be modified
- Always be conscious of units - raw data is in code units
- Memory management is important for large clump catalogs
- Variable names are automatically parsed from clump file headers
- Mera.jl provides powerful tools for statistical analysis and clump data exploration
Continue Your Learning
Now that you understand clump data fundamentals, you can explore:
- Advanced clump analysis: Hierarchical structure analysis, mass function studies, and spatial clustering
- Clump evolution studies: Tracking clump properties across multiple outputs
- Multi-physics analysis: Combining clump data with hydro, particle, and gravity data
- Statistical analysis: Advanced statistical methods for clump population studies
- Performance optimization: Advanced techniques for large-scale clump data processing