Multi-Threading API Reference

MERA.jl Multi-Threading Performance

High-performance parallel computing with MERA.jl: leveraging multi-core processors for accelerated astrophysical data analysis

Comprehensive reference for Mera's parallel processing functions and threading control.

Quick Threading Reference

Core Functions

# Threading setup and diagnostics
show_threading_info()           # Display threading configuration

# Performance benchmarking  
benchmark_projection_hydro(gas, [1,2,4,8])  # Test thread performance

# Progress tracking for long operations
tracker = create_progress_tracker(100)      # Create progress tracker
update_progress!(tracker, 50)               # Report item 50 of 100 (50% complete)
complete_progress!(tracker)                  # Mark as finished

Key Parameters

Parameter         Type   Purpose                       Default
max_threads       Int    Limit concurrent threads      Threads.nthreads()
verbose_threads   Bool   Show threading diagnostics    false

Threading Information Functions

show_threading_info()

Display comprehensive threading configuration and recommendations.

Purpose:

  • Check current Julia threading setup
  • Get performance recommendations
  • Troubleshoot threading issues

Returns: Nothing (prints information)

Example:

using Mera
show_threading_info()
# Output:
# 🧵 JULIA THREADING INFORMATION
# ===============================
# Available threads: 8
# CPU cores: 8
# ✅ Multi-threading enabled
# 
# 🚀 PERFORMANCE RECOMMENDATIONS
# ==============================
# Variable-based parallel processing:
#   • 2+ variables: Automatic variable-based parallelization
#   • Single variable: Optimized sequential processing
#   • Threading scales linearly with variable count

When to use:

  • Before starting threading-intensive work
  • When experiencing performance issues
  • To verify Julia was started with threading enabled
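The same basic check can be made with Base Julia alone, which is useful before Mera is loaded. This is a minimal sketch and does not reproduce the output of show_threading_info(); the variable names are illustrative only.

```julia
# Inspect the threading setup with Base Julia only (no Mera calls).
n_julia = Threads.nthreads()   # threads available to this Julia session
n_cpu = Sys.CPU_THREADS        # logical CPU threads reported by the OS

println("Julia threads: $n_julia, logical CPU threads: $n_cpu")
if n_julia == 1
    # The thread count is fixed at startup, so a restart is required.
    println("Single-threaded session: restart with `julia --threads=auto` ",
            "or set JULIA_NUM_THREADS before launching Julia.")
end
```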

Performance Benchmarking

benchmark_projection_hydro(gas_data, thread_counts, n_runs=10, output_file="")

Benchmark projection performance across different thread counts to find optimal settings.

Arguments:

  • gas_data: Hydro data object from gethydro()
  • thread_counts: Vector of thread counts to test (e.g., [1,2,4,8])
  • n_runs: Number of benchmark runs per thread count (default: 10)
  • output_file: Optional file to save results (default: "", in which case the file name is auto-generated)

Returns: Performance results; also writes a benchmark report

Example:

using Mera
info = getinfo(400, "../data")
gas = gethydro(info; lmax=10)

# Test different thread counts
thread_counts = [1, 2, 4, 8, 16]
benchmark_projection_hydro(gas, thread_counts, 5)

# Output shows optimal thread count for your system

Performance Analysis:

  • Tests projection performance with multiple variables
  • Identifies optimal max_threads values
  • Detects resource bottlenecks and contention
  • Provides specific recommendations for your hardware

Interpreting Results:

  • Look for the thread count with the best speedup per thread (parallel efficiency)
  • Watch for performance degradation at high thread counts
  • Consider I/O vs compute-bound characteristics
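One way to make "best performance/thread ratio" concrete is to compute speedup and parallel efficiency from the measured timings. The sketch below is not part of Mera: pick_thread_count, the 0.7 efficiency threshold, and the timings are all hypothetical, and the first entry is assumed to be the single-thread baseline.

```julia
# Sketch: derive speedup and parallel efficiency from measured timings.
# Assumes the first entry of `thread_counts` is the single-thread baseline.
function pick_thread_count(thread_counts, timings; min_efficiency = 0.7)
    speedup = timings[1] ./ timings          # how much faster than the baseline
    efficiency = speedup ./ thread_counts    # speedup per thread (1.0 is ideal)
    # Largest thread count that still meets the efficiency threshold
    idx = findlast(>=(min_efficiency), efficiency)
    return (best = thread_counts[idx], speedup = speedup, efficiency = efficiency)
end

# Hypothetical timings in seconds; substitute your own measurements.
report = pick_thread_count([1, 2, 4, 8], [8.0, 4.4, 2.6, 2.4])
println("Suggested max_threads: ", report.best)
```

With these made-up numbers, efficiency drops sharply between 4 and 8 threads, which is the kind of degradation the list above warns about.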

Progress Tracking

Track the progress of long-running multi-threaded operations, with optional Zulip notifications.

create_progress_tracker(total_items; kwargs...)

Create a progress tracker for monitoring threaded operations.

Arguments:

  • total_items: Total number of items to process

Keyword Arguments:

  • time_interval: Seconds between time-based notifications (default: 300)
  • progress_interval: Percentage between progress notifications (default: 10)
  • task_name: Descriptive name for the task (default: "Processing")
  • zulip_channel: Zulip channel for notifications (default: "progress")
  • zulip_topic: Zulip topic for notifications (default: "Task Progress")

Returns: Dictionary containing tracker state

Example:

# Create tracker for 1000 snapshots
tracker = create_progress_tracker(1000; 
                                 task_name="Multi-snapshot analysis",
                                 progress_interval=5)  # Notify every 5%

update_progress!(tracker, current_item, custom_message="")

Update progress tracker with current status.

Arguments:

  • tracker: Tracker dictionary from create_progress_tracker()
  • current_item: Current item number being processed
  • custom_message: Optional custom status message

Returns: Nothing (updates tracker and may send notifications)

Thread Safety: Safe to call from multiple threads

Example:

using Base.Threads  # provides @threads

@threads for i in 1:1000
    # Process snapshot
    analyze_snapshot(snapshots[i])
    
    # Update progress (thread-safe)
    update_progress!(tracker, i, "Processed snapshot $(snapshots[i])")
end
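Inside a @threads loop the index i does not arrive in order, so it is not the same as the number of items finished so far. If a monotonically increasing count is wanted, one option is an atomic counter, sketched here with Base.Threads only. Whether update_progress! expects an index or a completed count is not stated above, so treat this as an assumption-laden illustration; n_items and n_done are hypothetical names.

```julia
using Base.Threads

n_items = 1000
completed = Atomic{Int}(0)   # shared counter of finished items

@threads for i in 1:n_items
    # ... process item i here ...

    # atomic_add! returns the previous value, so add 1 to get the new count.
    n_done = atomic_add!(completed, 1) + 1
    # n_done is the value one could report as the number of finished items.
end

println("Finished items: ", completed[])
```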

complete_progress!(tracker, final_message=""; include_summary=true)

Mark progress tracking as complete and send final notification.

Arguments:

  • tracker: Tracker dictionary
  • final_message: Custom completion message
  • include_summary: Include timing summary (default: true)

Returns: Nothing (sends completion notification)

Example:

complete_progress!(tracker, "Multi-snapshot analysis completed successfully")

Threading Control Parameters

max_threads Parameter

Controls the maximum number of threads used by Mera functions.

Available in:

  • gethydro()
  • getgravity()
  • getparticles()
  • projection()
  • Most analysis functions

Usage patterns:

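using Base.Threads  # provides @threads

# Outer-loop parallelism: limit inner threading
# (assumes `snapshots` holds output numbers, as in getinfo(400, "../data"))
@threads for snapshot in snapshots
    info = getinfo(snapshot, "../data")
    gas = gethydro(info; max_threads=1)  # Serial loading
end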

# Inner-kernel parallelism: use all threads
gas = gethydro(info)  # Uses Threads.nthreads() by default

# Mixed: controlled allocation
gas = gethydro(info; max_threads=4)      # 4 threads for I/O
proj = projection(gas, vars; max_threads=4)  # 4 threads for compute

Optimization Guidelines:

  • I/O-bound operations: 2-8 threads often optimal
  • CPU-bound operations: Match physical cores
  • Memory-bound operations: 2-4 threads recommended
  • Network storage: Fewer threads usually better
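When outer-loop and inner-kernel parallelism are combined, the available threads have to be divided between the two levels. The helper below is a hypothetical sketch, not a Mera function: it splits a thread budget into a number of concurrent outer tasks and a max_threads value for each of them.

```julia
# Sketch: divide a thread budget between outer tasks and inner kernels.
# `split_thread_budget` is an illustrative helper, not part of Mera.
function split_thread_budget(n_tasks::Integer; total::Integer = Threads.nthreads())
    outer = clamp(n_tasks, 1, total)   # never run more outer tasks than threads
    inner = max(1, total ÷ outer)      # threads left for each task's max_threads
    return (outer = outer, inner = inner)
end

budget = split_thread_budget(2; total = 8)
println("Run $(budget.outer) tasks concurrently with max_threads=$(budget.inner)")
```

The integer division is deliberately conservative: it never oversubscribes, at the cost of leaving a few threads idle when the counts do not divide evenly.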

verbose_threads Parameter

Enables detailed threading diagnostics and performance metrics.

Available in:

  • projection() functions

Example:

# Enable detailed threading diagnostics
proj = projection(gas, [:rho, :T, :vx, :vy]; 
                 verbose_threads=true,
                 max_threads=4)

# Output shows:
# - Thread assignment per variable
# - Load balancing information
# - Per-thread performance metrics
# - Memory allocation patterns

Diagnostic Output Includes:

  • Thread utilization per variable
  • Load balancing effectiveness
  • Memory allocation per thread
  • Execution time breakdown
  • Resource contention indicators

Threading Patterns Reference

Pattern Selection Guide

┌─ Multiple independent tasks? 
│  ├─ Yes → Outer-Loop Pattern
│  └─ No ↓
└─ Single large dataset?
   ├─ Yes → Inner-Kernel Pattern  
   └─ Complex workflow → Mixed Pattern

Outer-Loop Pattern

Best for: Multiple snapshots, parameter studies, independent analyses

@threads for item in work_items
    result = mera_function(item; max_threads=1)
end

Inner-Kernel Pattern

Best for: Single large dataset, multiple variables, complex analysis

gas = gethydro(info)  # Full threading
proj = projection(gas, multiple_variables)  # Parallel per variable

Mixed Pattern

Best for: Controlled resource allocation, complex workflows

using Base.Threads: @spawn

# N1 and N2 are the thread budgets assigned to each task
task1 = @spawn mera_function1(data; max_threads=N1)
task2 = @spawn mera_function2(data; max_threads=N2)
results = fetch.([task1, task2])
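The spawn-and-fetch shape of the mixed pattern can be tried without any simulation data. In this sketch the two workloads are stand-ins (hypothetical functions, not Mera calls) and the max_threads keyword is omitted.

```julia
using Base.Threads: @spawn

# Stand-in workloads (hypothetical; replace with real analysis calls)
sum_of_squares(xs) = sum(abs2, xs)
largest(xs) = maximum(xs)

data = collect(1.0:100.0)

# Both tasks are scheduled immediately and may run on different threads.
t1 = @spawn sum_of_squares(data)
t2 = @spawn largest(data)

# fetch blocks until each task has finished and returns its result.
results = fetch.([t1, t2])
println(results)
```

Both tasks only read from data, so no synchronization is needed here.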

Thread Safety Guidelines

Safe Operations

using Base.Threads  # @threads, Atomic, atomic_add!

# ✅ Pre-allocated arrays (each thread writes to different index)
results = Vector{Float64}(undef, n)
@threads for i in 1:n
    results[i] = compute(i)
end

# ✅ Atomic operations
total = Atomic{Float64}(0.0)
@threads for i in 1:n
    atomic_add!(total, compute(i))
end

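# ✅ Per-task accumulators (one partial sum per chunk, combined at the end)
# Note: @distributed belongs to the Distributed stdlib and uses worker
# processes, not threads, so a task-based reduction is used here instead.
chunks = Iterators.partition(1:n, cld(n, Threads.nthreads()))
tasks = [Threads.@spawn sum(compute, chunk) for chunk in chunks]
result = sum(fetch.(tasks))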

Unsafe Operations

# ❌ Race conditions
total = 0.0
@threads for i in 1:n
    global total += compute(i)  # DANGEROUS
end

# ❌ Shared mutable state without synchronization
shared_dict = Dict()
@threads for i in 1:n
    shared_dict[i] = compute(i)  # DANGEROUS
end
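The shared-dictionary case can be made safe by guarding every write with a lock. This is a minimal sketch using Base's ReentrantLock; compute is a stand-in function and the other names are illustrative.

```julia
using Base.Threads

compute(i) = i^2          # stand-in for an expensive computation
n = 100

shared = Dict{Int,Int}()
dict_lock = ReentrantLock()

@threads for i in 1:n
    value = compute(i)    # do the expensive work outside the lock
    lock(dict_lock) do
        shared[i] = value # only one task mutates the Dict at a time
    end
end

println("Stored ", length(shared), " entries")
```

Keeping the computation outside the locked region limits how long threads wait on each other.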

Performance Optimization

Memory Management

Pre-allocation:

# Good: Allocate once
results = Vector{Float64}(undef, n_items)
@threads for i in 1:n_items
    results[i] = expensive_computation(i)
end

Garbage Collection:

# Monitor GC impact
@time threaded_analysis()  # Watch "gc time" percentage

# Reduce allocations
@. output_array = input1 + input2  # In-place operations
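To see what in-place operations save, the allocations of the two variants can be compared with @allocated. The function names below are illustrative, and both functions are called once beforehand so that compilation is not counted.

```julia
# Allocating version: creates a new array on every call
add_alloc(a, b) = a .+ b

# In-place version: writes into a pre-allocated output array
function add_inplace!(out, a, b)
    @. out = a + b
    return out
end

a = rand(10^6)
b = rand(10^6)
out = similar(a)

# Warm up so that compilation is not included in the measurement
add_alloc(a, b)
add_inplace!(out, a, b)

bytes_alloc = @allocated add_alloc(a, b)
bytes_inplace = @allocated add_inplace!(out, a, b)
println("allocating: $bytes_alloc bytes, in-place: $bytes_inplace bytes")
```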

Thread Utilization

Check load balancing:

# Uneven workloads: use @spawn instead of @threads
tasks = [@spawn process_item(item) for item in variable_workload]
results = fetch.(tasks)

Resource monitoring:

# Use verbose_threads to identify bottlenecks
proj = projection(gas, vars; verbose_threads=true)
# Look for thread imbalances or resource contention

Troubleshooting

Common Issues

Poor scaling:

  • Reduce max_threads to check whether I/O is the bottleneck
  • Rerun with fewer threads to check whether memory bandwidth is saturated
  • Use verbose_threads=true to identify contention

High GC time:

  • Pre-allocate arrays instead of growing with push!
  • Use in-place operations (@. macro)
  • Process data in chunks for large datasets
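Chunked processing can be sketched with Iterators.partition, which walks a collection in fixed-size pieces so that only one piece of intermediate data is live at a time. chunked_sum and its chunk_size default are hypothetical, not part of Mera.

```julia
# Sketch: reduce a large collection chunk by chunk.
function chunked_sum(f, xs; chunk_size = 10_000)
    total = 0.0
    for chunk in Iterators.partition(xs, chunk_size)
        total += sum(f, chunk)   # only this chunk's intermediates are live
    end
    return total
end

println(chunked_sum(abs2, 1:100; chunk_size = 7))
```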

Race conditions:

  • Use atomic operations for simple reductions
  • Pre-allocate arrays with fixed indices per thread
  • Add proper synchronization for complex shared state

Debugging Tools

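# Thread-safe debugging output
# ReentrantLock is used instead of SpinLock because println performs I/O
# and may yield, which a SpinLock-protected section should not do.
using Base.Threads: threadid
debug_lock = ReentrantLock()
function safe_debug(msg)
    lock(debug_lock) do
        println("Thread $(threadid()): $msg")
    end
end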

For complete implementation examples, see the Multi-Threading Tutorial.