Performance Tuning & Optimization

How to achieve 2-3x FPS improvements and support thousands of particles.

Phase 1: Trail Rendering Optimization

The most impactful optimization for supporting 1k-2k particles.

Problem

At 1k particles with default settings:
  • Trail buffer: 1k particles × 400 segments/particle × 2 endpoints = 800k vertices/frame

  • GPU memory: Each vertex = ~12 bytes → ~10MB overhead per frame

  • CPU overhead: Building trail mesh from ring buffers

  • Result: 3-5 FPS (unacceptable for interactive simulation)
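
The vertex and memory figures above can be reproduced with a small helper. This is a back-of-the-envelope sketch: `trail_vertex_budget` is a hypothetical name, and the 12 bytes per vertex is the approximate figure quoted above, not a measured value.

```python
def trail_vertex_budget(particles, trail_length, bytes_per_vertex=12):
    """Estimate trail vertices and buffer size per frame.

    Each trail segment is drawn as a line with 2 endpoints.
    bytes_per_vertex is an assumed figure taken from the text above.
    """
    vertices = particles * trail_length * 2
    megabytes = vertices * bytes_per_vertex / 1e6
    return vertices, megabytes


vertices, megabytes = trail_vertex_budget(particles=1000, trail_length=400)
print(f"{vertices:,} vertices, ~{megabytes:.1f} MB per frame")
```

Because the estimate is linear in both arguments, halving either the particle count or the trail length halves the buffer.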

Solution: Phase 1 (Implemented)

Trail segment reduction:

  • Default: TRAIL_LENGTH = 40 (reduced from 400)

  • Vertex reduction: 1k × 40 × 2 = 80k vertices/frame (10x fewer!)

  • Performance gain: 2-3x FPS at 1k particles (→ 8-15 FPS)

  • Visual quality: 40 segments still look smooth, but only the most recent ~10 units of motion stay visible

Skip logic in kernel (hardcoded for performance):

  1. Skip photons (decay in <1e-20 s, so their trails are only clutter; ~20% fewer trails)

  2. Skip frozen/pinned particles (no motion → no trail purpose; ~10% reduction)

  3. Skip slow movers (speed < 0.1; ~30% reduction for typical setup)

  4. Skip short trails (< 3 segments; ~5% reduction)

Combined effect:

  • Vertex count: ~800k → ~80k (10x)

  • Added filtering: ~40-50% fewer particles draw trails

  • Final vertex count: ~40-50k vertices/frame

  • FPS improvement: 2-3x (3-5 FPS → 8-15 FPS at 1k particles)
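
As a rough cross-check of the filtering estimate, the per-filter reductions can be combined under the assumption that the four filters act independently. That assumption is mine, not the source's: in practice the filters overlap (a frozen particle is also a slow mover), so the real reduction should be somewhat smaller than this sketch suggests. `combined_skip_fraction` is a hypothetical helper.

```python
def combined_skip_fraction(fractions):
    """Fraction of trails skipped if each filter acts independently."""
    kept = 1.0
    for fraction in fractions:
        kept *= 1.0 - fraction
    return 1.0 - kept


# Per-filter reductions quoted above: photons, frozen, slow movers, short trails.
skipped = combined_skip_fraction([0.20, 0.10, 0.30, 0.05])
remaining_vertices = 80_000 * (1.0 - skipped)
print(f"skipped: {skipped:.0%}, remaining vertices: {remaining_vertices:,.0f}")
```

Under independence this gives about 52% of trails skipped and about 38k vertices left, which sits just beyond the ~40-50% and ~40-50k figures quoted above, as expected when the filters overlap.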

Configuration Parameters

All tunable parameters are in config.py. Modify and restart to apply:

TRAIL_LENGTH (default: 40)

TRAIL_LENGTH = 40
  • Ring buffer size per particle

  • Impact: Direct linear impact on vertex count and memory

  • Range: 5-100

  • Performance: Each +40 segments = ~+10% GPU time at 1k particles

  • Visual: 5-10 = sparse, 20-40 = moderate, 50+ = dense

  • Recommendation: Use 40 for smooth appearance, 20 for dense scenarios

MIN_TRAIL_SPEED_FOR_RENDER (default: 0.1)

MIN_TRAIL_SPEED_FOR_RENDER = 0.1
  • Skip rendering trails for particles with |v| < this threshold

  • Impact: ~20-30% fewer rendered trails (particle-dependent)

  • Range: 0.0 (all trails) → 1.0 (only very fast)

  • Physics: Unitless, relative to typical velocities

  • Recommendation: 0.1 for exploration, 0.2-0.5 for high particle count

MIN_TRAIL_LENGTH_FOR_RENDER (default: 3)

MIN_TRAIL_LENGTH_FOR_RENDER = 3
  • Don’t render trails with fewer than N segments

  • Impact: ~5% (minor)

  • Range: 2-5 typical

  • Purpose: Hide incomplete/stub trails from newly spawned particles

  • Recommendation: Keep at 3

TRAILS_ENABLED_DEFAULT (default: True)

TRAILS_ENABLED_DEFAULT = True
  • Start with trails enabled/disabled

  • Runtime toggle: Press T key anytime

  • Impact: Disabling trails saves ~30% of GPU time (the portion spent on trail rendering)

  • Recommendation: True for exploration, False for dense scenarios
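
One way to keep edits to these parameters inside the documented ranges is a small check run before launching. This is a sketch, not part of the project: `TRAIL_RANGES`, `check_trail_config`, and the "dense scene" profile are hypothetical, with values taken from the ranges and recommendations above.

```python
# Documented ranges for the trail parameters (the last one is "typical", not a hard limit).
TRAIL_RANGES = {
    "TRAIL_LENGTH": (5, 100),
    "MIN_TRAIL_SPEED_FOR_RENDER": (0.0, 1.0),
    "MIN_TRAIL_LENGTH_FOR_RENDER": (2, 5),
}


def check_trail_config(settings):
    """Return a list of warnings for values outside the documented ranges."""
    warnings = []
    for name, (low, high) in TRAIL_RANGES.items():
        if name in settings and not (low <= settings[name] <= high):
            warnings.append(f"{name}={settings[name]} outside {low}-{high}")
    return warnings


# Hypothetical "dense scene" profile built from the recommendations above.
dense_profile = {
    "TRAIL_LENGTH": 20,
    "MIN_TRAIL_SPEED_FOR_RENDER": 0.3,
    "MIN_TRAIL_LENGTH_FOR_RENDER": 3,
    "TRAILS_ENABLED_DEFAULT": False,
}
print(check_trail_config(dense_profile))  # → []
```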

Hardcoded Skip Conditions (Kernel-level)

The following are currently hardcoded in the GPU kernel for maximum performance (no configuration flag has to be checked at runtime):

Skip photons in trails:

if ptype[i] == PHOTON:
    continue  # Don't render photon trails
  • Rationale: Photons decay in ~1e-20 seconds; trails are visual clutter

  • Impact: ~20% fewer trails

  • To make configurable: Add skip_photons parameter to kernel signature

  • Location: simulation.py line 1391

Skip frozen particles in trails:

if frozen[i] == 1:
    continue  # Don't render trails for pinned particles
  • Rationale: Frozen particles don’t move; no motion to visualize

  • Impact: ~10% fewer trails

  • To make configurable: Add skip_frozen parameter to kernel signature

  • Location: simulation.py line 1394
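
If the two checks were made configurable as suggested, the per-particle decision could look like the predicate below. It is written as plain Python for readability and is not the actual kernel: the `PHOTON` value is a placeholder, `should_draw_trail` is a hypothetical name, and `min_speed` and `min_segments` simply mirror MIN_TRAIL_SPEED_FOR_RENDER and MIN_TRAIL_LENGTH_FOR_RENDER.

```python
PHOTON = 0  # placeholder type id; the real constant lives in the simulation code


def should_draw_trail(ptype, frozen, speed, trail_segments,
                      skip_photons=True, skip_frozen=True,
                      min_speed=0.1, min_segments=3):
    """Return True if a particle's trail should be rendered."""
    if skip_photons and ptype == PHOTON:
        return False  # photon trails are treated as clutter
    if skip_frozen and frozen == 1:
        return False  # pinned particles have no motion to show
    if speed < min_speed:
        return False  # slow movers
    if trail_segments < min_segments:
        return False  # stub trails from newly spawned particles
    return True


print(should_draw_trail(PHOTON, 0, 5.0, 40))                      # → False
print(should_draw_trail(PHOTON, 0, 5.0, 40, skip_photons=False))  # → True
```

The extra flag tests are the per-particle cost that the hardcoded version avoids, so this trades a little speed for flexibility.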

Other Optimization Opportunities

Phase 2 (Not yet implemented):

GPU kernel consolidation — Merge build_render_data() and build_trail_lines() kernels

  • Rationale: Reduces GPU kernel launch overhead, single pass through particles

  • Benefit: ~10-15% overhead reduction

  • Status: Planned, estimated effort = 2-3 hours

Phase 3 (Future):

Adaptive trail density — Distance and speed-based sampling

  • Rationale: Don’t store every frame of trail; skip based on motion magnitude

  • Benefit: Additional ~20-30% vertex reduction

  • Tradeoff: Slightly lower visual smoothness at extremes
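
A minimal sketch of the distance-based half of this idea: store a new trail point only once the particle has moved far enough from the last stored one. The function name, the `min_step` threshold, and the list-based buffer are all illustrative assumptions; the planned implementation may look quite different.

```python
import math


def maybe_append_trail_point(trail, position, min_step=0.05, max_points=40):
    """Store position only if it is at least min_step from the last stored point.

    trail is a list of (x, y, z) tuples, oldest first. When it grows past
    max_points the oldest point is dropped, mimicking a ring buffer.
    Returns True if the point was stored.
    """
    if trail and math.dist(trail[-1], position) < min_step:
        return False
    trail.append(position)
    if len(trail) > max_points:
        trail.pop(0)
    return True


trail = []
for frame in range(100):
    x = 0.01 * frame  # slow mover: 0.01 units per frame
    maybe_append_trail_point(trail, (x, 0.0, 0.0))
print(f"stored {len(trail)} of 100 frames")
```

A slow particle then contributes far fewer vertices than one point per frame, while a fast particle still fills its buffer.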

Phase 4 (Future):

GPU particle sorting for LOD — Sort by distance to camera; cull distant particles

  • Rationale: Distant particles contribute <1% of the visible image but still consume GPU time

  • Benefit: ~10-20% less GPU time for large scenes (5k+ particles)

  • Tradeoff: Additional overhead for sorting each frame
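
The culling step can be illustrated on the CPU with plain Python. This is only a sketch of the idea: `cull_by_distance` and the `max_distance` cutoff are hypothetical, and a real version would run on the GPU as described above.

```python
def cull_by_distance(positions, camera, max_distance):
    """Return indices of particles within max_distance of the camera, nearest first."""
    def dist_sq(point):
        # Squared distance avoids a square root per particle.
        return sum((a - b) ** 2 for a, b in zip(point, camera))

    limit_sq = max_distance ** 2
    visible = [i for i, p in enumerate(positions) if dist_sq(p) <= limit_sq]
    visible.sort(key=lambda i: dist_sq(positions[i]))
    return visible


positions = [(0, 0, 5), (0, 0, 50), (1, 0, 2), (0, 0, 500)]
print(cull_by_distance(positions, camera=(0, 0, 0), max_distance=100))  # → [2, 0, 1]
```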

Resolution & Rendering

Window resolution directly affects GPU load:

  • 1920×1080: Standard; ~50% of 2560×1600 bandwidth

  • 2560×1600: Default; ~1 GPU frame per output frame

  • 3840×2160 (4K): ~2x the bandwidth of 2560×1600; expect 30-40% FPS hit

To tune:

WINDOW_WIDTH = 1920  # From default 2560
WINDOW_HEIGHT = 1200  # From default 1600
  • Impact: Lower resolution raises FPS, roughly in proportion to pixel count when rendering is the bottleneck

  • Visual quality: 1920×1200 acceptable for most uses
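
To compare resolutions, it helps to look at pixel counts relative to the default window. The helper below is a sketch with a hypothetical name; pixel count is only a proxy for GPU load, since geometry work does not scale with resolution.

```python
def pixel_ratio(width, height, base_width=2560, base_height=1600):
    """Pixel count relative to the default 2560x1600 window."""
    return (width * height) / (base_width * base_height)


for w, h in [(1920, 1080), (1920, 1200), (2560, 1600), (3840, 2160)]:
    print(f"{w}x{h}: {pixel_ratio(w, h):.3f}x the default pixel count")
```

The ratios come out at roughly 0.5, 0.56, 1.0, and 2.0 respectively.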

Physics Timestep Tuning

DT (base timestep) — Smaller = more accurate but slower

DT = 0.001  # Default
  • Range: 0.0001 (very accurate, slow) → 0.02 (fast, less accurate)

  • Impact: ~1% FPS per 2x change

  • Recommendation: Keep at 0.001 for realism; use 0.002 for speed

SUBSTEPS — Multiple physics steps per frame

SUBSTEPS = 4  # Default
  • Range: 1 (fast, jittery) → 10 (smooth, slow)

  • Impact: ~linear (2x substeps = 2x physics time)

  • Recommendation: 4-5 for smooth motion; reduce to 2 if GPU-limited

Combined physics cost:

  • DT and SUBSTEPS together determine total CPU time per frame

  • If FPS ≥ 30 and CPU ≤ 50%, increase substeps for smoother motion
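
The interaction between the two settings can be made explicit with a small calculation. This sketch assumes each substep advances the simulation by DT, which the text does not state; `physics_budget` is a hypothetical helper, and `step_cost_ms=2.0` is an arbitrary example value that would have to be measured.

```python
def physics_budget(dt, substeps, step_cost_ms):
    """Return (simulated time per frame, physics milliseconds per frame).

    step_cost_ms is the measured cost of one physics step; it is an
    input here, not something this sketch can predict.
    """
    simulated_time = dt * substeps          # assumes one DT per substep
    physics_ms = substeps * step_cost_ms    # linear in SUBSTEPS, as noted above
    return simulated_time, physics_ms


sim_time, physics_ms = physics_budget(dt=0.001, substeps=4, step_cost_ms=2.0)
print(f"{sim_time:.4f} simulated time units, {physics_ms:.1f} ms of physics per frame")
```

Under that assumption, halving SUBSTEPS while doubling DT keeps the simulated time per frame constant but halves the physics cost, at some loss of accuracy.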

Particle Rendering

BASE_PARTICLE_RADIUS — Size of particles in 3D view

BASE_PARTICLE_RADIUS = 0.12
  • Range: 0.02 (tiny dots) → 0.3 (large spheres)

  • Impact: Negligible on FPS (spheres are simple geometry)

  • Visual: Adjust for clarity

PARTICLE_RADIUS_SCALE — Relative size scaling

PARTICLE_RADIUS_SCALE = 0.45
  • Applies to PDG-table per-type radii

  • Visual: 0.5 = all particles same size, 1.0 = emphasize mass differences

Measuring Performance

In-window FPS display:

  • ImGui left panel shows real-time FPS

  • Also shows Frame Time (ms)

Python benchmarking:

import time
from quantum_collider_sandbox.simulation import Simulation

sim = Simulation(preset="default", particles=1000)

# Warm-up: the first steps can include one-time setup such as kernel compilation.
for _ in range(10):
    sim.step()

times = []
for _ in range(100):
    start = time.perf_counter()  # monotonic, high-resolution timer
    sim.step()
    times.append(time.perf_counter() - start)

avg_frame_ms = sum(times) * 1000 / len(times)
fps = 1000 / avg_frame_ms
print(f"Average FPS: {fps:.1f} ({avg_frame_ms:.2f} ms/frame)")
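
An average alone can hide stutter, so it may help to also look at the slowest frames. The helper below is a sketch that works on a list of per-frame durations in seconds, such as the `times` list collected above; `summarize_frame_times` is a hypothetical name, and the sample data is synthetic.

```python
import statistics


def summarize_frame_times(times_s):
    """Summarize per-frame durations (seconds) in milliseconds plus average FPS."""
    times_ms = sorted(t * 1000.0 for t in times_s)
    count = len(times_ms)
    mean_ms = statistics.fmean(times_ms)
    # Nearest-rank 95th percentile, using integer arithmetic for ceil(0.95 * count).
    p95_index = (95 * count + 99) // 100 - 1
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(times_ms),
        "p95_ms": times_ms[p95_index],
        "max_ms": times_ms[-1],
        "fps": 1000.0 / mean_ms,
    }


# Synthetic example: 90 fast frames and 10 slow ones.
sample = [0.010] * 90 + [0.050] * 10
for key, value in summarize_frame_times(sample).items():
    print(f"{key}: {value:.1f}")
```

A large gap between the median and the 95th percentile points to stutter rather than uniformly low FPS.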

Performance Testing Checklist

  • [ ] Baseline FPS at 1k particles (default config)

  • [ ] FPS with TRAIL_LENGTH = 20 (Phase 1)

  • [ ] FPS with TRAIL_LENGTH = 5 (heavy optimization)

  • [ ] FPS with trails disabled (T key)

  • [ ] FPS at 2k particles (Phase 1)

  • [ ] CPU vs GPU bottleneck (check process monitor)

  • [ ] Memory usage (monitor VRAM)

  • [ ] No visual artifacts (trails smooth, no tearing)

Troubleshooting Low FPS

If FPS < 10 at 100 particles:

  1. Check GPU is being used: nvidia-smi (NVIDIA) or rocm-smi (AMD)

  2. Try: export TAICHI_BACKEND=cpu to test CPU backend

  3. If the CPU backend is faster, the GPU drivers may be broken

If FPS < 30 at 1k particles (after Phase 1):

  1. Reduce TRAIL_LENGTH to 10

  2. Increase MIN_TRAIL_SPEED_FOR_RENDER to 0.2

  3. Reduce WINDOW_WIDTH/HEIGHT by 25%

  4. Reduce SUBSTEPS to 2

If FPS drops when moving camera:

  1. The GPU is likely the bottleneck (not physics)

  2. Reduce window resolution

  3. Reduce TRAIL_LENGTH

  4. Disable Trails (T key)

If FPS inconsistent (stuttering):

  1. Reduce SUBSTEPS (fewer CPU-GPU sync points)

  2. Check for background processes

  3. Verify GPU drivers are up-to-date

Summary: Phase 1 Benefits

See Configuration for all tunable parameters.