Performance Tuning & Optimization
How to achieve 2-3x FPS improvements and support thousands of particles.
Phase 1: Trail Rendering Optimization
The most impactful optimization for supporting 1k-2k particles.
Problem¶
At 1k particles with the original settings (TRAIL_LENGTH = 400):
Trail buffer: 1k particles × 400 segments/particle × 2 endpoints = 800k vertices/frame
GPU memory: Each vertex = ~12 bytes → ~10MB overhead per frame
CPU overhead: Building trail mesh from ring buffers
Result: 3-5 FPS (unacceptable for interactive simulation)
Solution: Phase 1 (Implemented)
Trail segment reduction:
Default: TRAIL_LENGTH = 40 (reduced from 400)
Vertex reduction: 1k × 40 × 2 = 80k vertices/frame (10x fewer)
Performance gain: 2-3x FPS at 1k particles (→ 8-15 FPS)
Visual quality: 40 segments still smooth; only ~10-unit motion visible
Skip logic in the trail kernel (the photon and frozen checks are hardcoded; the speed and length thresholds come from config.py):
Skip photons (decay <1e-20 s; their trails are visual clutter; ~20% fewer trails rendered)
Skip frozen/pinned particles (no motion → no trail purpose; ~10% reduction)
Skip slow movers (speed < 0.1; ~30% reduction for typical setup)
Skip short trails (< 3 segments; ~5% reduction)
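The four skip rules above amount to one per-particle predicate. The following is a plain-Python sketch for reading and testing that logic, not the GPU kernel itself. The TrailParticle fields, the PHOTON value, and should_draw_trail() are hypothetical names. The thresholds default to MIN_TRAIL_SPEED_FOR_RENDER = 0.1 and MIN_TRAIL_LENGTH_FOR_RENDER = 3.

```python
import math
from dataclasses import dataclass

PHOTON = 0  # hypothetical type id; the real constant lives in simulation.py


@dataclass
class TrailParticle:
    """Hypothetical per-particle view of the data the trail kernel reads."""
    ptype: int
    frozen: int            # 1 = frozen/pinned
    velocity: tuple        # (vx, vy, vz)
    trail_segments: int    # filled segments in this particle's ring buffer


def should_draw_trail(p, min_speed=0.1, min_length=3):
    """Plain-Python mirror of the four trail skip rules (not the GPU kernel)."""
    if p.ptype == PHOTON:                    # hardcoded: photons are too short-lived
        return False
    if p.frozen == 1:                        # hardcoded: pinned particles never move
        return False
    if math.hypot(*p.velocity) < min_speed:  # MIN_TRAIL_SPEED_FOR_RENDER
        return False
    if p.trail_segments < min_length:        # MIN_TRAIL_LENGTH_FOR_RENDER
        return False
    return True


particles = [
    TrailParticle(1, 0, (0.5, 0.0, 0.0), 40),       # fast, long trail
    TrailParticle(PHOTON, 0, (1.0, 0.0, 0.0), 40),  # photon
    TrailParticle(1, 0, (0.05, 0.0, 0.0), 40),      # slow mover
    TrailParticle(1, 0, (0.5, 0.0, 0.0), 2),        # stub trail
]
print([should_draw_trail(p) for p in particles])  # [True, False, False, False]
```

The order of the checks does not change the result. Putting the cheap integer tests first simply lets most skipped particles exit before the speed computation.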
Combined effect:
Vertex count: ~800k → ~80k (10x)
Added filtering: ~40-50% fewer particles draw trails
Final vertex count: ~40-50k vertices/frame
FPS improvement: 2-3x (3-5 FPS → 8-15 FPS at 1k particles)
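A small estimator reproduces the vertex arithmetic in this section. It uses the figures stated above: 2 vertices per segment and ~12 bytes per vertex. The trail_vertex_budget() helper and the 0.55 drawn fraction (about 45% of particles skipped) are illustrative and are not read from the simulation code.

```python
BYTES_PER_VERTEX = 12  # ~12 bytes per vertex, the figure used above (e.g. 3 x float32)


def trail_vertex_budget(n_particles, trail_length, drawn_fraction=1.0):
    """Estimate trail vertices per frame and their size in MB.

    Each segment is a line with 2 endpoints; drawn_fraction is the share of
    particles that survive the skip filters (an illustrative input).
    """
    vertices = round(n_particles * trail_length * 2 * drawn_fraction)
    megabytes = vertices * BYTES_PER_VERTEX / 1e6
    return vertices, megabytes


scenarios = [
    ("TRAIL_LENGTH = 400", 400, 1.0),
    ("TRAIL_LENGTH = 40", 40, 1.0),
    ("TRAIL_LENGTH = 40 + skips", 40, 0.55),  # ~45% of particles skipped
]
for label, length, fraction in scenarios:
    vertices, mb = trail_vertex_budget(1000, length, fraction)
    print(f"{label:<28} {vertices:>8,} vertices  ~{mb:.1f} MB")
```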
Recommended Configurations
For 1k particles @ 60 FPS target:
TRAIL_LENGTH = 40
MIN_TRAIL_SPEED_FOR_RENDER = 0.1
MIN_TRAIL_LENGTH_FOR_RENDER = 3
SUBSTEPS = 3 # Reduce if CPU-limited
For 2k particles @ 30 FPS target:
TRAIL_LENGTH = 20
MIN_TRAIL_SPEED_FOR_RENDER = 0.2
MIN_TRAIL_LENGTH_FOR_RENDER = 3
SUBSTEPS = 2 # Lower for more GPU headroom
For 5k+ particles (points-only mode):
TRAIL_LENGTH = 5 # Minimal
MIN_TRAIL_SPEED_FOR_RENDER = 0.5 # Only fast particles
TRAILS_ENABLED_DEFAULT = False # Disable by default (press T to enable)
SUBSTEPS = 1
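If you switch between these scenarios often, you can keep the three configurations as data and pick one based on particle count. This is a hypothetical helper: TRAIL_PRESETS and pick_trail_preset() do not exist in config.py. It rounds up to the next tier, so the chosen settings are never heavier than the recommendation. Any key not listed for a tier keeps its config.py default.

```python
# Values copied from the recommended configurations above. TRAIL_PRESETS and
# pick_trail_preset() are hypothetical helpers, not part of config.py.
TRAIL_PRESETS = {
    1000: {"TRAIL_LENGTH": 40, "MIN_TRAIL_SPEED_FOR_RENDER": 0.1,
           "MIN_TRAIL_LENGTH_FOR_RENDER": 3, "SUBSTEPS": 3},
    2000: {"TRAIL_LENGTH": 20, "MIN_TRAIL_SPEED_FOR_RENDER": 0.2,
           "MIN_TRAIL_LENGTH_FOR_RENDER": 3, "SUBSTEPS": 2},
    5000: {"TRAIL_LENGTH": 5, "MIN_TRAIL_SPEED_FOR_RENDER": 0.5,
           "TRAILS_ENABLED_DEFAULT": False, "SUBSTEPS": 1},
}


def pick_trail_preset(n_particles):
    """Return a copy of the preset for the smallest tier >= n_particles.

    Rounding up means a 1.5k scene gets the lighter 2k settings; anything
    above the largest tier falls back to the points-only 5k+ preset.
    """
    for tier in sorted(TRAIL_PRESETS):
        if n_particles <= tier:
            return dict(TRAIL_PRESETS[tier])
    return dict(TRAIL_PRESETS[max(TRAIL_PRESETS)])


print(pick_trail_preset(1500))
```

Rounding up is a deliberately conservative choice. A 3k scene lands on the points-only preset, so adjust the tiers if that proves too aggressive.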
Configuration Parameters
All tunable parameters are in config.py. Modify and restart to apply:
TRAIL_LENGTH (default: 40)
TRAIL_LENGTH = 40
Ring buffer size per particle
Impact: Direct linear impact on vertex count and memory
Range: 5-100
Performance: Each +40 segments = ~+10% GPU time at 1k particles
Visual: 5-10 = sparse, 20-40 = moderate, 50+ = dense
Recommendation: Use 40 for smooth appearance, 20 for dense scenarios
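A ring buffer holds per-particle memory fixed at TRAIL_LENGTH entries by overwriting the oldest position with each new one. The sketch below is a CPU-side illustration of that idea; the simulation's actual buffer layout is not shown on this page. TrailRingBuffer is a hypothetical name. N stored points yield N - 1 drawable segments, and each segment has 2 endpoint vertices.

```python
class TrailRingBuffer:
    """Fixed-size trail history for one particle (hypothetical sketch).

    Holds at most trail_length positions; once full, each push overwrites
    the oldest point, so memory stays constant however long the run is.
    """

    def __init__(self, trail_length):
        if trail_length < 1:
            raise ValueError("trail_length must be >= 1")
        self.capacity = trail_length
        self.points = [None] * trail_length
        self.head = 0   # slot where the next point will be written
        self.count = 0  # number of valid points stored

    def push(self, pos):
        self.points[self.head] = pos
        self.head = (self.head + 1) % self.capacity
        self.count = min(self.count + 1, self.capacity)

    def ordered(self):
        """Stored points from oldest to newest."""
        start = (self.head - self.count) % self.capacity
        return [self.points[(start + k) % self.capacity] for k in range(self.count)]

    def segments(self):
        """Consecutive point pairs; each becomes one line with 2 vertices."""
        pts = self.ordered()
        return list(zip(pts, pts[1:]))


buf = TrailRingBuffer(4)
for t in range(6):
    buf.push((float(t), 0.0, 0.0))
print(buf.ordered())  # points for t=0 and t=1 have been overwritten
```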
MIN_TRAIL_SPEED_FOR_RENDER (default: 0.1)
MIN_TRAIL_SPEED_FOR_RENDER = 0.1
Skip rendering trails for particles with speed |v| below this threshold
Impact: ~20-30% fewer rendered trails (particle-dependent)
Range: 0.0 (all trails) → 1.0 (only very fast)
Physics: Unit-less, relative to typical velocities
Recommendation: 0.1 for exploration, 0.2-0.5 for high particle count
MIN_TRAIL_LENGTH_FOR_RENDER (default: 3)
MIN_TRAIL_LENGTH_FOR_RENDER = 3
Don’t render trails with fewer than N segments
Impact: ~5% (minor)
Range: 2-5 typical
Purpose: Hide incomplete/stub trails from newly spawned particles
Recommendation: Keep at 3
TRAILS_ENABLED_DEFAULT (default: True)
TRAILS_ENABLED_DEFAULT = True
Start with trails enabled/disabled
Runtime toggle: Press T key anytime
Impact: Disabling trails skips trail rendering entirely, saving ~30% of GPU time
Recommendation: True for exploration, False for dense scenarios
Hardcoded Skip Conditions (Kernel-level)
The following are currently hardcoded in the GPU kernel for maximum performance (no runtime branch cost):
Skip photons in trails:
if ptype[i] == PHOTON:
    continue  # Don't render photon trails
Rationale: Photons decay in ~1e-20 seconds; trails are visual clutter
Impact: ~20% fewer trails
To make configurable: Add a skip_photons parameter to the kernel signature
Location: simulation.py line 1391
Skip frozen particles in trails:
if frozen[i] == 1:
    continue  # Don't render trails for pinned particles
Rationale: Frozen particles don’t move; no motion to visualize
Impact: ~10% fewer trails
To make configurable: Add a skip_frozen parameter to the kernel signature
Location: simulation.py line 1394
Other Optimization Opportunities
Phase 2 (Not yet implemented):
GPU kernel consolidation — Merge build_render_data() and build_trail_lines() kernels
Rationale: Reduces GPU kernel launch overhead, single pass through particles
Benefit: ~10-15% overhead reduction
Status: Planned, estimated effort = 2-3 hours
Phase 3 (Future):
Adaptive trail density — Distance and speed-based sampling
Rationale: Don’t store every frame of trail; skip based on motion magnitude
Benefit: Additional ~20-30% vertex reduction
Tradeoff: Slightly lower visual smoothness at extremes
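Phase 3 is not implemented yet, but its core idea can be sketched. A new trail point is stored only after the particle has moved a minimum distance from the last stored point, so slow particles use fewer buffer slots and produce fewer vertices. This sketch covers only the distance-based half of the idea and has no speed term. record_adaptive() and min_step are hypothetical names.

```python
import math


def record_adaptive(positions, min_step):
    """Thin a per-frame position history by motion magnitude (Phase 3 sketch).

    A point is kept only if it lies at least min_step from the last *kept*
    point; the newest position is always kept so the trail reaches the particle.
    """
    if not positions:
        return []
    kept = [positions[0]]
    for p in positions[1:]:
        if math.dist(p, kept[-1]) >= min_step:
            kept.append(p)
    if kept[-1] != positions[-1]:
        kept.append(positions[-1])
    return kept


history = [(i / 10, 0.0, 0.0) for i in range(11)]  # steady motion along x
print(len(history), "->", len(record_adaptive(history, 0.25)))
```

The smoothness tradeoff listed above appears here directly. A larger min_step stores fewer points, so curves in the trail become coarser polylines.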
Phase 4 (Future):
GPU particle sorting for LOD — Sort by distance to camera; cull distant particles
Rationale: Distant particles contribute <1% of the visual result but still consume GPU time
Benefit: ~10-20% GPU time savings for large scenes (5k+ particles)
Tradeoff: Additional CPU overhead for sorting each frame
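The Phase 4 sort-and-cull step can be prototyped on the CPU before it is moved into a GPU kernel. In this sketch, particles are ordered by distance to the camera, any beyond max_distance are dropped, and an optional cap keeps only the nearest ones. lod_visible(), max_distance, and max_count are hypothetical names.

```python
import math


def lod_visible(positions, camera, max_distance, max_count=None):
    """Indices of particles to draw, nearest first (CPU sketch of Phase 4).

    Particles farther than max_distance from the camera are culled; if
    max_count is given, only the nearest max_count survivors are kept.
    """
    dists = [math.dist(p, camera) for p in positions]
    order = sorted(range(len(positions)), key=dists.__getitem__)
    visible = [i for i in order if dists[i] <= max_distance]
    return visible if max_count is None else visible[:max_count]


positions = [(5.0, 0.0, 0.0), (1.0, 0.0, 0.0), (50.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(lod_visible(positions, camera=(0.0, 0.0, 0.0), max_distance=10.0))  # [1, 3, 0]
```

A pure distance cull needs only the filter. The O(n log n) sort matters when you want a nearest-first cap or tiered LOD, and that per-frame sort is the overhead named in the tradeoff.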
Resolution & Rendering¶
Window resolution directly affects GPU load:
1920×1080: Standard; ~50% of the pixels (and fill bandwidth) of 2560×1600
2560×1600: Default (baseline for the figures here)
3840×2160 (4K): ~2x the pixels of 2560×1600; expect a 30-40% FPS hit
To tune:
WINDOW_WIDTH = 1920 # From default 2560
WINDOW_HEIGHT = 1200 # From default 1600
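Impact: Fewer pixels = higher FPS when the GPU is fill-rate bound; gains are usually less than proportional, since trail vertex work does not shrink with resolution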
Visual quality: 1920×1200 acceptable for most uses
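The resolution figures above follow directly from pixel counts. The following quick check computes them relative to the default 2560×1600 window; pixel_ratio() is only an illustrative helper.

```python
BASELINE = (2560, 1600)  # default window size from this page


def pixel_ratio(width, height, base=BASELINE):
    """Pixels per frame relative to the default window."""
    return (width * height) / (base[0] * base[1])


for w, h in [(1920, 1080), (1920, 1200), (2560, 1600), (3840, 2160)]:
    print(f"{w}x{h}: {pixel_ratio(w, h):.2f}x the default pixel count")
```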
Physics Timestep Tuning¶
DT (base timestep) — Smaller = more accurate but slower
DT = 0.001 # Default
Range: 0.0001 (very accurate, slow) → 0.02 (fast, less accurate)
Impact: ~1% FPS per 2x change
Recommendation: Keep at 0.001 for realism; use 0.002 for speed
SUBSTEPS — Multiple physics steps per frame
SUBSTEPS = 4 # Default
Range: 1 (fast, jittery) → 10 (smooth, slow)
Impact: ~linear (2x substeps = 2x physics time)
Recommendation: 4-5 for smooth motion; reduce to 2 if GPU-limited
Combined physics cost:
Physics time per frame scales roughly linearly with SUBSTEPS; DT mainly trades accuracy and has little effect on cost
If FPS ≥ 30 and CPU ≤ 50%, increase substeps for smoother motion
Particle Rendering¶
BASE_PARTICLE_RADIUS — Size of particles in 3D view
BASE_PARTICLE_RADIUS = 0.12
Range: 0.02 (tiny dots) → 0.3 (large spheres)
Impact: Negligible on FPS (spheres are simple geometry)
Visual: Adjust for clarity
PARTICLE_RADIUS_SCALE — Relative size scaling
PARTICLE_RADIUS_SCALE = 0.45
Applies to PDG-table per-type radii
Visual: 0.5 = all particles same size, 1.0 = emphasize mass differences
Measuring Performance
In-window FPS display:
ImGui left panel shows real-time FPS
Also shows Frame Time (ms)
Python benchmarking:
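import time
import taichi as ti  # assumption: the simulation runs on Taichi (see TAICHI_BACKEND below)
from quantum_collider_sandbox.simulation import Simulation

sim = Simulation(preset="default", particles=1000)

# Warm-up: the first steps include Taichi's JIT kernel compilation
for _ in range(5):
    sim.step()
ti.sync()

times = []
for _ in range(100):
    start = time.perf_counter()
    sim.step()
    ti.sync()  # block until queued kernels finish so the timing is complete
    times.append(time.perf_counter() - start)

avg_frame_ms = sum(times) * 1000 / len(times)
fps = 1000 / avg_frame_ms
# Assumes sim.step() covers physics only, so rendering cost is not included
print(f"Average step rate: {fps:.1f}/s ({avg_frame_ms:.2f} ms/step)")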
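An average alone can hide the stutter described under troubleshooting. Summarizing the per-frame times collected in the times list makes spikes visible. frame_time_report() is an illustrative helper that uses only the standard library and computes nearest-rank percentiles.

```python
import statistics


def frame_time_report(times_s):
    """Summarize per-frame times (in seconds) as milliseconds.

    A p95/p99 far above the median points to stutter even when the mean
    looks acceptable. Percentiles use the nearest-rank method.
    """
    ms = sorted(t * 1000 for t in times_s)
    if not ms:
        raise ValueError("no frame times recorded")
    n = len(ms)

    def nearest_rank(p):
        # ceil(p / 100 * n) in integer arithmetic, converted to a 0-based index
        return ms[max(0, -(-p * n // 100) - 1)]

    return {
        "mean_ms": statistics.fmean(ms),
        "median_ms": statistics.median(ms),
        "p95_ms": nearest_rank(95),
        "p99_ms": nearest_rank(99),
        "max_ms": ms[-1],
    }
```

Call it as frame_time_report(times) after the benchmark loop. A median near the mean with a much larger p95 means occasional long frames rather than uniformly slow ones.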
Performance Testing Checklist
[ ] Baseline FPS at 1k particles (default config)
[ ] FPS with TRAIL_LENGTH = 20 (Phase 1)
[ ] FPS with TRAIL_LENGTH = 5 (heavy optimization)
[ ] FPS with trails disabled (T key)
[ ] FPS at 2k particles (Phase 1)
[ ] CPU vs GPU bottleneck (check process monitor)
[ ] Memory usage (monitor VRAM)
[ ] No visual artifacts (trails smooth, no tearing)
Troubleshooting Low FPS
If FPS < 10 at 100 particles:
Check that the GPU is being used: nvidia-smi (NVIDIA) or rocm-smi (AMD)
Try export TAICHI_BACKEND=cpu to test the CPU backend
If the CPU backend is faster, the GPU drivers may be broken
If FPS < 30 at 1k particles (after Phase 1):
Reduce TRAIL_LENGTH to 10
Increase MIN_TRAIL_SPEED_FOR_RENDER to 0.2
Reduce WINDOW_WIDTH/HEIGHT by 25%
Reduce SUBSTEPS to 2
If FPS drops when moving camera:
GPU is likely bottleneck (not physics)
Reduce window resolution
Reduce TRAIL_LENGTH
Disable Trails (T key)
If FPS inconsistent (stuttering):
Reduce SUBSTEPS (fewer CPU-GPU sync points)
Check for background processes
Verify GPU drivers are up-to-date
Summary: Phase 1 Benefits
See Configuration for all tunable parameters.