Performance Tuning & Optimization

How to achieve 2-3x FPS improvements and support thousands of particles.

Phase 1: Trail Rendering Optimization

The most impactful optimization for supporting 1k-2k particles.

Problem

At 1k particles with default settings:

Trail buffer: 1k particles × 400 segments/particle × 2 endpoints = 800k vertices/frame
GPU memory: Each vertex = ~12 bytes → ~10MB overhead per frame
CPU overhead: Building trail mesh from ring buffers
Result: 3-5 FPS (unacceptable for interactive simulation)

Solution: Phase 1 (Implemented)

Trail segment reduction:

Default: TRAIL_LENGTH = 40 (reduced from 400)
Vertex reduction: 1k × 40 × 2 = 80k vertices/frame (10x fewer!)
Performance gain: 2-3x FPS at 1k particles (→ 8-15 FPS)
Visual quality: 40 segments still look smooth; only the most recent ~10 units of motion remain visible

Skip logic in kernel (photon and frozen checks are hardcoded; speed and length thresholds come from config.py):

Skip photons (decay in <1e-20 s, so their trails are clutter; ~20% fewer trails rendered)
Skip frozen/pinned particles (no motion, so no trail to show; ~10% reduction)
Skip slow movers (speed < 0.1; ~30% reduction for a typical setup)
Skip short trails (< 3 segments; ~5% reduction)

Combined effect:

Vertex count: ~800k → ~80k (10x)
Added filtering: ~40-50% fewer particles draw trails
Final vertex count: ~40-50k vertices/frame
FPS improvement: 2-3x (3-5 FPS → 8-15 FPS at 1k particles)
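The arithmetic above can be reproduced with a small helper. This is only a sketch: `trail_vertex_budget` and its parameters are hypothetical names, not part of the project, and the 12 bytes per vertex is the estimate quoted earlier.

```python
def trail_vertex_budget(particles, trail_length, drawn_fraction=1.0, bytes_per_vertex=12):
    """Estimate trail vertices per frame and their total size in bytes.

    Each segment is drawn as a line with 2 endpoints, so the count is
    particles * trail_length * 2, scaled by the fraction of particles
    that still draw a trail after filtering.
    """
    vertices = int(particles * trail_length * 2 * drawn_fraction)
    return vertices, vertices * bytes_per_vertex


before = trail_vertex_budget(1000, 400)                      # original default
after = trail_vertex_budget(1000, 40)                        # Phase 1 default
filtered = trail_vertex_budget(1000, 40, drawn_fraction=0.5) # with ~50% of trails skipped

print(before)    # (800000, 9600000)
print(after)     # (80000, 960000)
print(filtered)  # (40000, 480000)
```

Plugging in other particle counts or trail lengths gives a quick estimate of the vertex load before changing config.py.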

Recommended Configurations

For 1k particles @ 60 FPS target:

TRAIL_LENGTH = 40
MIN_TRAIL_SPEED_FOR_RENDER = 0.1
MIN_TRAIL_LENGTH_FOR_RENDER = 3
SUBSTEPS = 3  # Reduce if CPU-limited

For 2k particles @ 30 FPS target:

TRAIL_LENGTH = 20
MIN_TRAIL_SPEED_FOR_RENDER = 0.2
MIN_TRAIL_LENGTH_FOR_RENDER = 3
SUBSTEPS = 2  # Lower for more GPU headroom

For 5k+ particles (points-only mode):

TRAIL_LENGTH = 5  # Minimal
MIN_TRAIL_SPEED_FOR_RENDER = 0.5  # Only fast particles
TRAILS_ENABLED_DEFAULT = False  # Disable by default (press T to enable)
SUBSTEPS = 1
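The three configurations above could be selected programmatically. The following is a sketch, not project code: `recommended_settings` is a hypothetical helper, and using the 2k preset for every count between 1k and 5k is an assumption, since the guide gives no values for that range.

```python
PRESET_1K = {"TRAIL_LENGTH": 40, "MIN_TRAIL_SPEED_FOR_RENDER": 0.1,
             "MIN_TRAIL_LENGTH_FOR_RENDER": 3, "SUBSTEPS": 3}
PRESET_2K = {"TRAIL_LENGTH": 20, "MIN_TRAIL_SPEED_FOR_RENDER": 0.2,
             "MIN_TRAIL_LENGTH_FOR_RENDER": 3, "SUBSTEPS": 2}
PRESET_POINTS_ONLY = {"TRAIL_LENGTH": 5, "MIN_TRAIL_SPEED_FOR_RENDER": 0.5,
                      "TRAILS_ENABLED_DEFAULT": False, "SUBSTEPS": 1}


def recommended_settings(particle_count):
    """Return a copy of the preset that matches the particle count.

    Assumption: counts between 1k and 5k reuse the 2k preset.
    """
    if particle_count <= 1000:
        return dict(PRESET_1K)
    if particle_count < 5000:
        return dict(PRESET_2K)
    return dict(PRESET_POINTS_ONLY)


for count in (800, 2000, 6000):
    print(count, recommended_settings(count))
```

Returning a copy keeps the preset dictionaries unchanged if a caller tweaks individual values afterwards.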

Configuration Parameters

All tunable parameters are in config.py. Modify and restart to apply:

TRAIL_LENGTH (default: 40)

TRAIL_LENGTH = 40

Ring buffer size per particle
Impact: Direct linear impact on vertex count and memory
Range: 5-100
Performance: Each +40 segments = ~+10% GPU time at 1k particles
Visual: 5-10 = sparse, 20-40 = moderate, 50+ = dense
Recommendation: Use 40 for smooth appearance, 20 for dense scenarios
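To illustrate why vertex count grows linearly with the buffer size, here is a minimal pure-Python ring buffer. It is a sketch under assumptions: `TrailRing` is a hypothetical class, the real buffers live on the GPU, and this version stores points, so a buffer of N points yields N-1 segments (the project may count differently).

```python
class TrailRing:
    """Fixed-size ring buffer holding the most recent positions of one particle."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.points = [None] * capacity
        self.head = 0    # index where the next point will be written
        self.count = 0   # number of valid points currently stored

    def push(self, point):
        # Overwrite the oldest slot once the buffer is full
        self.points[self.head] = point
        self.head = (self.head + 1) % self.capacity
        self.count = min(self.count + 1, self.capacity)

    def ordered(self):
        """Return stored points from oldest to newest."""
        start = (self.head - self.count) % self.capacity
        return [self.points[(start + k) % self.capacity] for k in range(self.count)]

    def line_vertices(self):
        """Return 2 endpoints per segment, as a line-list renderer expects."""
        pts = self.ordered()
        verts = []
        for a, b in zip(pts, pts[1:]):
            verts.extend((a, b))
        return verts


ring = TrailRing(capacity=4)
for x in range(6):
    ring.push((float(x), 0.0, 0.0))

print([p[0] for p in ring.ordered()])  # [2.0, 3.0, 4.0, 5.0]
print(len(ring.line_vertices()))       # 6 (3 segments * 2 endpoints)
```

Memory stays constant no matter how long the simulation runs, because old positions are overwritten rather than appended.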

MIN_TRAIL_SPEED_FOR_RENDER (default: 0.1)

MIN_TRAIL_SPEED_FOR_RENDER = 0.1

Skip rendering trails for particles with |v| < this threshold
Impact: ~20-30% fewer rendered trails (particle-dependent)
Range: 0.0 (all trails) → 1.0 (only very fast)
Physics: Unitless, relative to typical velocities
Recommendation: 0.1 for exploration, 0.2-0.5 for high particle count

MIN_TRAIL_LENGTH_FOR_RENDER (default: 3)

MIN_TRAIL_LENGTH_FOR_RENDER = 3

Don’t render trails with fewer than N segments
Impact: ~5% (minor)
Range: 2-5 typical
Purpose: Hide incomplete/stub trails from newly spawned particles
Recommendation: Keep at 3

TRAILS_ENABLED_DEFAULT (default: True)

TRAILS_ENABLED_DEFAULT = True

Start with trails enabled/disabled
Runtime toggle: Press T key anytime
Impact: Disabling saves ~30% of GPU time (the cost of trail rendering)
Recommendation: True for exploration, False for dense scenarios

Hardcoded Skip Conditions (Kernel-level)

The following are currently hardcoded in the GPU kernel for maximum performance (no runtime branch cost):

Skip photons in trails:

if ptype[i] == PHOTON:
    continue  # Don't render photon trails

Rationale: Photons decay in ~1e-20 seconds; trails are visual clutter
Impact: ~20% fewer trails
To make configurable: Add skip_photons parameter to kernel signature
Location: simulation.py line 1391

Skip frozen particles in trails:

if frozen[i] == 1:
    continue  # Don't render trails for pinned particles

Rationale: Frozen particles don’t move; no motion to visualize
Impact: ~10% fewer trails
To make configurable: Add skip_frozen parameter to kernel signature
Location: simulation.py line 1394
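One way the skip conditions might look once they are configurable is sketched below in plain Python. This is not the project's kernel: `should_draw_trail`, its keyword flags, and the `PHOTON` value are hypothetical and only mirror the parameter names suggested above.

```python
PHOTON = 22  # hypothetical type id for illustration (22 is the PDG code for the photon)


def should_draw_trail(ptype, frozen, speed, trail_segments, *,
                      skip_photons=True, skip_frozen=True,
                      min_speed=0.1, min_segments=3):
    """Return True if this particle's trail should be rendered."""
    if skip_photons and ptype == PHOTON:
        return False  # photon trails are treated as clutter
    if skip_frozen and frozen == 1:
        return False  # pinned particles have no motion to show
    if speed < min_speed:
        return False  # slower than MIN_TRAIL_SPEED_FOR_RENDER
    if trail_segments < min_segments:
        return False  # shorter than MIN_TRAIL_LENGTH_FOR_RENDER
    return True


particles = [
    # (ptype, frozen, speed, trail_segments)
    (22, 0, 5.0, 40),   # photon
    (11, 1, 0.0, 40),   # frozen particle
    (11, 0, 0.05, 40),  # slow mover
    (11, 0, 2.0, 2),    # newly spawned, stub trail
    (11, 0, 2.0, 40),   # ordinary moving particle
]
print([should_draw_trail(*p) for p in particles])
# [False, False, False, False, True]
print(should_draw_trail(22, 0, 5.0, 40, skip_photons=False))  # True
```

In a GPU kernel the extra flags would add a per-particle check, which is the cost the hardcoded version avoids; whether that cost is measurable would need benchmarking.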

Other Optimization Opportunities

Phase 2 (Not yet implemented):

GPU kernel consolidation — Merge build_render_data() and build_trail_lines() kernels

Rationale: Reduces GPU kernel launch overhead, single pass through particles
Benefit: ~10-15% overhead reduction
Status: Planned, estimated effort = 2-3 hours

Phase 3 (Future):

Adaptive trail density — Distance and speed-based sampling

Rationale: Don’t store every frame of trail; skip based on motion magnitude
Benefit: Additional ~20-30% vertex reduction
Tradeoff: Slightly lower visual smoothness at extremes
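Since Phase 3 is not implemented, the following is only an illustration of distance-based sampling: a position is stored only once the particle has moved at least `min_step` from the last stored point. The function name and threshold are hypothetical.

```python
import math


def sample_trail(positions, min_step):
    """Keep a position only if it is at least min_step from the last kept one."""
    if not positions:
        return []
    kept = [positions[0]]
    for p in positions[1:]:
        if math.dist(p, kept[-1]) >= min_step:
            kept.append(p)
    return kept


# A particle moving along x in steps of 0.25 units per frame
frames = [(0.25 * k, 0.0, 0.0) for k in range(9)]
sparse = sample_trail(frames, min_step=1.0)

print(len(frames), len(sparse))  # 9 3
print([p[0] for p in sparse])    # [0.0, 1.0, 2.0]
```

Slow particles then contribute few points per unit time while fast ones keep their detail, which is where the vertex reduction would come from.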

Phase 4 (Future):

GPU particle sorting for LOD — Sort by distance to camera; cull distant particles

Rationale: Distant particles contribute <1% of the visible image but still consume GPU time
Benefit: ~10-20% less GPU time for large scenes (5k+ particles)
Tradeoff: Additional overhead for sorting each frame
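As a CPU-side illustration of the idea (the planned version would run on the GPU), the sketch below sorts particles by distance to the camera and drops those beyond a cutoff. `cull_by_distance` and `max_distance` are hypothetical names.

```python
import math


def cull_by_distance(positions, camera, max_distance):
    """Return indices of particles within max_distance of the camera, nearest first."""
    with_dist = [(math.dist(p, camera), i) for i, p in enumerate(positions)]
    with_dist.sort()
    return [i for d, i in with_dist if d <= max_distance]


positions = [
    (0.0, 0.0, 5.0),
    (0.0, 0.0, 50.0),  # far away, culled
    (0.0, 7.0, 0.0),
    (0.0, 0.0, 1.0),
]
camera = (0.0, 0.0, 0.0)
print(cull_by_distance(positions, camera, max_distance=10.0))  # [3, 0, 2]
```

The sort is O(n log n) per frame, which is the overhead the tradeoff above refers to.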

Resolution & Rendering

Window resolution directly affects GPU load:

1920×1080: Standard; ~51% of the pixels of 2560×1600
2560×1600: Default; ~1 GPU frame per output frame
3840×2160 (4K): ~2.0x the pixels of the default; expect 30-40% FPS hit

To tune:

WINDOW_WIDTH = 1920  # From default 2560
WINDOW_HEIGHT = 1200  # From default 1600

Impact: Lower resolution = roughly proportionally higher FPS when the GPU is fill-rate bound
Visual quality: 1920×1200 acceptable for most uses
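Pixel count relative to the default window is a simple first-order proxy for rendering load. The helper below is a sketch; `pixel_ratio` is a hypothetical name, and the 2560×1600 baseline is the default stated above.

```python
def pixel_ratio(width, height, base_width=2560, base_height=1600):
    """Return the pixel count of width x height relative to the baseline window."""
    return (width * height) / (base_width * base_height)


for w, h in [(1920, 1080), (1920, 1200), (2560, 1600), (3840, 2160)]:
    print(f"{w}x{h}: {pixel_ratio(w, h):.3f}x the default pixel count")
```

Actual FPS will not scale exactly with this ratio, because vertex processing and physics cost do not depend on resolution.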

Physics Timestep Tuning

DT (base timestep) — Smaller = more accurate but slower

DT = 0.001  # Default

Range: 0.0001 (very accurate, slow) → 0.02 (fast, less accurate)
Impact: ~1% FPS per 2x change
Recommendation: Keep at 0.001 for realism; use 0.002 for speed

SUBSTEPS — Multiple physics steps per frame

SUBSTEPS = 4  # Default

Range: 1 (fast, jittery) → 10 (smooth, slow)
Impact: ~linear (2x substeps = 2x physics time)
Recommendation: 4-5 for smooth motion; reduce to 2 if GPU-limited

Combined physics cost:

SUBSTEPS dominates physics time per frame; DT changes it only slightly (~1% per 2x, as noted above)
If FPS ≥ 30 and CPU ≤ 50%, increase substeps for smoother motion
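The relationship between the two settings can be made explicit with a short calculation. This sketch assumes each substep advances the simulation by DT, which the guide implies but does not state; `physics_budget` is a hypothetical helper.

```python
def physics_budget(dt, substeps, fps):
    """Summarize how much simulation happens per frame and per second.

    Assumption: every substep advances simulated time by dt.
    """
    sim_time_per_frame = dt * substeps
    return {
        "sim_time_per_frame": sim_time_per_frame,
        "physics_steps_per_second": substeps * fps,
        "sim_time_per_second": sim_time_per_frame * fps,
    }


default = physics_budget(dt=0.001, substeps=4, fps=30)
fast = physics_budget(dt=0.002, substeps=2, fps=30)
print(default)
print(fast)
```

Under that assumption, doubling DT while halving SUBSTEPS keeps the simulated time per frame the same at half the physics cost, at the price of lower accuracy per step.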

Particle Rendering

BASE_PARTICLE_RADIUS — Size of particles in 3D view

BASE_PARTICLE_RADIUS = 0.12

Range: 0.02 (tiny dots) → 0.3 (large spheres)
Impact: Negligible on FPS (spheres are simple geometry)
Visual: Adjust for clarity

PARTICLE_RADIUS_SCALE — Relative size scaling

PARTICLE_RADIUS_SCALE = 0.45

Applies to PDG-table per-type radii
Visual: 0.5 = all particles same size, 1.0 = emphasize mass differences

Measuring Performance

In-window FPS display:

ImGui left panel shows real-time FPS
Also shows Frame Time (ms)

Python benchmarking:

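import time
from quantum_collider_sandbox.simulation import Simulation

sim = Simulation(preset="default", particles=1000)

# Warm up so one-time costs (e.g. kernel compilation) are not measured
for _ in range(10):
    sim.step()

# Measures sim.step() only (no rendering), so the result is an upper bound on FPS
times = []
for _ in range(100):
    start = time.perf_counter()  # monotonic, high-resolution timer
    sim.step()
    # If the backend queues GPU work asynchronously, synchronize here
    # before reading the clock, or the measured time will be too low
    times.append(time.perf_counter() - start)

avg_frame_ms = sum(times) * 1000 / len(times)
fps = 1000 / avg_frame_ms
print(f"Average FPS: {fps:.1f} ({avg_frame_ms:.2f} ms/frame)")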
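An average hides stutter, so it can help to report the median and worst frame as well. The helper below is a sketch that works on any list of per-step durations in seconds; `summarize_frame_times` is a hypothetical name and the sample data is made up for illustration.

```python
import statistics


def summarize_frame_times(times_s):
    """Summarize per-frame durations (seconds) as milliseconds and FPS."""
    ms = sorted(t * 1000.0 for t in times_s)
    mean_ms = statistics.fmean(ms)
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(ms),
        "worst_ms": ms[-1],
        "fps": 1000.0 / mean_ms,
    }


# Made-up sample: nine 10 ms frames and one 100 ms stall
sample = [0.010] * 9 + [0.100]
summary = summarize_frame_times(sample)
print({k: round(v, 1) for k, v in summary.items()})
```

A large gap between the median and the worst frame points to stutter rather than uniformly low throughput.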

Performance Testing Checklist

[ ] Baseline FPS at 1k particles (default config)
[ ] FPS with TRAIL_LENGTH = 20 (Phase 1)
[ ] FPS with TRAIL_LENGTH = 5 (heavy optimization)
[ ] FPS with trails disabled (T key)
[ ] FPS at 2k particles (Phase 1)
[ ] CPU vs GPU bottleneck (check process monitor)
[ ] Memory usage (monitor VRAM)
[ ] No visual artifacts (trails smooth, no tearing)

Troubleshooting Low FPS

If FPS < 10 at 100 particles:

Check GPU is being used: nvidia-smi (NVIDIA) or rocm-smi (AMD)
Try: export TAICHI_BACKEND=cpu to test CPU backend
If the CPU backend is faster, the GPU drivers may be broken

If FPS < 30 at 1k particles (after Phase 1):

Reduce TRAIL_LENGTH to 10
Increase MIN_TRAIL_SPEED_FOR_RENDER to 0.2
Reduce WINDOW_WIDTH/HEIGHT by 25%
Reduce SUBSTEPS to 2

If FPS drops when moving camera:

The GPU is likely the bottleneck (rendering, not physics)
Reduce window resolution
Reduce TRAIL_LENGTH
Disable trails (T key)

If FPS inconsistent (stuttering):

Reduce SUBSTEPS (fewer CPU-GPU sync points)
Check for background processes
Verify GPU drivers are up-to-date

Summary: Phase 1 Benefits

See Configuration for all tunable parameters.