Architecture & Design

System architecture, data flow, and design patterns.

System Overview

Module Responsibilities

__main__.py (CLI & Main Loop)

Entry point and command-line argument parsing
Simulation loop (render each frame, step physics)
Input handling (keyboard, mouse, ImGui)
State management (pause/resume, reset)

config.py (Constants)

Physics constants (forces, integrators, relativity)
Rendering constants (camera, trails, colors)
Preset definitions
Size: ~400 lines

simulation.py (GPU Physics)

Taichi @ti.kernel GPU kernels
Force computation (Coulomb, gravity, Lorentz, strong)
Particle dynamics (integrate, boundary, collision detection)
Decay & pair production
Trail rendering (Phase 1 optimized)
Size: ~2900 lines (largest module)

renderer.py (Visualization)

Taichi GUI window management
3D camera and scene setup
Particle rendering (points/spheres + trails)
ImGui control panels
Black hole effects (disk, photon ring, lensing)
Collision flashes and debug overlays

particles.py (Taichi Field Management)

Initializes Taichi fields (big arrays on GPU)
Position, velocity, type buffers
Trail ring buffers (one per particle)

pdg_table.py (Particle Catalog)

40 particles from Particle Data Group
Masses, lifetimes, charges, decay channels
Color mappings for visualization
Quantum numbers (flavor, color, spin)

data_loader.py (I/O)

HDF5 state import/export
CSV event logging
JSONL physics events

Data Flow (Per Frame)

Input Phase

Keyboard/Mouse Events
       ↓
Renderer.handle_input()
       ↓
ImGui Sliders → Update force constants in real-time
SPACE → Toggle pause
+/- → Spawn/remove particles
Ctrl+S → Save state to HDF5

Physics Phase (repeated SUBSTEPS times)

compute_forces()
    ↓
    Coulomb (pairwise, O(N²))
    Gravity (pairwise, O(N²))
    Strong force (baryon-baryon, O(N²))
    Lorentz E/B (O(N))
    Black hole gravity (O(N))
    ↓
_integrate_step()
    ↓
    Leapfrog half-kick OR Euler step
    Velocity clamping (MAX_VELOCITY)
    Relativity correction (γ)
    ↓
apply_boundaries()
    ↓
    Reflect or periodic wrapping
    ↓
detect_collisions()
    ↓
    O(N²) pairwise radius overlap
    Dispatch: annihilation, decay, elastic scatter
    Enqueue spawn products
    ↓
monte_carlo_decay()
    ↓
    Exponential decay law
    Time dilation (SR + GR)
    Enqueue spawn products
    ↓
_apply_spawn_queue() + _finalize_spawn()
    ↓
    Create new particles from collisions/decays
    ↓
record_trails()
    ↓
    Ring buffer update (latest position to head)
    ↓
(Leapfrog only) compute_forces() + half-kick

Maintenance Phase

do_maintenance()
    ↓
    Fade collision flashes
    Compact dead particles
    Update active particle count

Stats Phase (every 10 frames)

refresh_stats()
    ↓
    Compute KE, momentum, census
    Update ImGui display

Render Phase

prepare_render()
    ↓
    build_render_data() kernel → GPU buffer
    build_trail_lines() kernel → GPU vertices (Phase 1 optimized)
    GPU → CPU (vertex/color uploads)
    ↓
Renderer.render()
    ↓
    Draw particles (colored spheres)
    Draw trails (line segments with fade)
    Draw black hole effects
    Draw collision flashes
    Draw starfield
    Draw ImGui panels (CPU-side)
    ↓
Window.show() (taichi.ui)
    ↓
    Display to screen
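The per-frame ordering above can be summarized in a few lines of plain Python. This is only a sketch of the control flow, not the code in __main__.py: the value of SUBSTEPS is hypothetical, and the assumption that pausing skips both physics and maintenance is not stated in this document.

```python
from typing import List

SUBSTEPS = 4          # hypothetical value; the real constant is defined elsewhere
STATS_INTERVAL = 10   # stats refresh "every 10 frames", as described above


def run_frame(frame_index: int, paused: bool = False) -> List[str]:
    """Return the ordered list of phases executed for one frame."""
    phases = ["handle_input"]
    if not paused:  # assumption: pausing skips physics and maintenance
        phases += ["physics_substep"] * SUBSTEPS
        phases.append("do_maintenance")
    if frame_index % STATS_INTERVAL == 0:
        phases.append("refresh_stats")
    phases += ["prepare_render", "render", "show"]
    return phases
```

Calling run_frame(0) includes the stats refresh, while run_frame(1) goes straight from maintenance to rendering.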

Taichi GPU Kernels

@ti.kernel decorator: JIT-compiled GPU code

compute_forces() [O(N²)]

Nested loop over all particles

Pairwise distance & force calculation

Accumulate force vector per particle

GPU: Each thread processes one particle
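As an illustration of the pairwise pattern, here is a CPU-only sketch of the Coulomb term in plain Python. It is not the Taichi kernel from simulation.py: K_COULOMB and SOFTENING are hypothetical names and values, and the softening term itself is an assumption added to keep the example finite at zero separation.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

K_COULOMB = 1.0   # hypothetical force constant (simulation units)
SOFTENING = 1e-3  # hypothetical softening length; avoids division by zero


def coulomb_forces(pos: List[Vec3], charge: List[float]) -> List[Vec3]:
    """Accumulate the Coulomb force on every particle with an O(N^2) double loop."""
    n = len(pos)
    forces: List[Vec3] = []
    for i in range(n):          # on the GPU, one thread would handle each i
        fx = fy = fz = 0.0
        for j in range(n):      # inner loop over every other particle
            if i == j:
                continue
            dx = pos[i][0] - pos[j][0]
            dy = pos[i][1] - pos[j][1]
            dz = pos[i][2] - pos[j][2]
            r2 = dx * dx + dy * dy + dz * dz + SOFTENING ** 2
            # F_i = k * q_i * q_j * (r_i - r_j) / |r|^3
            s = K_COULOMB * charge[i] * charge[j] / (r2 * math.sqrt(r2))
            fx += s * dx
            fy += s * dy
            fz += s * dz
        forces.append((fx, fy, fz))
    return forces
```

With two opposite charges on the x axis, the two forces point toward each other and sum to zero, which is a quick sanity check for any pairwise force kernel.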

_integrate_step() [O(N)]

Loop over all particles

Update velocity (force/mass acceleration)

Update position (velocity × dt)

Clamp to MAX_VELOCITY

Apply SR gamma correction
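The clamping and gamma steps can be sketched in plain Python as follows. The values of C and MAX_VELOCITY are hypothetical, and how the real kernel uses the Lorentz factor after computing it is not described here, so the sketch stops at computing it.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]

C = 1.0              # hypothetical speed of light in simulation units
MAX_VELOCITY = 0.99  # hypothetical clamp, kept below C so gamma stays finite


def clamp_speed(v: Vec3, v_max: float = MAX_VELOCITY) -> Vec3:
    """Rescale v so its magnitude does not exceed v_max; direction is preserved."""
    speed = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    if speed <= v_max:
        return v
    scale = v_max / speed
    return (v[0] * scale, v[1] * scale, v[2] * scale)


def lorentz_gamma(v: Vec3, c: float = C) -> float:
    """Special-relativistic factor gamma = 1 / sqrt(1 - v^2 / c^2)."""
    beta2 = (v[0] ** 2 + v[1] ** 2 + v[2] ** 2) / (c * c)
    return 1.0 / math.sqrt(1.0 - beta2)
```

Clamping before computing gamma matters: a speed at or above C would make the square root undefined.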

detect_collisions() [O(N²)]

Nested loop over particle pairs

Sphere-sphere overlap test

If collision → dispatch by collision type (annihilation, decay, scatter)

Enqueue spawn products for the end of the step
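The overlap test itself is small. The sketch below is plain Python rather than the Taichi kernel, and the per-particle radius list is an assumed input (this document does not say where radii are stored). Comparing squared distances avoids a square root per pair.

```python
from typing import List, Sequence, Tuple


def spheres_overlap(p: Sequence[float], q: Sequence[float],
                    radius_p: float, radius_q: float) -> bool:
    """True if two spheres touch or intersect (squared-distance comparison)."""
    d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    r = radius_p + radius_q
    return d2 <= r * r


def colliding_pairs(pos: List[Sequence[float]],
                    radius: List[float]) -> List[Tuple[int, int]]:
    """Check every unordered pair once (j > i), so the cost is O(N^2)."""
    pairs = []
    n = len(pos)
    for i in range(n):
        for j in range(i + 1, n):
            if spheres_overlap(pos[i], pos[j], radius[i], radius[j]):
                pairs.append((i, j))
    return pairs
```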

monte_carlo_decay() [O(N)]

Loop over particles

Compute decay probability (exponential law)

If decay occurs → randomly select channel, spawn products
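A minimal sketch of the two steps, in plain Python. The function names and the (branching ratio, products) channel format are hypothetical; the single `dilation` factor stands in for the combined SR and GR time dilation mentioned in the data flow, whose exact form is not given in this document.

```python
import math
import random
from typing import List, Sequence, Tuple


def decay_probability(dt: float, lifetime: float, dilation: float = 1.0) -> float:
    """P(decay within dt) = 1 - exp(-dt / (dilation * lifetime)).

    `lifetime` is the proper mean lifetime; `dilation` >= 1 stretches it.
    """
    return 1.0 - math.exp(-dt / (dilation * lifetime))


def pick_channel(channels: List[Tuple[float, Sequence[str]]],
                 rng: random.Random) -> Sequence[str]:
    """Select a decay channel by walking the cumulative branching ratios.

    Ratios are assumed to sum to 1.
    """
    u = rng.random()
    cumulative = 0.0
    for ratio, products in channels:
        cumulative += ratio
        if u < cumulative:
            return products
    return channels[-1][1]  # guard against rounding in the ratio sum
```

A particle would decay in a given step when a uniform random draw falls below decay_probability(dt, lifetime, dilation). A time-dilated particle therefore decays less often per step.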

build_trail_lines() [O(N)]

Per-particle: extract ring buffer positions

Phase 1 skip logic:

Skip if type == PHOTON
Skip if frozen
Skip if speed < MIN_TRAIL_SPEED_FOR_RENDER
Skip if trail too short

Generate line segment vertices for rendering

Write to GPU arrays (trail_vertices, trail_colors)
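The skip conditions can be expressed as a single predicate. The sketch below is plain Python. The threshold values, the ParticleView container, and the two-point minimum for "too short" are assumptions for illustration. PHOTON is set to 22, the photon's PDG ID, on the assumption that the type field holds PDG IDs.

```python
from dataclasses import dataclass

PHOTON = 22                        # PDG ID of the photon (assumed encoding)
MIN_TRAIL_SPEED_FOR_RENDER = 0.01  # hypothetical threshold
MIN_TRAIL_POINTS = 2               # assumption: one segment needs two points


@dataclass
class ParticleView:
    """Hypothetical per-particle snapshot used only for this sketch."""
    ptype: int
    frozen: bool
    speed: float
    trail_count: int


def should_skip_trail(p: ParticleView) -> bool:
    """Return True if no trail geometry should be generated for p."""
    return (
        p.ptype == PHOTON
        or p.frozen
        or p.speed < MIN_TRAIL_SPEED_FOR_RENDER
        or p.trail_count < MIN_TRAIL_POINTS
    )
```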

All kernels are written for coalesced memory access: adjacent GPU threads read and write adjacent array elements.

State Management

Active Particle List:

num_active[None] — Count of live particles (compacted list)
Particles are indexed [0, num_active)
Dead particles moved to end, count decremented
Avoids fragmentation
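One common way to keep the live range contiguous is to swap the dead particle with the last live one and shrink the count. The sketch below uses plain Python lists in place of Taichi fields; whether the real compaction preserves particle order is not stated here, and this variant does not.

```python
from typing import List, Sequence


def remove_particle(fields: Sequence[List], num_active: int, i: int) -> int:
    """Swap particle i with the last live particle in every field.

    Returns the new active count. Indices [0, new_count) stay densely packed.
    """
    last = num_active - 1
    for field in fields:
        field[i], field[last] = field[last], field[i]
    return last
```

After a removal, index i holds a different live particle, so a loop that removes while iterating should re-examine index i before moving on.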

Ring Buffer Trails:

Per-particle: trail_pos[i, :] = ring buffer of TRAIL_LENGTH positions (40, 20, or 5)
trail_head[i] = index of newest position (wraps 0 to TRAIL_LENGTH-1)
trail_count[i] = number of valid positions (starts at 1, fills up to TRAIL_LENGTH)
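The head and count bookkeeping can be sketched for a single particle in plain Python. The Trail class is hypothetical; the real data lives in the trail_pos, trail_head, and trail_count fields, and TRAIL_LENGTH is set to 5 here only to keep the example small.

```python
from typing import Any, List

TRAIL_LENGTH = 5  # smallest of the trail lengths listed above


class Trail:
    """Ring buffer for one particle; mirrors trail_pos/trail_head/trail_count."""

    def __init__(self, start: Any) -> None:
        self.pos: List[Any] = [start] + [None] * (TRAIL_LENGTH - 1)
        self.head = 0   # index of the newest position
        self.count = 1  # number of valid positions

    def record(self, p: Any) -> None:
        """Advance the head (wrapping) and overwrite the oldest slot."""
        self.head = (self.head + 1) % TRAIL_LENGTH
        self.pos[self.head] = p
        self.count = min(self.count + 1, TRAIL_LENGTH)

    def newest_first(self) -> List[Any]:
        """Walk backwards from the head over the valid entries."""
        return [self.pos[(self.head - k) % TRAIL_LENGTH] for k in range(self.count)]
```

Recording never shifts existing entries, so the cost per step is constant regardless of trail length.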

Spawn Queue:

Temporary list of particles to create (from collisions/decays)
Processed at the end of each step (after collision detection and decay)
New particles inserted into main structure
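A deferred queue of this kind might look like the following plain-Python sketch. The function name, the list-based storage, and the small MAX_PARTICLES value are illustrative assumptions; the capacity check reflects the fixed-size fields described elsewhere in this document.

```python
from typing import Any, List

MAX_PARTICLES = 4  # deliberately small for the example


def apply_spawn_queue(particles: List[Any], spawn_queue: List[Any],
                      max_particles: int = MAX_PARTICLES) -> int:
    """Append queued particles until capacity is reached, then clear the queue.

    Returns the number of spawn requests that had to be dropped.
    """
    dropped = 0
    for p in spawn_queue:
        if len(particles) < max_particles:
            particles.append(p)
        else:
            dropped += 1  # a real implementation might log a warning here
    spawn_queue.clear()
    return dropped
```

Because nothing is appended while the collision and decay loops are running, those loops always see a stable particle range.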

Force State:

Current force constants (read from config each frame)
Derived fields (magnetic field magnitude, velocity-dependent damping)

Integrators

Euler (first-order):

\[\begin{split}\vec{v}_{n+1} &= \vec{v}_n + \frac{\vec{F}_n}{m} \cdot dt \\ \vec{r}_{n+1} &= \vec{r}_n + \vec{v}_{n+1} \cdot dt\end{split}\]

Simple, fast
Poor energy conservation
Energy drifts monotonically

Leapfrog (symplectic, second-order):

\[\begin{split}\vec{v}_{n+1/2} &= \vec{v}_n + \frac{\vec{F}_n}{m} \cdot \frac{dt}{2} \\ \vec{r}_{n+1} &= \vec{r}_n + \vec{v}_{n+1/2} \cdot dt \\ \vec{v}_{n+1} &= \vec{v}_{n+1/2} + \frac{\vec{F}_{n+1}}{m} \cdot \frac{dt}{2}\end{split}\]

Symplectic → preserves phase-space volume
Better energy conservation
Slightly higher computational cost
Recommended for long simulations
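To compare the two schemes on a problem with a known conserved energy, the sketch below applies both to a one-dimensional harmonic oscillator with unit mass and unit spring constant (so F = -x). This test problem is an assumption chosen for illustration and is unrelated to the simulator's forces. The Euler step follows the velocity-first update written above; the leapfrog step re-evaluates the force at the new position before the second half-kick, matching the extra compute_forces() call in the per-frame data flow.

```python
from typing import Callable, Tuple

Step = Callable[[float, float, float], Tuple[float, float]]


def energy(x: float, v: float) -> float:
    """Total energy for unit mass and unit spring constant."""
    return 0.5 * v * v + 0.5 * x * x


def euler_step(x: float, v: float, dt: float) -> Tuple[float, float]:
    v = v + (-x) * dt            # kick with the force at the old position
    x = x + v * dt               # drift with the updated velocity
    return x, v


def leapfrog_step(x: float, v: float, dt: float) -> Tuple[float, float]:
    v_half = v + (-x) * dt / 2   # first half-kick, force at old position
    x = x + v_half * dt          # full drift
    v = v_half + (-x) * dt / 2   # second half-kick, force at new position
    return x, v


def max_energy_error(step: Step, dt: float = 0.05, n_steps: int = 2000) -> float:
    """Largest absolute deviation from the initial energy over the run."""
    x, v = 1.0, 0.0
    e0 = energy(x, v)
    worst = 0.0
    for _ in range(n_steps):
        x, v = step(x, v, dt)
        worst = max(worst, abs(energy(x, v) - e0))
    return worst
```

On this problem the leapfrog energy error is smaller than the Euler error at the same dt, at the cost of one extra force evaluation per step.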

Particle Lifecycle

Spawn — Random position/velocity or from decay/collision
Alive — Part of main particle pool, physics applied
Collision → New particles spawned (or annihilation)
Decay → New particles spawned, original removed
Dead — Removed from active pool (compacted)

Removal causes:

Natural decay (exponential lifetime)
Boundary escape (if BOUNDARY_MODE="none")
Annihilation in collision

GPU Memory Layout

Taichi allocates fixed-size arrays on the GPU at startup. The sizes below assume MAX_PARTICLES = 100, TRAIL_LENGTH = 40, and 4-byte elements:

Field	Size	Purpose
`pos[MAX_PARTICLES]`	100 particles × 3 floats = 1.2 KB	Current positions
`vel[MAX_PARTICLES]`	100 particles × 3 floats = 1.2 KB	Current velocities
`ptype[MAX_PARTICLES]`	100 particles × 1 int = 400 B	Particle type (PDG ID)
`trail_pos[MAX_PARTICLES, TRAIL_LENGTH]`	100 × 40 × 3 floats = 48 KB	Trail history (ring buffer)
`trail_vertices[MAX_PARTICLES * TRAIL_LENGTH * 2]`	100 × 40 × 2 × 3 floats = 96 KB	Trail geometry (GPU-side only)
`trail_colors[MAX_PARTICLES * TRAIL_LENGTH * 2]`	100 × 40 × 2 × 3 floats = 96 KB	Trail colors (GPU-side only)

Total: ~250 KB (negligible on modern GPUs)
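The figures in the table can be reproduced with a few lines of arithmetic. The sketch assumes 4-byte floats and ints, which is what the per-field sizes imply.

```python
MAX_PARTICLES = 100
TRAIL_LENGTH = 40
BYTES_PER_ELEMENT = 4  # assumption: 32-bit floats and ints

field_bytes = {
    "pos": MAX_PARTICLES * 3 * BYTES_PER_ELEMENT,
    "vel": MAX_PARTICLES * 3 * BYTES_PER_ELEMENT,
    "ptype": MAX_PARTICLES * 1 * BYTES_PER_ELEMENT,
    "trail_pos": MAX_PARTICLES * TRAIL_LENGTH * 3 * BYTES_PER_ELEMENT,
    # two vertices per segment, three floats per vertex
    "trail_vertices": MAX_PARTICLES * TRAIL_LENGTH * 2 * 3 * BYTES_PER_ELEMENT,
    "trail_colors": MAX_PARTICLES * TRAIL_LENGTH * 2 * 3 * BYTES_PER_ELEMENT,
}

total_bytes = sum(field_bytes.values())
```

The sum is 242,800 bytes, which rounds to the ~250 KB quoted above. The two trail geometry fields account for most of it.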

Phase 1 optimization reduces the number of trail vertices written and rendered by about 10x. The allocated field sizes are unchanged.

Design Patterns

Kernel + Taichi Fields:

All heavy computation in @ti.kernel functions
Fields auto-parallelized across GPU threads
No explicit parallelism in Python code

Configuration as Constants:

All tunable parameters live in config.py
Imported at module load time
Changes to config.py require a restart (force constants can still be adjusted at runtime via the ImGui sliders)

Event-Driven Physics:

Collisions and decays are detected during each physics step
Products are enqueued, then applied at the end of the step
This avoids modifying the particle list while the nested loops are still iterating over it

Compacted Active List:

Dead particles not physically removed
Active count decremented, dead moved to end
Prevents fragmentation and repeated allocation

Error Handling

Physics assertions (disabled in release):

Energy bounds checking
NaN/Inf detection in forces
Particle count sanity checks
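A NaN/Inf check of the kind listed above can be sketched in plain Python as follows. The function name and return convention are hypothetical, and the real assertions may be implemented differently inside the kernels.

```python
import math
from typing import Sequence


def first_non_finite(forces: Sequence[Sequence[float]]) -> int:
    """Return the index of the first force vector containing NaN or Inf, or -1."""
    for i, f in enumerate(forces):
        if not all(math.isfinite(c) for c in f):
            return i
    return -1
```

Returning the offending index rather than a boolean makes it easier to report which particle triggered the check.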

Boundary checks:

Velocity clamping (prevent overflow)
Force magnitude clamping in collider
Array index bounds (in Taichi kernel)

Graceful degradation:

If the GPU runs out of memory → fall back to CPU (if Taichi supports it)
If a particle spawn fails → log a warning and continue
If an export fails → show the user an error box; the simulation continues

Performance Considerations

GPU Memory Coalescing:

Kernel loops iterate over particles in index order
Adjacent threads access adjacent elements of the same field
The GPU can merge these accesses into fewer memory transactions (coalesced memory access)

Reduced Branching:

Hardcoded trail skip conditions (photons, frozen) for perf
No runtime config branches in inner loops

Phase 1 Optimization:

10x vertex reduction (400 → 40 segments)
Skip logic pre-filters particles before rendering
Result: 2-3x FPS improvement at 1k particles

See Performance Tuning for more optimization details.