A six-level research playground for quantum many-body physics — statevector, matrix product states, kernel fusion, chemistry, phase transitions, and real-time quench dynamics — all running on WebGPU, all written in TypeScript, all externally validated against ITensor (the canonical condensed-matter research tool).
Three URLs. Open any of them — no install, no Linux, no Python. Just a tab.
Three synchronized 3D panels: H₂ electron density, conditional pair density (with a draggable cursor finding the Fermi/Coulomb hole), and a live MPS bond-network with phase-transition slider and quench-dynamics light cone.
Run the full E1–E16 experiment ladder live: gate fidelity, dispatch roofline, MPS correctness, kernel-fusion benchmarks, VQE on H₂ dissociation. Every run produces a JSON artifact with environment capture.
The original benchmark page: a handful of textbook circuits (Bell, GHZ, QFT, Deutsch-Jozsa) running on the GPU with CPU cross-check, gate-rate measurement, and bandwidth roofline.
Each level builds on the last. Start at the GPU statevector and climb through tensor networks, kernel fusion, and quantum chemistry.
2^N complex amplitudes on GPU storage buffers. f32 single-qubit and controlled-U kernels, dispatch overhead α ≈ 22 μs on Apple Metal-3.
f64 Jacobi complex-SVD + canonical-form sweeps. Validated against ITensor DMRG to ≤ 5 mHa on N=8 chains.
JIT-emitted WGSL chains plus 4×4 brick-wall, 8×8 cascade, and 16×16 quad-cascade tile fusion. 4.18× at the 8×8 sweet spot (Tier C, N=15); 16×16 (Tier D) plateaus at 3.14×, an honest negative result: the wider tile crosses into compute-bound territory.
STO-3G H₂ built from molecular integrals (Boys function, Gaussian primitives, Jordan-Wigner mapping). FCI matches PySCF to 7 decimals. The full dissociation curve hits chemical accuracy in 50/50 trials.
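To make the first level concrete, here is a minimal CPU sketch of what a single-qubit statevector kernel computes. It is not the project's WGSL code: the function name `applySingleQubitGate`, the interleaved real/imaginary buffer layout, and the gate encoding are all assumptions chosen for illustration. On a GPU, each loop iteration that updates one amplitude pair would plausibly map to one shader invocation.

```typescript
// Illustrative CPU sketch, not the project's WGSL kernel.
// Assumed layout: state[2*i] = Re(amp_i), state[2*i + 1] = Im(amp_i).
// Assumed gate encoding: [re00, im00, re01, im01, re10, im10, re11, im11].
function applySingleQubitGate(
  state: Float32Array,
  nQubits: number,
  target: number,
  g: readonly number[]
): void {
  const dim = 1 << nQubits; // 2^N amplitudes
  const bit = 1 << target;  // mask selecting the target qubit
  for (let i = 0; i < dim; i++) {
    if ((i & bit) !== 0) continue; // visit each pair once, from its bit = 0 member
    const j = i | bit;             // partner index with the target bit set
    const aRe = state[2 * i], aIm = state[2 * i + 1];
    const bRe = state[2 * j], bIm = state[2 * j + 1];
    // a' = g00 * a + g01 * b (complex multiply-add)
    state[2 * i]     = g[0] * aRe - g[1] * aIm + g[2] * bRe - g[3] * bIm;
    state[2 * i + 1] = g[0] * aIm + g[1] * aRe + g[2] * bIm + g[3] * bRe;
    // b' = g10 * a + g11 * b
    state[2 * j]     = g[4] * aRe - g[5] * aIm + g[6] * bRe - g[7] * bIm;
    state[2 * j + 1] = g[4] * aIm + g[5] * aRe + g[6] * bIm + g[7] * bRe;
  }
}

// Usage: Hadamard on qubit 0 of the two-qubit state |00>.
const invSqrt2 = Math.SQRT1_2;
const hadamard = [invSqrt2, 0, invSqrt2, 0, invSqrt2, 0, -invSqrt2, 0];
const psi = new Float32Array(2 * 4); // 4 complex amplitudes
psi[0] = 1;                          // start in |00>
applySingleQubitGate(psi, 2, 0, hadamard);
console.log(Array.from(psi)); // amplitudes 0 and 1 are each about 0.7071
```

Every amplitude is read and written exactly once per gate, so a kernel of this shape moves far more bytes than it performs arithmetic. That is consistent with the bandwidth-roofline and fusion discussion in the levels above.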
webgpu-q is one front of a broader research line on GPU-resident compute in the browser. The umbrella site, the benchmark infrastructure, and the sister demos:
The research umbrella. Single-kernel fusion for GPU workloads — evolutionary search, transformer decoding, and browser-to-browser distributed evolution. Up to 2,865× speedups on Apple Silicon by collapsing per-dispatch overhead.
The sister project. Geant4-DNA (Monte Carlo electron track structure + IRT radiolysis chemistry + DNA damage scoring) in the browser. CSDA 0.985× of reference, 46/46 tests pass. The splat shader and the “3D field + time = 4D viewer” pattern in our hyperscope both come from here.
“How fast is your GPU in the browser?” Real WebGPU compute tests across 592 devices, 7 vendors — full transparency, no cherry-picking. Run it before you trust the numbers on this page.
Phi-3-mini (3.8B parameters) running entirely in the browser via 10 hand-written WebGPU kernel roles across 27 WGSL files, replacing the 85 TVM-autotuned shaders WebLLM needs. ~40 tok/s on M2 Pro.
Watch a real 3.8B-parameter transformer think, tensor by tensor. Every glow is a live activation read back from WebGPU — same Phi-3-mini weights as zerotvm, but every intermediate tensor is rendered 1:1.
The author's site. Index of all the projects, papers, and notes behind this research line. Includes contact info and the full kernel-fusion paper backlog.
Drag a slider in the hyperscope and a chain of math fires:
// On every bond-slider tick (~60 fps target)
const R = 0.7414; // Å (slider value)
const {H, integrals} = buildH2Dense(R);
// → STO-3G primitives, Boys F₀, ERIs, Löwdin S^(-½)
const {value: E_FCI, vector: ψ} = smallestEigenpair(H, 16);
// → 16×16 Jacobi diagonalization, ~1 ms
const grid = densityGridFromCoeffs(
  R, ψ.cG, ψ.cU, GRID_SIZE
); // 110k samples
scene.updateGrid(grid);
// → WebGPU splat shader
All of it runs in your tab. No server. No Python. No native code.
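The Jacobi diagonalization step in the snippet above can be sketched as follows. This is a hedged illustration of a cyclic Jacobi eigenvalue solver for a small real symmetric matrix, not the project's `smallestEigenpair`: the name `jacobiSmallestEigenpair`, its signature, and its defaults are hypothetical, and the real routine may differ (for example in how it takes the matrix dimension).

```typescript
type Eigenpair = { value: number; vector: number[] };

// Hypothetical helper: cyclic Jacobi sweeps on a real symmetric matrix.
// Returns the smallest eigenvalue and its normalized eigenvector.
function jacobiSmallestEigenpair(
  input: number[][],
  maxSweeps = 50,
  tol = 1e-12
): Eigenpair {
  const n = input.length;
  const A = input.map((row) => row.slice()); // work on a copy
  const V = A.map((_, i) => A.map((_, j) => (i === j ? 1 : 0))); // eigenvectors
  for (let sweep = 0; sweep < maxSweeps; sweep++) {
    // Converged when the off-diagonal norm is negligible.
    let off = 0;
    for (let p = 0; p < n; p++)
      for (let q = p + 1; q < n; q++) off += A[p][q] * A[p][q];
    if (Math.sqrt(off) < tol) break;
    for (let p = 0; p < n - 1; p++) {
      for (let q = p + 1; q < n; q++) {
        if (A[p][q] === 0) continue;
        // Rotation angle that zeroes A[p][q].
        const theta = (A[q][q] - A[p][p]) / (2 * A[p][q]);
        const sgn = theta >= 0 ? 1 : -1;
        const t = sgn / (Math.abs(theta) + Math.sqrt(theta * theta + 1));
        const c = 1 / Math.sqrt(t * t + 1);
        const s = t * c;
        for (let k = 0; k < n; k++) { // A <- A * P (columns p, q)
          const akp = A[k][p], akq = A[k][q];
          A[k][p] = c * akp - s * akq;
          A[k][q] = s * akp + c * akq;
        }
        for (let k = 0; k < n; k++) { // A <- P^T * A (rows p, q)
          const apk = A[p][k], aqk = A[q][k];
          A[p][k] = c * apk - s * aqk;
          A[q][k] = s * apk + c * aqk;
        }
        for (let k = 0; k < n; k++) { // V <- V * P accumulates eigenvectors
          const vkp = V[k][p], vkq = V[k][q];
          V[k][p] = c * vkp - s * vkq;
          V[k][q] = s * vkp + c * vkq;
        }
      }
    }
  }
  // The diagonal now holds the eigenvalues; pick the smallest.
  let best = 0;
  for (let i = 1; i < n; i++) if (A[i][i] < A[best][best]) best = i;
  return { value: A[best][best], vector: V.map((row) => row[best]) };
}

// Usage on a small symmetric test matrix (not an H₂ Hamiltonian).
const testMatrix = [
  [4, 1, 0],
  [1, 3, 1],
  [0, 1, 2],
];
const ground = jacobiSmallestEigenpair(testMatrix);
console.log(ground.value, ground.vector);
```

Jacobi sweeps are a reasonable fit for a 16×16 problem: the method is simple, numerically robust for symmetric matrices, and its cost is negligible at that size.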