Hartree-Fock + UHF + UCCSD (radicals), RKS-DFT + UKS-DFT (5 functionals, open + closed shell), MP2, CCSD, CCSD(T) on GPU (~14× median speedup on H₂O cc-pVDZ), and EE/IP/EA-EOM-CCSD (PySCF cross-checked).
A size-gated f64 density-fitting engine across RHF/UHF/RKS/UKS/MP2: exact integrals for small systems, density fitting when the 4-index ERI won't fit in a tab (naphthalene's would be 9.7 GB; it is never built, yet the run still reaches a converged energy). DF is validated to chemical accuracy against exact integrals on small molecules, with an optional WebGPU-accelerated build for the medium regime.
A full polarizability stack (static α + dynamic α(ω) + Casimir-Polder C₆) across every reference (RHF/UHF/RKS/UKS), Grimme D2 dispersion with analytical gradient, geometry optimization, IR + Raman + UV-vis spectra (singlet + triplet), Foster-Boys + Pipek-Mezey orbital localization, NOON + multireference verdict + ⟨S²⟩ diagnostics, ionization potentials + electron affinities, ideal-gas thermochemistry, and Molden + Gaussian Cube + QCSchema + XYZ exports.
Plus a quantum-many-body playground (statevector, MPS, kernel fusion, phase transitions). All WebGPU, all TypeScript, all cross-validated against PySCF, libxc, ITensor, an internal brute-force EOM-CCSD reference, and gas-phase experimental references.
And the crowd is the cluster. Open the page in many tabs, or across machines, and they pool into a swarm over a free broker, splitting a batch of single-points: each tile is a chemistry-grade DF energy, GPU-accelerated where there's a GPU. The distributed run is cross-machine-validated on two independent CI runners (what scales is throughput, not single-job wall time). Open the swarm →
Six URLs. Open any of them — no install, no Linux, no Python. Just a tab.
Press one button and watch a real quantum-chemistry screen run in your tab: a full Hartree–Fock SCF per molecule, an isoelectronic aza-chain library ranked by HOMO–LUMO gap, a self-sorting leaderboard and absorption-shift spectrum — and honest flagging of the RHF-instability artifacts a real screen has to catch.
Three synchronized 3D panels: H₂ electron density, conditional pair density (with a draggable cursor finding the Fermi/Coulomb hole), and a live MPS bond-network with phase-transition slider and quench-dynamics light cone.
Pick a molecule, click run, get the complete chemistry-paper SI bundle in your tab: optimized geometry, IR + Raman spectra, UV-vis (singlet + triplet), dipole + α + β tensors, Mulliken charges, thermochemistry at 298 K, ionization potentials. Every property an experimental SI table reports.
The crowd is the cluster: open the page in several tabs (or across machines) and they form a swarm over a free broker, splitting a batch of chemistry single-points. Each tile is a chemistry-grade HF/DFT/MP2 energy (RHF/UHF/RKS/UKS), auto-picking exact integrals or density fitting by size. Run bond-length scans and radical curves, or screen a molecule library by HOMO–LUMO gap across the crowd (greedy-pull balancing; ~1.7×/2.4× on 2/4 tabs; throughput is the win, not single-molecule speed).
Run the E1–E16+ experiment ladder live: gate fidelity, dispatch roofline, MPS correctness, kernel-fusion benchmarks, VQE on H₂ → CCSD(T) on cc-pVDZ H₂O. Every run produces a JSON artifact with environment capture.
The original benchmark page: a handful of textbook circuits (Bell, GHZ, QFT, Deutsch-Jozsa) running on the GPU with CPU cross-check, gate-rate measurement, and bandwidth roofline.
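The swarm card above mentions greedy-pull balancing. A minimal sketch of that idea, assuming independent tiles with known costs and ignoring broker latency entirely: whichever worker goes idle first pulls the next tile from a shared queue. The names `greedyPull`, `PullResult`, and the cost values are hypothetical illustrations, not the project's API or measured timings.

```typescript
// Hypothetical sketch: simulate greedy-pull scheduling of independent tiles.
// Assumption: tile costs are known up front and the broker adds no latency.
interface PullResult {
  assignment: number[][]; // assignment[w] = indices of the tiles worker w ran
  makespan: number;       // time at which the last worker finishes
}

function greedyPull(costs: number[], workers: number): PullResult {
  if (workers < 1) throw new Error("need at least one worker");
  const busyUntil = new Array<number>(workers).fill(0);
  const assignment: number[][] = Array.from({ length: workers }, () => []);
  for (let tile = 0; tile < costs.length; tile++) {
    // The worker that becomes idle first pulls the next tile from the queue.
    let w = 0;
    for (let k = 1; k < workers; k++) {
      if (busyUntil[k] < busyUntil[w]) w = k;
    }
    assignment[w].push(tile);
    busyUntil[w] += costs[tile];
  }
  return { assignment, makespan: Math.max(0, ...busyUntil) };
}

// Illustrative costs in arbitrary time units (made up for this example).
const tileCosts = [4, 1, 1, 1, 1, 2, 2];
const serialTime = tileCosts.reduce((a, b) => a + b, 0);
const twoTabs = greedyPull(tileCosts, 2);
console.log(`throughput gain on 2 workers: ${serialTime / twoTabs.makespan}×`);
```

In this idealized model the gain approaches the worker count. A real swarm pays for messaging and uneven tile sizes, which fits the sub-linear figures quoted in the card.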
Each level builds on the last. Start at the GPU statevector and climb through tensor networks, kernel fusion, and quantum chemistry.
2^N complex amplitudes on GPU storage buffers. f32 single-qubit and controlled-U kernels, dispatch overhead α ≈ 22 μs on Apple Metal-3.
f64 Jacobi complex-SVD + canonical-form sweeps. Validated against ITensor DMRG to f64 precision on N=8 chains; TFIM & Heisenberg N=128 in browser, χ ≤ 32.
JIT-emitted WGSL chains plus 4×4 brick-wall, 8×8 cascade, and 16×16 quad-cascade tile fusion. 4.18× at the 8×8 sweet spot (Tier C, N=15); 16×16 (Tier D) plateaus at 3.14× — an honest negative as the wider tile crosses into compute-bound territory.
HF + UHF (radicals), RKS-DFT + UKS-DFT (5 functionals — LDA / BVWN5 / BLYP / B3VWN5 / B3LYP5, closed + open shell), MP2 + DF-MP2, FCI, CCSD + UCCSD, CPU CCSD(T) + UCCSD(T), and a hand-written WGSL CCSD(T) kernel (1 thread per occupied (i, j, k) triple, f32→f64 reduce). cc-pVDZ CCSD(T) on H₂O in ~5 s on GPU vs ~199 s CPU; HF matches PySCF to 35 µHa; Cholesky density fitting wired into HF + MP2; full counterpoise/BSSE across all methods + optional D2 dispersion add-on.
Analytical Pulay gradients on every level (HF + LDA + GGA + hybrids), L-BFGS geometry optimization (7× faster than with finite-difference gradients), harmonic vibrational frequencies + IR intensities + Raman activities, ZPE + ideal-gas thermochemistry at any (T, P). H₂O frequencies match Pople 1969 to 0.1 cm⁻¹; entropy 45.06 vs experiment 45.10 cal/(mol·K).
CIS / TDA / TDDFT singlet + triplet across HF + 5 DFT functionals. EE-, IP-, and EA-EOM-CCSD for correlated excited states, IPs, EAs — PySCF cross-checked. Full polarizability stack — static α via CPHF + dynamic α(ω) via TDHF/TDDFT + α(iω) on imaginary axis + Casimir-Polder C₆ van-der-Waals dispersion — across all four reference types (RHF / UHF / RKS / UKS). Grimme D2 dispersion (energy + analytical gradient). Foster-Boys + Pipek-Mezey orbital localization. Oscillator strengths, hyperpolarizability β, Koopmans / ΔSCF / EOM IPs, Mulliken charges + spin populations, Mayer bond orders + valences, NOON, T1 / D1, ⟨S²⟩, multireference verdict, TRK sum rule, energy decomposition. Molden / Gaussian Cube / QCSchema / multi-frame XYZ exports.
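The first rung of the ladder stores 2^N complex amplitudes and updates them with single-qubit kernels. As a rough CPU illustration of what such a kernel computes, here is a sketch in f64 with interleaved real and imaginary parts. It is not the project's f32 WGSL kernel; `zeroState`, `applyGate`, and the `Gate2` layout are assumptions made for this example.

```typescript
// CPU reference sketch (f64). Assumption: amplitudes are stored interleaved
// as [re0, im0, re1, im1, ...]; the real GPU buffer layout may differ.
type Complex = [number, number];                   // [re, im]
type Gate2 = [Complex, Complex, Complex, Complex]; // row-major: u00, u01, u10, u11

function zeroState(n: number): Float64Array {
  const psi = new Float64Array(2 * (1 << n));
  psi[0] = 1; // |00...0> has amplitude 1 + 0i
  return psi;
}

function applyGate(psi: Float64Array, n: number, q: number, u: Gate2): void {
  if (q < 0 || q >= n) throw new Error("qubit index out of range");
  const [u00, u01, u10, u11] = u;
  const stride = 1 << q;
  const dim = 1 << n;
  for (let base = 0; base < dim; base += 2 * stride) {
    for (let off = 0; off < stride; off++) {
      const i0 = 2 * (base + off);          // amplitude with bit q = 0
      const i1 = 2 * (base + off + stride); // its partner with bit q = 1
      const ar = psi[i0], ai = psi[i0 + 1];
      const br = psi[i1], bi = psi[i1 + 1];
      // Complex 2×2 matrix-vector product on the (a, b) pair.
      psi[i0]     = u00[0] * ar - u00[1] * ai + u01[0] * br - u01[1] * bi;
      psi[i0 + 1] = u00[0] * ai + u00[1] * ar + u01[0] * bi + u01[1] * br;
      psi[i1]     = u10[0] * ar - u10[1] * ai + u11[0] * br - u11[1] * bi;
      psi[i1 + 1] = u10[0] * ai + u10[1] * ar + u11[0] * bi + u11[1] * br;
    }
  }
}

const h = Math.SQRT1_2;
const hadamard: Gate2 = [[h, 0], [h, 0], [h, 0], [-h, 0]];
const psiDemo = zeroState(2);
applyGate(psiDemo, 2, 0, hadamard); // (|00> + |01>)/√2, qubit 0 = lowest bit
```

Each (i0, i1) pair is independent of every other pair, which is why this update maps naturally onto one GPU thread per pair.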
webgpu-q is one front of a broader research line on GPU-resident compute in the browser. The umbrella site, the benchmark infrastructure, and the sister demos:
The research umbrella. Single-kernel fusion for GPU workloads — evolutionary search, transformer decoding, and browser-to-browser distributed evolution. Up to 2,865× speedups on Apple Silicon by collapsing per-dispatch overhead.
The sister project. Geant4-DNA (Monte Carlo electron track structure + IRT radiolysis chemistry + DNA damage scoring) in the browser. CSDA 0.985× of reference, Geant4-validated. Where the splat shader and the “3D field + time = 4D viewer” pattern in our hyperscope come from.
“How fast is your GPU in the browser?” Real WebGPU compute tests across 592 devices, 7 vendors — full transparency, no cherry-picking. Run it before you trust the numbers on this page.
Phi-3-mini (3.8B parameters) running entirely in the browser via 10 hand-written WebGPU kernel roles across 27 WGSL files, replacing the 85 TVM-autotuned shaders WebLLM needs. ~40 tok/s on M2 Pro.
Watch a real 3.8B-parameter transformer think, tensor by tensor. Every glow is a live activation read back from WebGPU — same Phi-3-mini weights as zerotvm, but every intermediate tensor is rendered 1:1.
The author's site. Index of all the projects, papers, and notes behind this research line. Includes contact info and the full kernel-fusion paper backlog.
Drag a slider in the hyperscope and a chain of math fires:
// On every bond-slider tick (~60 fps target)
const R = 0.7414; // Å (slider value)
const {H, integrals} = buildH2Dense(R);
// → STO-3G primitives, Boys F₀, ERIs, Löwdin S^(-½)
const {value: E_FCI, vector: ψ} = smallestEigenpair(H, 16);
// → 16×16 Jacobi diagonalization, ~1 ms
const grid = densityGridFromCoeffs(
  R, ψ.cG, ψ.cU, GRID_SIZE
); // 110k samples
scene.updateGrid(grid);
// → WebGPU splat shader
All of it runs in your tab. No server. No Python. No native code.
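The Jacobi diagonalization step in that chain can be sketched as follows. This is a textbook cyclic Jacobi eigenvalue iteration for a real symmetric matrix, written under the assumption that the matrix arrives as a nested array. `smallestEigenpairSketch` is a hypothetical stand-in and makes no claim about how the page's own `smallestEigenpair` is implemented.

```typescript
interface Eigenpair { value: number; vector: number[]; }

// Cyclic Jacobi sweeps: rotate away each off-diagonal element in turn until
// the off-diagonal norm drops below tol, then return the lowest eigenpair.
function smallestEigenpairSketch(a: number[][], maxSweeps = 50, tol = 1e-12): Eigenpair {
  const n = a.length;
  const A = a.map(row => row.slice());                              // working copy
  const V = a.map((_, i) => a.map((__, j) => (i === j ? 1 : 0)));   // eigenvectors
  for (let sweep = 0; sweep < maxSweeps; sweep++) {
    let off = 0;
    for (let p = 0; p < n; p++) {
      for (let q = p + 1; q < n; q++) off += A[p][q] * A[p][q];
    }
    if (Math.sqrt(off) < tol) break;
    for (let p = 0; p < n - 1; p++) {
      for (let q = p + 1; q < n; q++) {
        if (A[p][q] === 0) continue;
        // Rotation angle that zeroes A[p][q]; take the smaller-magnitude root.
        const theta = (A[q][q] - A[p][p]) / (2 * A[p][q]);
        const sgn = theta >= 0 ? 1 : -1;
        const t = sgn / (Math.abs(theta) + Math.sqrt(theta * theta + 1));
        const c = 1 / Math.sqrt(t * t + 1);
        const s = t * c;
        for (let k = 0; k < n; k++) {       // columns p, q: A ← A·J, V ← V·J
          const akp = A[k][p], akq = A[k][q];
          A[k][p] = c * akp - s * akq;
          A[k][q] = s * akp + c * akq;
          const vkp = V[k][p], vkq = V[k][q];
          V[k][p] = c * vkp - s * vkq;
          V[k][q] = s * vkp + c * vkq;
        }
        for (let k = 0; k < n; k++) {       // rows p, q: A ← Jᵀ·A
          const apk = A[p][k], aqk = A[q][k];
          A[p][k] = c * apk - s * aqk;
          A[q][k] = s * apk + c * aqk;
        }
      }
    }
  }
  let best = 0;
  for (let i = 1; i < n; i++) if (A[i][i] < A[best][best]) best = i;
  return { value: A[best][best], vector: V.map(row => row[best]) };
}

// Small symmetric test matrix whose lowest eigenvalue is 2 − √2 analytically.
const tri = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]];
const lowest = smallestEigenpairSketch(tri);
console.log(lowest.value, lowest.vector);
```

For a matrix as small as 16×16, a dense Jacobi sweep is cheap and needs no external linear-algebra library, so it suits a per-frame update like the one above.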