browser-native · webGPU · PySCF cross-checked · distributed swarm · CI green

Quantum chemistry,
in a browser tab.

  • Hartree-Fock + UHF + UCCSD (radicals)
  • RKS-DFT + UKS-DFT (5 functionals, open + closed shell)
  • MP2, CCSD, CCSD(T) on GPU (~14× median speedup on H₂O cc-pVDZ)
  • EE/IP/EA-EOM-CCSD (PySCF cross-checked)
  • A size-gated f64 density-fitting engine across RHF/UHF/RKS/UKS/MP2: exact integrals for small systems, density fitting when the 4-index ERI won't fit a tab (naphthalene's would be 9.7 GB; it is never built, and the run still reaches a converged energy). DF is validated to chemical accuracy vs exact on small molecules, with an optional WebGPU-accelerated build for the medium regime.
  • Full polarizability stack (static α + dynamic α(ω) + Casimir-Polder C₆) across every reference (RHF/UHF/RKS/UKS)
  • Grimme D2 dispersion with analytical gradient
  • Geometry optimization
  • IR + Raman + UV-vis spectra (singlet + triplet)
  • Foster-Boys + Pipek-Mezey orbital localization
  • NOON + multireference verdict + ⟨S²⟩ diagnostics
  • Ionization potentials + electron affinities
  • Ideal-gas thermochemistry
  • Molden + Gaussian Cube + QCSchema + XYZ exports
  • Plus a quantum-many-body playground (statevector, MPS, kernel fusion, phase transitions)

All WebGPU, all TypeScript, all cross-validated against PySCF, libxc, ITensor, an internal brute-force EOM-CCSD reference, and gas-phase experimental references.
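The size gate can be illustrated with a back-of-the-envelope memory estimate. This is a minimal sketch, not the project's engine: `eriBytes`, `dfBytes`, `chooseIntegrals`, the 190-function basis size, the auxiliary-basis size, and the 1 GiB budget are all assumptions made for illustration. With 190 basis functions, a dense f64 4-index tensor comes to roughly 9.7 GiB, the same order as the naphthalene figure quoted above.

```typescript
// Bytes needed for a dense, unsymmetrized 4-index ERI tensor stored in f64.
function eriBytes(nbf: number): number {
  return 8 * nbf ** 4;
}

// Bytes for a 3-index density-fitting tensor (naux x nbf x nbf) stored in f64.
function dfBytes(nbf: number, naux: number): number {
  return 8 * naux * nbf * nbf;
}

type IntegralMode = "exact" | "density-fitting";

// Hypothetical size gate: build exact integrals only if they fit the budget.
function chooseIntegrals(nbf: number, budgetBytes: number): IntegralMode {
  return eriBytes(nbf) <= budgetBytes ? "exact" : "density-fitting";
}

const GIB = 2 ** 30;
const demoNbf = 190;          // assumed basis size, illustration only
const demoNaux = 4 * demoNbf; // assumed auxiliary-basis size, illustration only

console.log((eriBytes(demoNbf) / GIB).toFixed(2), "GiB for the 4-index ERI");
console.log((dfBytes(demoNbf, demoNaux) / GIB).toFixed(2), "GiB for the DF tensor");
console.log(chooseIntegrals(demoNbf, 1 * GIB));
```

The quartic growth is what makes the gate necessary: doubling the basis multiplies the exact tensor by 16 but the 3-index tensor by only about 8.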

And the crowd is the cluster. Open the page in many tabs, or across machines, and they pool into a swarm over a free broker, splitting a batch of single-points. Each tile is a chemistry-grade DF energy, GPU-accelerated where there's a GPU. The distributed run is cross-machine-validated on two independent CI runners; what scales is batch throughput, not the wall-time of any single point. Open the swarm →

Open the hyperscope · Browse the experiments
~14×
cc-pVDZ CCSD(T) on H₂O — GPU vs CPU median (39× best run; std/median 42%, noisy)
35 µHa
match with PySCF on cc-pVDZ HF (spherical-d)
0.27 eV
LiH EOM-CCSD singlet vs PySCF — method precision
geom-opt speedup vs FD (analytical gradients)

Six live demos.

Six URLs. Open any of them — no install, no Linux, no Python. Just a tab.

Live molecular screen

Press one button and watch a real quantum-chemistry screen run in your tab: a full Hartree–Fock SCF per molecule, an isoelectronic aza-chain library ranked by HOMO–LUMO gap, a self-sorting leaderboard and absorption-shift spectrum — and honest flagging of the RHF-instability artifacts a real screen has to catch.
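Ranking a library by HOMO–LUMO gap reduces to one subtraction per molecule once each SCF has converged. The sketch below assumes closed-shell (RHF) results with orbital energies sorted ascending; the `ScfResult` shape, the function names, the ascending sort order, and the sample numbers are hypothetical, not the demo's actual data model.

```typescript
const HARTREE_TO_EV = 27.211386;

interface ScfResult {
  name: string;
  orbitalEnergies: number[]; // ascending, in Hartree
  nOccupied: number;         // number of doubly occupied orbitals (RHF)
}

// Gap in eV between the last occupied (HOMO) and first virtual (LUMO) orbital.
function homoLumoGapEv(r: ScfResult): number {
  if (r.nOccupied < 1 || r.nOccupied >= r.orbitalEnergies.length) {
    throw new Error(`${r.name}: need at least one occupied and one virtual orbital`);
  }
  const homo = r.orbitalEnergies[r.nOccupied - 1];
  const lumo = r.orbitalEnergies[r.nOccupied];
  return (lumo - homo) * HARTREE_TO_EV;
}

// Leaderboard sorted by ascending gap (smallest gap first).
function rankByGap(results: ScfResult[]): { name: string; gapEv: number }[] {
  return results
    .map((r) => ({ name: r.name, gapEv: homoLumoGapEv(r) }))
    .sort((a, b) => a.gapEv - b.gapEv);
}

// Made-up orbital energies, for illustration only.
const demoLibrary: ScfResult[] = [
  { name: "A", orbitalEnergies: [-0.5, -0.3, 0.1, 0.4], nOccupied: 2 },
  { name: "B", orbitalEnergies: [-0.6, -0.2, 0.0, 0.3], nOccupied: 2 },
  { name: "C", orbitalEnergies: [-0.45, -0.35, -0.05, 0.5], nOccupied: 2 },
];
console.log(rankByGap(demoLibrary));
```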

real HF, PySCF-checked azo-motif enrichment artifacts flagged

Hyperscope

Three synchronized 3D panels: H₂ electron density, conditional pair density (with a draggable cursor finding the Fermi/Coulomb hole), and a live MPS bond-network with phase-transition slider and quench-dynamics light cone.

HF / DFT / FCI STO-3G + cc-pVDZ TFIM phase transition

Molecular SI report

Pick a molecule, click run, get the complete chemistry-paper SI bundle in your tab: optimized geometry, IR + Raman spectra, UV-vis (singlet + triplet), dipole + α + β tensors, Mulliken charges, thermochemistry at 298 K, ionization potentials. Every property an experimental SI table reports.

HF · 5 DFT functionals · UHF STO-3G ~30 s end-to-end

Swarm

The crowd is the cluster: open the page in several tabs (or across machines) and they form a swarm over a free broker, splitting a batch of chemistry single-points. Each tile is a chemistry-grade HF/DFT/MP2 energy (RHF/UHF/RKS/UKS), auto-picking exact integrals or density fitting by size. Run bond-length scans and radical curves, or screen a molecule library by HOMO–LUMO gap across the crowd (greedy-pull balancing; ~1.7×/2.4× on 2/4 tabs; throughput is the win, not single-molecule speed).
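Greedy-pull balancing can be sketched as a tiny scheduling simulation: whichever worker frees up first takes the next tile. The function names and per-tile costs below are made up for illustration, and the model ignores broker latency and messaging overhead, so it gives an idealized upper bound rather than the measured gains quoted above.

```typescript
// Simulate greedy pull: the worker that becomes free first pulls the next tile.
// Returns the makespan, i.e. the time at which the last worker finishes.
function greedyPullMakespan(tileCosts: number[], nWorkers: number): number {
  const busyUntil = new Array<number>(nWorkers).fill(0);
  for (const cost of tileCosts) {
    let next = 0;
    for (let w = 1; w < nWorkers; w++) {
      if (busyUntil[w] < busyUntil[next]) next = w; // ties go to the lowest index
    }
    busyUntil[next] += cost;
  }
  return Math.max(...busyUntil);
}

// Batch throughput gain relative to running every tile on one worker.
function throughputGain(tileCosts: number[], nWorkers: number): number {
  const serial = tileCosts.reduce((sum, c) => sum + c, 0);
  return serial / greedyPullMakespan(tileCosts, nWorkers);
}

const demoTiles = [5, 3, 8, 2, 7, 4, 6, 1]; // made-up per-tile costs in seconds
console.log(throughputGain(demoTiles, 2).toFixed(2));
console.log(throughputGain(demoTiles, 4).toFixed(2));
```

No single tile finishes sooner with more workers; only the batch does, which is why throughput is the quantity that scales.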

BroadcastChannel · WebRTC · MQTT relay GPU-DF, chemistry-grade cross-machine on free VMs

Research dashboard

Run the E1–E16+ experiment ladder live: gate fidelity, dispatch roofline, MPS correctness, kernel-fusion benchmarks, VQE on H₂ → CCSD(T) on cc-pVDZ H₂O. Every run produces a JSON artifact with environment capture.

E1–E16+ protocols JSON artifacts Playwright e2e

Gate-throughput demo

The original benchmark page: a handful of textbook circuits (Bell, GHZ, QFT, Deutsch-Jozsa) running on the GPU with CPU cross-check, gate-rate measurement, and bandwidth roofline.

Bell · GHZ · QFT · DJ GPU vs CPU throughput

The research ladder.

Each level builds on the last. Start at the GPU statevector and climb through tensor networks, kernel fusion, and quantum chemistry.

1
Statevector — WebGPU

2^N complex amplitudes on GPU storage buffers. f32 single-qubit and controlled-U kernels, dispatch overhead α ≈ 22 μs on Apple Metal-3.
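For reference, a single-qubit gate on a 2^N statevector touches amplitudes in pairs that differ only in the target bit. The CPU sketch below mirrors that access pattern with interleaved f32 storage; the `Gate2x2` layout and function name are assumptions for illustration, not the project's WGSL kernel.

```typescript
type Complex = [number, number];                      // [re, im]
type Gate2x2 = [Complex, Complex, Complex, Complex];  // row-major: u00, u01, u10, u11

// Apply a 2x2 unitary to qubit `target` of an interleaved [re, im, ...] state.
function applySingleQubitGate(
  state: Float32Array, nQubits: number, target: number, u: Gate2x2,
): void {
  const dim = 1 << nQubits;
  if (state.length !== 2 * dim) throw new Error("state must hold 2^N complex amplitudes");
  const stride = 1 << target;
  const [u00, u01, u10, u11] = u;
  for (let i0 = 0; i0 < dim; i0++) {
    if (i0 & stride) continue;      // visit each pair once, from its target-bit-0 member
    const i1 = i0 | stride;
    const ar = state[2 * i0], ai = state[2 * i0 + 1];
    const br = state[2 * i1], bi = state[2 * i1 + 1];
    // amp0' = u00*a + u01*b ; amp1' = u10*a + u11*b (complex arithmetic)
    state[2 * i0]     = u00[0] * ar - u00[1] * ai + u01[0] * br - u01[1] * bi;
    state[2 * i0 + 1] = u00[0] * ai + u00[1] * ar + u01[0] * bi + u01[1] * br;
    state[2 * i1]     = u10[0] * ar - u10[1] * ai + u11[0] * br - u11[1] * bi;
    state[2 * i1 + 1] = u10[0] * ai + u10[1] * ar + u11[0] * bi + u11[1] * br;
  }
}

// Usage: Hadamard on qubit 1 of |00>.
const demoState = new Float32Array(8);
demoState[0] = 1;
const r = Math.SQRT1_2;
applySingleQubitGate(demoState, 2, 1, [[r, 0], [r, 0], [r, 0], [-r, 0]]);
console.log(Array.from(demoState));
```

Each pair is independent of every other pair, which is what makes the loop body a natural fit for one GPU invocation per pair.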

2
MPS — TypeScript

f64 Jacobi complex-SVD + canonical-form sweeps. Validated against ITensor DMRG to f64 precision on N=8 chains; TFIM & Heisenberg N=128 in browser, χ ≤ 32.
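The χ cap enters at every bond: after an SVD, only the largest χ singular values are kept and the rest are discarded. A minimal sketch of that truncation step, assuming the singular values arrive sorted in descending order; `truncateBond` and its return shape are hypothetical names, not the project's API.

```typescript
interface Truncation {
  kept: number[];          // retained singular values, renormalized to unit weight
  discardedWeight: number; // fraction of squared weight that was thrown away
}

// Keep at most chiMax singular values (input must be sorted descending).
function truncateBond(singularValues: number[], chiMax: number): Truncation {
  if (chiMax < 1) throw new Error("chiMax must be at least 1");
  const kept = singularValues.slice(0, chiMax);
  const totalWeight = singularValues.reduce((sum, s) => sum + s * s, 0);
  const keptWeight = kept.reduce((sum, s) => sum + s * s, 0);
  if (keptWeight === 0) throw new Error("all retained singular values are zero");
  const norm = Math.sqrt(keptWeight);
  return {
    kept: kept.map((s) => s / norm),
    discardedWeight: (totalWeight - keptWeight) / totalWeight,
  };
}

// Usage with made-up values: squared weights 0.90, 0.09, 0.01.
const demoCut = truncateBond([Math.sqrt(0.9), Math.sqrt(0.09), Math.sqrt(0.01)], 2);
console.log(demoCut.kept, demoCut.discardedWeight);
```

The discarded weight is the usual per-bond error estimate; when it stays negligible at a given χ, raising χ further buys little.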

3
Kernel fusion

JIT-emitted WGSL chains plus 4×4 brick-wall, 8×8 cascade, and 16×16 quad-cascade tile fusion. 4.18× at the 8×8 sweet spot (Tier C, N=15); 16×16 (Tier D) plateaus at 3.14× — an honest negative as the wider tile crosses into compute-bound territory.
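The simplest instance of fusion is a chain of single-qubit gates on the same qubit: multiply the 2×2 matrices on the host and dispatch once instead of once per gate. The sketch below shows only that case, not the brick-wall or cascade tile schemes described above; the type and function names are hypothetical, and the 22 µs constant is the dispatch overhead quoted for the statevector level.

```typescript
interface Cx { re: number; im: number; }
type Mat2 = [Cx, Cx, Cx, Cx]; // row-major 2x2 complex matrix

const cx = (re: number, im = 0): Cx => ({ re, im });
const cmul = (a: Cx, b: Cx): Cx => ({ re: a.re * b.re - a.im * b.im, im: a.re * b.im + a.im * b.re });
const cadd = (a: Cx, b: Cx): Cx => ({ re: a.re + b.re, im: a.im + b.im });

// Matrix product A*B for row-major 2x2 complex matrices.
function matmul2(a: Mat2, b: Mat2): Mat2 {
  return [
    cadd(cmul(a[0], b[0]), cmul(a[1], b[2])),
    cadd(cmul(a[0], b[1]), cmul(a[1], b[3])),
    cadd(cmul(a[2], b[0]), cmul(a[3], b[2])),
    cadd(cmul(a[2], b[1]), cmul(a[3], b[3])),
  ];
}

// Gates are applied in array order, so the fused matrix is g[n-1] * ... * g[1] * g[0].
function fuseChain(gates: Mat2[]): Mat2 {
  let fused: Mat2 = [cx(1), cx(0), cx(0), cx(1)];
  for (const g of gates) fused = matmul2(g, fused);
  return fused;
}

const DISPATCH_OVERHEAD_US = 22; // per-dispatch overhead quoted for the statevector level

// Overhead saved by issuing one dispatch instead of chainLength dispatches.
function dispatchOverheadSavedUs(chainLength: number): number {
  return Math.max(0, chainLength - 1) * DISPATCH_OVERHEAD_US;
}

const hGate: Mat2 = [cx(Math.SQRT1_2), cx(Math.SQRT1_2), cx(Math.SQRT1_2), cx(-Math.SQRT1_2)];
console.log(fuseChain([hGate, hGate]), dispatchOverheadSavedUs(2));
```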

4
Quantum chemistry — full stack

HF + UHF (radicals), RKS-DFT + UKS-DFT (5 functionals: LDA / BVWN5 / BLYP / B3VWN5 / B3LYP5, closed + open shell), MP2 + DF-MP2, FCI, CCSD + UCCSD, CPU CCSD(T) + UCCSD(T), and a hand-written WGSL CCSD(T) kernel (1 thread per occupied (i, j, k) triple, f32→f64 reduce). cc-pVDZ CCSD(T) on H₂O runs in ~5 s on GPU vs ~199 s on CPU in the best run (~39×); the median speedup is ~14×. HF matches PySCF to 35 µHa. Cholesky density fitting is wired into HF + MP2. Full counterpoise/BSSE is available across all methods, plus an optional D2 dispersion add-on.
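The "one thread per occupied triple" layout implies a mapping from a linear invocation id to (i, j, k), followed by a reduction of per-triple partial energies in higher precision. The host-side sketch below assumes the full nOcc³ index space with no symmetry restriction and an f32 partial per triple; both are assumptions, and the function names are hypothetical.

```typescript
// Decode a linear invocation id into an occupied-orbital triple (i, j, k).
function decodeTriple(id: number, nOcc: number): [number, number, number] {
  const k = id % nOcc;
  const j = Math.floor(id / nOcc) % nOcc;
  const i = Math.floor(id / (nOcc * nOcc));
  return [i, j, k];
}

// Sum f32 per-triple partials in f64 (JavaScript numbers are IEEE-754 doubles).
function reducePartials(partials: Float32Array): number {
  let sum = 0;
  for (let n = 0; n < partials.length; n++) sum += partials[n];
  return sum;
}

// Usage: 5 occupied orbitals give 125 triples when no symmetry is exploited.
const demoNOcc = 5;
console.log(demoNOcc ** 3, decodeTriple(37, demoNOcc));
console.log(reducePartials(Float32Array.of(0.5, 0.25, 0.125)));
```

Accumulating in f64 keeps rounding error from growing with the number of triples even though each partial is only f32-accurate.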

5
Geometry + IR + Raman + thermo

Analytical Pulay gradients on every level (HF + LDA + GGA + hybrids), L-BFGS geometry optimization (faster than FD), harmonic vibrational frequencies + IR intensities + Raman activities, ZPE + ideal-gas thermochemistry at any (T, P). H₂O frequencies match Pople 1969 to 0.1 cm⁻¹; entropy 45.06 vs experiment 45.10 cal/(mol·K).
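The zero-point energy follows directly from the harmonic frequencies: ZPE = ½ Σ h c ν̃ per molecule, scaled by Avogadro's number for molar units. A minimal sketch using exact SI constants; the function name is hypothetical and the sample wavenumbers are round placeholders, not computed results.

```typescript
const PLANCK_J_S = 6.62607015e-34;        // J*s (exact SI value)
const LIGHT_SPEED_CM_S = 2.99792458e10;   // cm/s (exact SI value)
const AVOGADRO_PER_MOL = 6.02214076e23;   // 1/mol (exact SI value)
const JOULE_PER_KCAL = 4184;              // thermochemical calorie

// Harmonic zero-point energy in kcal/mol from vibrational wavenumbers in cm^-1.
function zeroPointEnergyKcalMol(wavenumbers: number[]): number {
  const sum = wavenumbers.reduce((acc, nu) => acc + nu, 0);
  const joulePerMol = 0.5 * PLANCK_J_S * LIGHT_SPEED_CM_S * sum * AVOGADRO_PER_MOL;
  return joulePerMol / JOULE_PER_KCAL;
}

// Placeholder wavenumbers for a triatomic, illustration only.
console.log(zeroPointEnergyKcalMol([1600, 3700, 3800]).toFixed(2), "kcal/mol");
```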

6
Spectra, properties, excited states

CIS / TDA / TDDFT singlet + triplet across HF + 5 DFT functionals. EE-, IP-, and EA-EOM-CCSD for correlated excited states, IPs, EAs — PySCF cross-checked. Full polarizability stack — static α via CPHF + dynamic α(ω) via TDHF/TDDFT + α(iω) on imaginary axis + Casimir-Polder C₆ van-der-Waals dispersion — across all four reference types (RHF / UHF / RKS / UKS). Grimme D2 dispersion (energy + analytical gradient). Foster-Boys + Pipek-Mezey orbital localization. Oscillator strengths, hyperpolarizability β, Koopmans / ΔSCF / EOM IPs, Mulliken charges + spin populations, Mayer bond orders + valences, NOON, T1 / D1, ⟨S²⟩, multireference verdict, TRK sum rule, energy decomposition. Molden / Gaussian Cube / QCSchema / multi-frame XYZ exports.
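The Casimir-Polder step is a one-dimensional integral over imaginary frequency, C₆ = (3/π) ∫₀^∞ α_A(iω) α_B(iω) dω. The sketch below evaluates it by midpoint quadrature after mapping [0, ∞) onto [0, 1). As a stand-in for a computed α(iω) it uses a single-oscillator model, α(iω) = α₀ / (1 + (ω/ω₀)²), for which the integral has the closed form (3/2) α_A α_B ω_A ω_B / (ω_A + ω_B). The model, the names, and the parameter values are assumptions for illustration only.

```typescript
type Polarizability = (omega: number) => number;

// Single-oscillator model for alpha(i*omega); a stand-in for a TDHF/TDDFT result.
function oscillatorAlpha(alpha0: number, omega0: number): Polarizability {
  return (omega) => alpha0 / (1 + (omega / omega0) ** 2);
}

// C6 = (3/pi) * integral_0^inf alphaA(i w) alphaB(i w) dw,
// using w = t / (1 - t) to map the half-line onto t in [0, 1).
function casimirPolderC6(alphaA: Polarizability, alphaB: Polarizability, nPoints = 2000): number {
  const h = 1 / nPoints;
  let integral = 0;
  for (let n = 0; n < nPoints; n++) {
    const t = (n + 0.5) * h;            // midpoint rule never evaluates t = 1
    const omega = t / (1 - t);
    const jacobian = 1 / (1 - t) ** 2;  // d(omega)/dt
    integral += alphaA(omega) * alphaB(omega) * jacobian * h;
  }
  return (3 / Math.PI) * integral;
}

// Made-up parameters in atomic units.
const demoA = oscillatorAlpha(1.5, 0.8);
const demoB = oscillatorAlpha(2.0, 0.5);
console.log(casimirPolderC6(demoA, demoA), casimirPolderC6(demoA, demoB));
```

Working on the imaginary axis is what makes simple quadrature viable: α(iω) decays smoothly and monotonically, with none of the poles found along the real axis.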

Companion projects.

webgpu-q is one front of a broader research line on GPU-resident compute in the browser. The umbrella site, the benchmark infrastructure, and the sister demos:

kernelfusion.dev

The research umbrella. Single-kernel fusion for GPU workloads — evolutionary search, transformer decoding, and browser-to-browser distributed evolution. Up to 2,865× speedups on Apple Silicon by collapsing per-dispatch overhead.

Open SDK · Open benchmarks

webgpudna.com

The sister project. Geant4-DNA (Monte Carlo electron track structure + IRT radiolysis chemistry + DNA damage scoring) in the browser. CSDA 0.985× of reference, Geant4-validated. It is the source of the splat shader and the “3D field + time = 4D viewer” pattern in our hyperscope.

Karamitros 2011 IRT · SSB/DSB scoring

gpubench.dev

“How fast is your GPU in the browser?” Real WebGPU compute tests across 592 devices, 7 vendors — full transparency, no cherry-picking. Run it before you trust the numbers on this page.

Rastrigin · N-body · RL envs

zerotvm.com

Phi-3-mini (3.8B parameters) running entirely in the browser via 10 hand-written WebGPU kernel roles across 27 WGSL files, replacing the 85 TVM-autotuned shaders WebLLM needs. ~40 tok/s on M2 Pro.

10 kernel roles · 27 WGSL files

neuropulse.live

Watch a real 3.8B-parameter transformer think, tensor by tensor. Every glow is a live activation read back from WebGPU — same Phi-3-mini weights as zerotvm, but every intermediate tensor is rendered 1:1.

Live activations · No server
A·B·G Ahmet Baris Gunaydin

barisgunaydin.com

The author's site. Index of all the projects, papers, and notes behind this research line. Includes contact info and the full kernel-fusion paper backlog.

About · Papers · Contact

What's actually computed, in your tab.

Drag a slider in the hyperscope and a chain of math fires:

// On every bond-slider tick (~60 fps target)
const R = 0.7414;            // Å (slider value)
const {H, integrals} = buildH2Dense(R);
// → STO-3G primitives, Boys F₀, ERIs, Löwdin S^(-½)

const {value: E_FCI, vector: ψ} =
  smallestEigenpair(H, 16);
// → 16×16 Jacobi diagonalization, ~1 ms

const grid = densityGridFromCoeffs(
  R, ψ.cG, ψ.cU, GRID_SIZE
);                       // 110k samples
scene.updateGrid(grid);  // → WebGPU splat shader

All of it runs in your tab. No server. No Python. No native code.
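The diagonalization step in the snippet above can be sketched with a cyclic Jacobi sweep, which is practical for a dense symmetric matrix as small as 16×16. This is a generic textbook version under assumed names (`jacobiSmallestEigenpair`, `Eigenpair`), not the project's `smallestEigenpair`; it returns a plain coefficient vector rather than the structured ψ used there.

```typescript
interface Eigenpair { value: number; vector: number[]; }

// Cyclic Jacobi for a real symmetric matrix; returns the lowest eigenpair.
function jacobiSmallestEigenpair(a: number[][], maxSweeps = 50): Eigenpair {
  const n = a.length;
  const m: number[][] = a.map((row) => row.slice()); // working copy, driven to diagonal form
  const v: number[][] = a.map((_, i) => a.map((__, j) => (i === j ? 1 : 0)));
  for (let sweep = 0; sweep < maxSweeps; sweep++) {
    let off = 0; // squared norm of the strict upper triangle
    for (let p = 0; p < n; p++) for (let q = p + 1; q < n; q++) off += m[p][q] * m[p][q];
    if (off < 1e-24) break;
    for (let p = 0; p < n; p++) {
      for (let q = p + 1; q < n; q++) {
        if (m[p][q] === 0) continue;
        // Rotation angle chosen so that the (p, q) element becomes zero.
        const theta = (m[q][q] - m[p][p]) / (2 * m[p][q]);
        const t = (theta >= 0 ? 1 : -1) / (Math.abs(theta) + Math.sqrt(theta * theta + 1));
        const c = 1 / Math.sqrt(t * t + 1);
        const s = t * c;
        for (let k = 0; k < n; k++) { // columns: M <- M*J and V <- V*J
          const mkp = m[k][p], mkq = m[k][q];
          m[k][p] = c * mkp - s * mkq;
          m[k][q] = s * mkp + c * mkq;
          const vkp = v[k][p], vkq = v[k][q];
          v[k][p] = c * vkp - s * vkq;
          v[k][q] = s * vkp + c * vkq;
        }
        for (let k = 0; k < n; k++) { // rows: M <- J^T * M
          const mpk = m[p][k], mqk = m[q][k];
          m[p][k] = c * mpk - s * mqk;
          m[q][k] = s * mpk + c * mqk;
        }
      }
    }
  }
  let best = 0;
  for (let i = 1; i < n; i++) if (m[i][i] < m[best][best]) best = i;
  return { value: m[best][best], vector: v.map((row) => row[best]) };
}

console.log(jacobiSmallestEigenpair([[2, 1], [1, 2]]));
```

Jacobi is slower than tridiagonalization-based solvers on large matrices, but at this size it is simple, dependency-free, and accurate, which suits a per-frame recompute.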

What this means concretely:
  • The H₂ ground state is recomputed from molecular integrals every frame as you drag the bond slider.
  • The conditional pair density ρ(r₂|r₀) is recomputed for every cursor move, showing the textbook Fermi+Coulomb hole.
  • An MPS chain of 12+ qubits runs imaginary-time evolution in under 1 second for the phase-transition demo.
  • Every animation frame is real numerical computation, not a pre-rendered movie.

WebGPU compute shaders
TypeScript · zero deps for math
WGSL · JIT-emitted for fusion
Vite · ESM-only
Vitest · CI green
Playwright e2e on Chromium-Metal-3
PySCF · ITensor · libxc reference