
Options Market Making and Derivatives Pricing: Technology Reference

Systems-internals view of the full technology stack behind options market making and derivatives pricing. Focus throughout is implementation, data structures, hardware, latency/throughput, and numerical methods — finance theory appears only where it constrains the engineering.

Architecture overview:

  Market data (OPRA, >70 Gbps microbursts, 200B msgs/day)
         │
  ┌─────────────┐  FPGA: feed decode + NBBO filter + closed-form reprice
  │  FPGA NIC   │  ~746 ns wire-to-wire (NovaSparks), deterministic latency
  │ (quote path)│──► mass quote ────────────── exchange
  └─────────────┘
         │ fills, deltas
         ▼
  ┌─────────────┐  CPU: orchestration, OMS, LM calibration driver,
  │  CPU cores  │  limit checks, OPRA channel sharding, vol surface fit
  └─────────────┘
         │ batch pricing requests
         ▼
  ┌─────────────┐  GPU: full-book scenario revaluation, exotic MC,
  │     GPU     │  VaR/stress grids, XVA exposure simulation, margin
  └─────────────┘

The quoting loop (market data → reprice → quote out, sub-microsecond to single-digit µs wire-to-wire) runs on FPGA/CPU. The risk loop (calibration, full-book risk, scenario revaluation, margin; milliseconds to seconds) runs on CPU and GPU: optimizers and orchestration stay on the host, bulk repricing goes to the GPU. The same pricing math lives in both loops at different fidelities.

0. Primer: Options from First Principles

This section is for readers new to options. If you already know what delta and gamma mean, skip to §1.

0.1 What is an option?

A stock option is a contract that gives you the right but not the obligation to buy or sell a stock at a fixed price (the strike, K) on or before a fixed date (the expiry, T).

Call option — right to buy at the strike. Profitable if the stock price ends up above the strike.
Put option — right to sell at the strike. Profitable if the stock price ends up below the strike.

You pay a premium upfront to buy that right. The seller (writer) collects the premium and takes on the obligation.

Concrete example. Stock trades at $100. You buy a call with strike $105 expiring in 30 days for a $2 premium.

If the stock hits $115 at expiry: you exercise, buy at $105, immediately worth $115 → profit $115 − $105 − $2 = $8.
If the stock stays at $98 at expiry: you don't exercise → you lose only the $2 premium.

The call buyer has limited downside (at most the premium is lost) and unlimited upside. The call seller has the mirror profile: the gain is capped at the premium and the potential loss is unlimited.
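The arithmetic of the example can be checked with a short sketch. The function names are illustrative only and the numbers are per share, ignoring the usual 100-share contract multiplier.

```python
def long_call_pnl(spot_at_expiry: float, strike: float, premium: float) -> float:
    """P&L per share at expiry for the buyer of a call."""
    return max(spot_at_expiry - strike, 0.0) - premium


def short_call_pnl(spot_at_expiry: float, strike: float, premium: float) -> float:
    """P&L per share at expiry for the writer: the exact mirror of the buyer."""
    return -long_call_pnl(spot_at_expiry, strike, premium)


print(long_call_pnl(115.0, 105.0, 2.0))   # 8.0
print(long_call_pnl(98.0, 105.0, 2.0))    # -2.0
print(short_call_pnl(115.0, 105.0, 2.0))  # -8.0
```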

0.2 Why options are hard to price

With a stock, pricing is easy: it's whatever the market says right now. With an option, the value today depends on what the stock might do between now and expiry — which is unknowable. You have to model the distribution of future stock prices.

The key insight of Black-Scholes (1973): if you assume the stock follows geometric Brownian motion (random moves, log-normally distributed prices) with a known, constant volatility σ, you can derive an exact formula for the option price. That formula is what §1 is about.

The catch: volatility σ is not directly observable. You must infer it from market prices — this is the "implied volatility" (IV) problem that drives a huge amount of the engineering work.

0.3 Moneyness

The relationship between the current stock price S and the strike K:

Term	Condition	Intuition
In-the-money (ITM)	Call: S > K / Put: S < K	Would be profitable to exercise right now
At-the-money (ATM)	S ≈ K	Right on the edge
Out-of-the-money (OTM)	Call: S < K / Put: S > K	Would not be worth exercising right now

OTM options are cheaper (lower probability of payoff) but still have value because the stock could move there before expiry.

0.4 The Greeks — what they measure

The Greeks quantify how sensitive an option's price is to each input. A market maker must track these constantly because they determine how to hedge.

Delta (Δ) — sensitivity to stock price

If the stock goes up $1, how much does the option price change? That's delta.

A call with delta 0.5 means: stock up $1 → option up ~$0.50.
Delta ranges from 0 to 1 for calls (0 to −1 for puts). Deep ITM delta ≈ 1 (moves like the stock). Deep OTM delta ≈ 0 (barely moves).

A market maker delta-hedges by buying/selling the underlying stock to cancel out delta. If you're long a call with delta 0.5, you short 0.5 shares of stock → your net delta = 0, you're "delta-neutral". Now you don't care about small moves in the stock price.

Gamma (Γ) — how fast delta changes

Delta isn't constant — it changes as the stock moves. Gamma measures that rate of change.

High gamma = delta changes a lot with small stock moves → you need to re-hedge frequently.
Gamma is highest for ATM options near expiry (the option is on the knife's edge — a small move flips whether you exercise or not).

Gamma is the source of profit for option buyers and the source of risk for option sellers. If you are long gamma and the stock moves a lot, you profit (you keep re-hedging in a favorable way). If you are short gamma and the stock moves a lot, you lose.

Theta (Θ) — time decay

Options lose value as time passes, all else equal — because there's less time for the stock to move in a favorable direction. Theta is the daily dollar loss from time passing.

The fundamental tension: gamma and theta are opposites. Long gamma (option buyer) profits from big moves but bleeds theta every day. Short gamma (option seller) collects theta every day but gets hurt by big moves. The question is always whether the realized moves will be bigger or smaller than what the option priced in.

Vega (ν) — sensitivity to volatility

If implied volatility goes up by one percentage point (say from 20% to 21%), how much does the option price change? That's vega. Every long vanilla option has positive vega: higher volatility makes options more valuable (more chance of a big move). Vega is largest for ATM options with long time to expiry.

A market maker accumulates vega risk across thousands of positions. Managing this is essentially managing exposure to the level of implied volatility across strikes and maturities — the volatility surface (§3).

Vanna and Volga — second-order cross-Greeks

Vanna: how delta changes when volatility changes (or equivalently, how vega changes when the stock moves). Matters for skew-hedging.
Volga (vomma): how vega changes when volatility changes — convexity in vol. Matters for wing/tail risk.

0.5 What a market maker actually does

A market maker continuously posts two-sided quotes: a bid (price they'll buy at) and an ask (price they'll sell at). They profit from the spread between the two.

The problem: they don't know in advance whether the next order will be a buy or a sell, and they might accumulate a large position in one direction (inventory risk). They also face the risk that whoever is trading against them knows something they don't (adverse selection — someone bought that call because they know the stock is about to pop).

For options, the market maker is doing this across thousands of strikes × expiries × underlying names simultaneously. They continuously:

Compute theoretical prices (theos) for every option using a pricing model
Add a spread around the theo based on risk (gamma, vega, inventory, adverse-selection signals)
Emit two-sided quotes to exchanges
Hedge fills by trading the underlying (to zero out delta)
Update the model when implied volatility changes

The systems problem: doing all of this at nanosecond to microsecond speed, across 1.6M+ instruments, with incoming market data arriving at 70+ Gbps. That's what the rest of this doc is about.

0.6 The volatility surface — why it's central

Black-Scholes assumes a single constant volatility σ for all strikes and maturities. In reality, the market implies a different σ for every strike and expiry: the volatility smile/skew. On equities, low strikes (OTM puts) trade at higher implied vol than high strikes (OTM calls) because investors pay up for crash protection; this is the "skew". At the same strike and expiry, put-call parity forces a European put and call onto the same implied vol. Short-dated options trade differently from long-dated ones.

The volatility surface is the function σ_implied(K, T) inferred from all market prices simultaneously. Market makers maintain a real-time arbitrage-free fit of this surface (§3) and quote off it — their theo for any option is "plug into the surface, read the vol, plug into Black-Scholes."

This is why implied vol, not price, is the language of options trading. Two traders agree on what the "right" price is by agreeing on what implied vol is appropriate — the conversion to a dollar price is mechanical.

0.7 How this doc is organized

Having read this primer, the rest of the doc makes sense as a stack:

  §3 Vol surface       ← the "market model" every desk runs
       │
  §1 Pricing models   ← math engines computing theos from the surface
  §2 Greeks            ← sensitivities, hedging ratios, risk decomposition
       │
  §4 MM mechanics     ← how theos + Greeks turn into quotes + hedges
       │
  §5 GPU/FPGA         ← hardware making §1-§4 fast enough
  §7 Exchange proto   ← how quotes actually reach the exchange
  §6 Risk systems     ← P&L, VaR, margin
       │
  §9 IR derivatives   ← same ideas applied to bonds/rates instead of stocks
  §10 Exotics          ← complex payoffs beyond vanilla calls/puts
  §11 ML               ← neural networks entering the pricing stack
  §12 Microstructure   ← market dynamics, 0DTE, dealer gamma effects

1. Pricing Models

1.1 Black-Scholes closed form and the normal-CDF kernel

The European call price under Black-Scholes-Merton:

C = S·e^(-qT)·N(d1) − K·e^(-rT)·N(d2)
d1 = [ln(S/K) + (r − q + σ²/2)T] / (σ√T)
d2 = d1 − σ√T

The entire hot path is N(·), the standard normal CDF. Everything else is a handful of log, exp, sqrt, and FMA ops. A fast BS pricer is in practice a fast vectorized erf/N plus a fast vectorized exp/log. The model has been a SIMD/GPU benchmark for two decades precisely because it is embarrassingly parallel and arithmetic-bound, not memory-bound — vectorization is limited only by register width (~16× for AVX-512 on f32).
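A minimal scalar sketch of the formula above, using only the standard library. It is a reference for checking faster kernels, not a production pricer; the function names are illustrative.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def bs_price(S, K, T, r, q, sigma, is_call=True):
    """Black-Scholes-Merton price of a European option with dividend yield q."""
    sqrt_t = math.sqrt(T)
    d1 = (math.log(S / K) + (r - q + 0.5 * sigma * sigma) * T) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    spot_leg = S * math.exp(-q * T)    # dividend-discounted spot
    strike_leg = K * math.exp(-r * T)  # rate-discounted strike
    if is_call:
        return spot_leg * norm_cdf(d1) - strike_leg * norm_cdf(d2)
    return strike_leg * norm_cdf(-d2) - spot_leg * norm_cdf(-d1)


call = bs_price(100.0, 100.0, 1.0, 0.05, 0.0, 0.2, is_call=True)
put = bs_price(100.0, 100.0, 1.0, 0.05, 0.0, 0.2, is_call=False)
print(round(call, 4), round(put, 4))  # 10.4506 5.5735
```

Everything outside the two `norm_cdf` calls is one `log`, one `sqrt`, two `exp`, and a few multiply-adds, which is why the CDF dominates the profile.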

Normal-CDF approximations (ranked by where each fits):

Method	Form	Accuracy	Notes
Abramowitz & Stegun 26.2.17	rational poly in `t = 1/(1+0.2316419·x)`, 5 coeffs	~7.5e-8 abs	The classic; one branch on sign. Ubiquitous in textbooks/teaching code.
Hart (1968)	rational `erfc` with branches by magnitude	~1e-15	Production-grade double precision; used inside many libm `erf`.
Cody (1969)	Chebyshev rational, region-split	18–21 sig digits	Basis of most libm `erf`/`erfc`.
West (2009) "Better approximations to cumulative normal functions"	Hart-derived, tuned for option range	~1e-15	Popular in quant libs; designed for the moneyness range that matters.
Polynomial / minimax (per-platform)	degree-tuned poly over `erf`	tunable	What you generate when you want a branchless SIMD kernel.

Vectorization mechanics. For an AVX-512 kernel you want a branchless CDF: a single minimax polynomial (or a small set selected by vblend/mask) over the whole input range, so all 16 lanes execute the same instruction stream. Abramowitz-Stegun's t = 1/(1+a·|x|) form maps cleanly to FMA chains; the exp(-x²/2) term uses a vectorized exp from Intel SVML (_mm512_exp_ps) or Sleef/vectorclass. Intel's Black-Scholes benchmark code built on oneMKL vector-math routines (vsExp, vsCdfNorm and similar) reports the expected ~16× over scalar on AVX-512 Xeon.
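As a scalar illustration of the Abramowitz-Stegun form, the sketch below evaluates the polynomial with a Horner chain (the shape that maps to FMAs) and applies the sign reflection once at the end. The constants are the published 26.2.17 coefficients; the function name is illustrative, and a real SIMD kernel would replace the final conditional with a mask/blend.

```python
import math

_P = 0.2316419
_B1, _B2, _B3, _B4, _B5 = (0.319381530, -0.356563782, 1.781477937,
                           -1.821255978, 1.330274429)
_INV_SQRT_2PI = 1.0 / math.sqrt(2.0 * math.pi)


def norm_cdf_as(x: float) -> float:
    """Abramowitz-Stegun 26.2.17 approximation to the standard normal CDF."""
    ax = abs(x)
    t = 1.0 / (1.0 + _P * ax)
    # Horner evaluation of b1*t + b2*t^2 + ... + b5*t^5.
    poly = t * (_B1 + t * (_B2 + t * (_B3 + t * (_B4 + t * _B5))))
    upper_tail = _INV_SQRT_2PI * math.exp(-0.5 * ax * ax) * poly  # ~ 1 - N(|x|)
    # Reflection N(-x) = 1 - N(x); a vector kernel would use a blend here.
    return 1.0 - upper_tail if x >= 0.0 else upper_tail


worst = max(abs(norm_cdf_as(i / 100.0) - 0.5 * math.erfc(-(i / 100.0) / math.sqrt(2.0)))
            for i in range(-600, 601))
print(worst < 1e-7)  # True
```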

Data layout is the real lever. Store option parameters struct-of-arrays (separate S[], K[], T[], sigma[], r[] arrays), not array-of-structs. SoA lets a single load fill a vector register with 16 strikes; AoS forces gathers. Real pricers price a whole strike chain at one expiry in one pass — same S, T, r, varying K — which is the ideal SIMD shape.

Implied volatility inversion — the other hot kernel. Given a market price, solve for σ. Naïve Newton on BS is slow and unstable near the wings. The production standard is Peter Jäckel, "Let's Be Rational" (Wilmott, 2015):

Normalize to Black-76; work in x = ln(F/K) log-moneyness.
Four rational-function initial-guess branches selected by moneyness.
Refine with a Householder method of convergence order 4 (rational function of the residual).
Result: two iterations to full double-precision machine accuracy for all inputs, sub-microsecond, GPU-portable.
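For orientation, here is a much simpler solver in the same Black-76 normalization. It is not Jäckel's algorithm: it is a plain Newton iteration safeguarded by a bisection bracket, with an arbitrary starting guess, and it typically needs many more iterations than two. All names and default parameters are assumptions made for this sketch.

```python
import math


def _norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def black76_call(F, K, T, sigma, df=1.0):
    """Black-76 call on forward F; df is the discount factor to expiry."""
    s = sigma * math.sqrt(T)
    d1 = (math.log(F / K) + 0.5 * s * s) / s
    return df * (F * _norm_cdf(d1) - K * _norm_cdf(d1 - s))


def black76_vega(F, K, T, sigma, df=1.0):
    s = sigma * math.sqrt(T)
    d1 = (math.log(F / K) + 0.5 * s * s) / s
    pdf = math.exp(-0.5 * d1 * d1) / math.sqrt(2.0 * math.pi)
    return df * F * pdf * math.sqrt(T)


def implied_vol(price, F, K, T, df=1.0, lo=1e-6, hi=5.0, tol=1e-12, max_iter=100):
    """Safeguarded Newton: take the Newton step if it stays in [lo, hi], else bisect."""
    if not df * max(F - K, 0.0) < price < df * F:
        raise ValueError("price outside no-arbitrage bounds")
    sigma = 0.5  # arbitrary start inside the bracket (assumption)
    for _ in range(max_iter):
        diff = black76_call(F, K, T, sigma, df) - price
        if abs(diff) < tol:
            break
        # The price is increasing in sigma, so the sign of diff shrinks the bracket.
        if diff > 0.0:
            hi = sigma
        else:
            lo = sigma
        vega = black76_vega(F, K, T, sigma, df)
        usable = vega > 1e-14
        candidate = sigma - diff / vega if usable else 0.0
        sigma = candidate if usable and lo < candidate < hi else 0.5 * (lo + hi)
    return sigma


target = black76_call(100.0, 120.0, 0.5, 0.3)
print(abs(implied_vol(target, 100.0, 120.0, 0.5) - 0.3) < 1e-8)  # True
```

The bisection fallback is what keeps the wings stable when vega is tiny; the cost is iteration count, which is the gap "Let's Be Rational" closes with its initial guess and higher-order update.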

Key insight: A BS desk doesn't have a "Black-Scholes performance problem" — it has a "vectorized erf + exp + a 2-iteration rational IV solver" problem. Get those two kernels branchless and SoA and the model layer disappears from the profile.

1.2 Heston stochastic volatility — Fourier/COS pricing and calibration

Heston (1993) adds a mean-reverting CIR variance process correlated with spot. There is no closed-form price, but the characteristic function φ(u) of log-spot is known in closed form, which is the whole reason Heston is tractable. Pricing is then a Fourier inversion.

Three transform families (pick by use case):

Lewis (2001) / fundamental transform. Prices as a contour integral in the complex plane of φ against a payoff transform; contour shifting yields numerically stabler formulas. The theoretical unification.
Carr-Madan (1999) FFT. Damp the call price e^(αk)C(k) with damping factor α>0 (α=1.5 standard), express that transform in terms of φ, then invert with one FFT to get prices across a whole log-strike grid at once. A single FFT prices the entire strike chain — ideal for calibration where you reprice many strikes.
COS method — Fang & Oosterlee (2008). Expand the (Fourier-cosine) density on a truncated interval [a,b] derived from cumulants; payoff coefficients are analytic for vanillas. Converges exponentially in the number of cosine terms N for smooth densities — typically N≈128–256 for Heston vanillas. Usually the fastest and most stable per-option method.
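A compact sketch of the COS recipe. To keep it short and checkable, it swaps in the Black-Scholes (geometric Brownian motion) characteristic function rather than Heston's; replacing `phi` and the cumulants is the only model-specific change. It prices the put, whose payoff is bounded, and obtains the call by parity. The function name, `n_terms`, and the truncation width `L` are assumptions of this sketch.

```python
import cmath
import math


def cos_put_gbm(S, K, T, r, q, sigma, n_terms=128, L=10.0):
    """European put by the COS method, with y = ln(S_T / K) as state variable."""
    x = math.log(S / K)
    mu = (r - q - 0.5 * sigma * sigma) * T   # mean of ln(S_T / S_0)
    c1, c2 = x + mu, sigma * sigma * T        # first two cumulants of y
    a, b = c1 - L * math.sqrt(c2), c1 + L * math.sqrt(c2)
    if not a < 0.0 < b:
        raise ValueError("strike lies outside the truncation range")
    total = 0.0
    for k in range(n_terms):
        u = k * math.pi / (b - a)
        # Analytic cosine coefficients of the payoff K * (1 - e^y) on [a, 0].
        chi = (math.cos(-u * a) + u * math.sin(-u * a) - math.exp(a)) / (1.0 + u * u)
        psi = -a if k == 0 else math.sin(-u * a) / u
        v_k = 2.0 / (b - a) * K * (psi - chi)
        # Characteristic function of ln(S_T / S_0) under GBM.
        phi = cmath.exp(1j * u * mu - 0.5 * c2 * u * u)
        term = (phi * cmath.exp(1j * u * (x - a))).real * v_k
        total += 0.5 * term if k == 0 else term  # first term carries weight 1/2
    return math.exp(-r * T) * total


put = cos_put_gbm(100.0, 100.0, 1.0, 0.05, 0.0, 0.2)
call = put + 100.0 - 100.0 * math.exp(-0.05)  # put-call parity, q = 0
print(round(put, 4), round(call, 4))  # 5.5735 10.4506
```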

Calibration = least-squares fit of (κ, θ, σ_v, ρ, v0) to a grid of market IVs/prices. Engineering reality (Cui et al., "Full and fast calibration of the Heston model," EJOR 2017):

Use a numerically stable form of φ — the "little Heston trap" (Albrecher et al.) picks the right branch of the complex sqrt/log to avoid discontinuities that wreck the integral.
Derive analytic gradients of price w.r.t. the 5 parameters from φ. Makes a Levenberg-Marquardt step ~10× faster than finite-difference gradients.
GPU calibration: the inner repricing loop (reprice whole quote grid via COS/FFT) is embarrassingly parallel over strikes×maturities → thousands of GPU threads; the optimizer (LM / differential evolution) stays on host.

1.3 SABR — Hagan asymptotic formula and arbitrage-free variants

SABR (Hagan, Kumar, Lesniewski, Woodward, "Managing Smile Risk," Wilmott 2002): stochastic forward with vol-of-vol, params (α, β, ρ, ν). Built for rates/swaptions smiles.

Why it dominated: Hagan gives a closed-form asymptotic expansion for implied vol as a function of strike — no PDE/MC at quote time. Plug strike → get σ_BS → plug into Black-76. This is microseconds per strike and trivially vectorizable.

Implementation points:

β is set, not fitted. Fitting β to data "has the same effect as fitting market noise." Desks fix β (often 0.5 for rates, 1 lognormal, 0 normal) and calibrate α, ρ, ν. With ATM vol pinned, α is recovered by solving a cubic, leaving a clean 2-parameter (ρ, ν) fit per smile.
Normal SABR (β=0) is the relevant regime in low/negative-rate markets where lognormal blows up.
The classic Hagan formula admits arbitrage (negative densities) in the low-strike wing. Fixes: Hagan's own 2014 "arbitrage-free SABR" solves a forward PDE for the density with absorbing boundary → arb-free, 2nd-order-accurate smiles. Antonov's "free-boundary SABR" and exact-MC mapping are the other production routes. Trade-off: closed-form Hagan is instant but arbitrageable; PDE-SABR is arb-free but costs a grid solve per smile.
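The closed-form route can be sketched as below: the 2002 Hagan lognormal expansion, written so the at-the-money case falls out of the small-z branch. This is the arbitrageable formula, not the PDE variant; the function name and the series threshold are choices made for this sketch.

```python
import math


def hagan_lognormal_vol(F, K, T, alpha, beta, rho, nu):
    """Hagan et al. (2002) asymptotic Black implied vol for SABR."""
    one_m_b = 1.0 - beta
    log_fk = math.log(F / K)
    fk_pow = (F * K) ** (0.5 * one_m_b)
    z = (nu / alpha) * fk_pow * log_fk
    if abs(z) < 1e-6:
        z_over_x = 1.0 - 0.5 * rho * z  # series of z / x(z) near the money
    else:
        x_z = math.log((math.sqrt(1.0 - 2.0 * rho * z + z * z) + z - rho) / (1.0 - rho))
        z_over_x = z / x_z
    denom = fk_pow * (1.0 + one_m_b ** 2 / 24.0 * log_fk ** 2
                      + one_m_b ** 4 / 1920.0 * log_fk ** 4)
    time_corr = 1.0 + (one_m_b ** 2 / 24.0 * alpha ** 2 / fk_pow ** 2
                       + 0.25 * rho * beta * nu * alpha / fk_pow
                       + (2.0 - 3.0 * rho ** 2) / 24.0 * nu ** 2) * T
    return alpha / denom * z_over_x * time_corr


# Illustrative rates-style smile: forward 3%, beta fixed at 0.5, negative rho.
for strike in (0.02, 0.03, 0.04):
    print(strike, round(hagan_lognormal_vol(0.03, strike, 1.0, 0.035, 0.5, -0.3, 0.4), 4))
```

Because the function is a fixed sequence of `log`, `sqrt`, and power operations per strike, a whole smile evaluates as one vector pass.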

1.4 Local volatility — Dupire equation

Dupire (1994): the unique local-vol function reproducing an arbitrage-free call surface:

σ_loc²(K,T) = [ ∂C/∂T ] / [ ½ K² ∂²C/∂K² ]     (zero-rate form)

The formula differentiates the market surface, which is noisy — naïve finite differences on raw quotes explode. Practice:

Parameterize first, differentiate second. Fit an arbitrage-free representation (SVI in (k, w) total-variance space, see §3) and take the analytic ∂/∂T and ∂²/∂K² of that fit. A butterfly-free fit guarantees ∂²C/∂K² ≥ 0 and a calendar-free fit guarantees ∂C/∂T ≥ 0, so σ_loc² stays non-negative.
Two consumption modes: (a) backward PDE / FD to price exotics under the calibrated local-vol surface; (b) Monte Carlo with the local-vol function cubic-spline-interpolated in spot, linear in time.
Local Stochastic Vol (LSV): a leverage function on top of Heston, calibrated so the blend reprices vanillas exactly while inheriting Heston's forward-smile dynamics. The GPU Heston-SLV calibration papers target exactly this.

1.5 Monte Carlo and variance reduction

Technique	Mechanism	Typical gain	Cost / caveat
Antithetic variates	pair each draw `Z` with `−Z`; average	30–50% var. reduction on smooth payoffs	~free; useless/harmful for non-monotone payoffs
Control variates	subtract MC error of a correlated instrument with known closed form	large when correlation high	needs an analytically-priced control
Quasi-MC (Sobol)	low-discrepancy sequence instead of pseudo-random	error → ~`O((log N)^d / N)` vs `O(N^-1/2)`	use `N = 2^k`; needs Brownian-bridge / PCA path construction; breaks naïve error bars
Importance sampling	shift the measure to sample the payoff-relevant region	orders of magnitude for tail/barrier	must correct with likelihood ratio; tuning-sensitive
Brownian bridge / PCA	reorder/condition the path so first dimensions carry most variance	makes QMC effective	construction overhead

QMC + Sobol + Brownian bridge is the standard production combo for path-dependent equity/FX exotics. On GPU each path is one thread; the cuRAND Sobol generator plus a per-thread payoff kernel is the canonical layout.
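The first row of the table can be illustrated on a CPU with the standard library. The sketch prices a European call with antithetic pairs and reports the standard error next to that of a plain estimator at equal cost (the same number of payoff evaluations). Function and parameter names are illustrative, and the pseudo-random generator stands in for whatever generator a production system uses.

```python
import math
import random


def mc_call_antithetic(S, K, T, r, sigma, n_pairs, seed=0):
    """Return (price, stderr_antithetic, stderr_plain_at_equal_cost)."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma * sigma) * T
    vol = sigma * math.sqrt(T)
    disc = math.exp(-r * T)
    s_pair = s_pair_sq = s_single = s_single_sq = 0.0
    for _ in range(n_pairs):
        z = rng.gauss(0.0, 1.0)
        up = disc * max(S * math.exp(drift + vol * z) - K, 0.0)
        dn = disc * max(S * math.exp(drift - vol * z) - K, 0.0)  # mirrored draw
        pair = 0.5 * (up + dn)
        s_pair += pair
        s_pair_sq += pair * pair
        s_single += up
        s_single_sq += up * up
    mean_pair = s_pair / n_pairs
    var_pair = s_pair_sq / n_pairs - mean_pair ** 2
    mean_single = s_single / n_pairs
    var_single = s_single_sq / n_pairs - mean_single ** 2
    # A plain estimator with the same budget would use 2 * n_pairs independent draws.
    return (mean_pair,
            math.sqrt(var_pair / n_pairs),
            math.sqrt(var_single / (2 * n_pairs)))


price, se_anti, se_plain = mc_call_antithetic(100.0, 100.0, 1.0, 0.05, 0.2, 20000, seed=42)
print(round(price, 2), se_anti < se_plain)
```

The call payoff is monotone in the draw, so the mirrored payoffs are negatively correlated and the pair average has lower variance; for a non-monotone payoff such as a straddle the same trick can do nothing or make things worse.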

2. Greeks: Real-Time, High-Throughput

2.1 What is being computed

Greek	∂ of price w.r.t.	Order	Why a market maker cares
Delta (Δ)	spot `S`	1st	primary hedge ratio; hedged continuously with underlying/futures
Gamma (Γ)	spot, 2nd (`∂Δ/∂S`)	2nd	re-hedging cost; drives spread (gamma risk) and pin risk
Vega (ν)	vol `σ`	1st	exposure to the vol surface; the core MM inventory axis
Theta (Θ)	time `t`	1st	carry/decay; P&L attribution
Rho (ρ)	rate `r`	1st	usually minor for short-dated equity options
Vanna	`∂²/∂S∂σ`	2nd cross	how Δ moves as vol moves; skew-hedging
Volga / vomma	`∂²/∂σ²`	2nd	convexity in vol; smile/wing risk

For BS all these have closed forms built from N(d1) and N(d2), which the price already computes, plus the density φ(d1) = N'(d1). Analytic Greeks are nearly free given the price kernel.

2.2 Analytic vs finite-difference vs AAD

Analytic (closed-form): best when available (BS, and Greeks derivable from a known φ via differentiating under the integral in Heston/COS). Fast, exact, no perturbation noise.
Finite difference (bump-and-revalue): (V(x+h) − V(x−h)) / 2h. Model-agnostic and trivial to implement, but cost scales with the number of inputs (N+1 revaluations for N one-sided first-order Greeks, 2N for central differences) and every bump faces the h tradeoff: truncation bias if h is large, round-off or Monte Carlo noise if h is small. Used when no analytic form exists and AAD isn't wired in.
Algorithmic / Adjoint AD (AAD): the production answer for high-dimensional sensitivities.
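A small sketch of bump-and-revalue against a Black-Scholes call, showing both the central-difference formula and the revaluation count. The helper names and bump sizes are assumptions chosen for illustration.

```python
import math


def bs_call(S, K, T, r, sigma):
    """Black-Scholes call, zero dividend yield."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma * sigma) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    cdf = lambda x: 0.5 * math.erfc(-x / math.sqrt(2.0))
    return S * cdf(d1) - K * math.exp(-r * T) * cdf(d2)


def bump_greeks(pricer, inputs, bumps):
    """Central differences: two full revaluations per bumped input."""
    greeks, evals = {}, 0
    for name, h in bumps.items():
        up = dict(inputs, **{name: inputs[name] + h})
        dn = dict(inputs, **{name: inputs[name] - h})
        greeks[name] = (pricer(**up) - pricer(**dn)) / (2.0 * h)
        evals += 2
    return greeks, evals


inputs = {"S": 100.0, "K": 100.0, "T": 1.0, "r": 0.05, "sigma": 0.2}
greeks, evals = bump_greeks(bs_call, inputs, {"S": 0.01, "sigma": 1e-4, "r": 1e-4})
analytic_delta = 0.5 * math.erfc(-0.35 / math.sqrt(2.0))  # N(d1) with d1 = 0.35
print(evals, abs(greeks["S"] - analytic_delta) < 1e-6)  # 6 True
```

Three sensitivities already cost six pricer calls; a risk run over every surface node and curve point multiplies that by thousands, which is the cost AAD removes.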

2.3 AAD — the central technology

Foundational paper: Giles & Glasserman, "Smoking Adjoints: fast Monte Carlo Greeks" (Risk, 2006). The pathwise sensitivity recursion is run backward through the computation using adjoint variables. The complexity result is the whole point:

Reverse-mode AD computes the gradient of one output w.r.t. an arbitrary number of inputs at a cost of ~3–4× a single price evaluation, independent of the input count.

A risk run needs sensitivities of one P&L/price to thousands of inputs (every node on the vol surface, every rate on the curve, every correlation). Bump-and-revalue costs O(N) price evals; AAD costs O(1). For a 1000-input book that is a ~250× speedup.

Mechanics and gotchas:

Tape-based reverse mode: record every elementary op (the "tape") on the forward pass, then replay it backward accumulating adjoints. Memory = tape size ∝ op count × paths → checkpointing (recompute segments instead of storing) is the standard memory control.
Pathwise differentiability required: discontinuous payoffs (digitals, barriers) break the pathwise/AAD derivative → smooth the payoff or fall back to the likelihood-ratio method for those.
AAD + LSM for Bermudan/American and XVA gives the full sensitivity vector of a callable/credit-valuation-adjustment in one backward sweep.
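The tape mechanics can be shown with a toy operator-overloading implementation. It is a teaching sketch, not how XAD or any production library is built: a global list serves as the tape, each node stores its local partial derivatives, and one reverse sweep yields the sensitivity of a Black-Scholes call to every input. All class and function names are hypothetical.

```python
import math


class Var:
    tape = []  # nodes in evaluation order

    def __init__(self, value, parents=()):
        self.value = float(value)
        self.parents = parents  # tuple of (parent_node, d(self)/d(parent))
        self.adjoint = 0.0
        Var.tape.append(self)

    @staticmethod
    def lift(x):
        return x if isinstance(x, Var) else Var(x)

    def __add__(self, o):
        o = Var.lift(o)
        return Var(self.value + o.value, ((self, 1.0), (o, 1.0)))
    __radd__ = __add__

    def __sub__(self, o):
        o = Var.lift(o)
        return Var(self.value - o.value, ((self, 1.0), (o, -1.0)))

    def __mul__(self, o):
        o = Var.lift(o)
        return Var(self.value * o.value, ((self, o.value), (o, self.value)))
    __rmul__ = __mul__

    def __truediv__(self, o):
        o = Var.lift(o)
        return Var(self.value / o.value,
                   ((self, 1.0 / o.value), (o, -self.value / o.value ** 2)))

    def __neg__(self):
        return self * -1.0


def v_exp(x):
    v = math.exp(x.value)
    return Var(v, ((x, v),))


def v_log(x):
    return Var(math.log(x.value), ((x, 1.0 / x.value),))


def v_sqrt(x):
    v = math.sqrt(x.value)
    return Var(v, ((x, 0.5 / v),))


def v_norm_cdf(x):
    pdf = math.exp(-0.5 * x.value ** 2) / math.sqrt(2.0 * math.pi)
    return Var(0.5 * math.erfc(-x.value / math.sqrt(2.0)), ((x, pdf),))


def bs_call_with_gradient(S, K, T, r, sigma):
    Var.tape.clear()
    s, k, t, rate, vol = (Var(v) for v in (S, K, T, r, sigma))
    # Forward pass: every elementary operation lands on the tape.
    d1 = (v_log(s / k) + (rate + 0.5 * vol * vol) * t) / (vol * v_sqrt(t))
    d2 = d1 - vol * v_sqrt(t)
    price = s * v_norm_cdf(d1) - k * v_exp(-(rate * t)) * v_norm_cdf(d2)
    # Reverse sweep: propagate adjoints from the output back to the inputs.
    price.adjoint = 1.0
    for node in reversed(Var.tape):
        for parent, partial in node.parents:
            parent.adjoint += partial * node.adjoint
    names = ("dS", "dK", "dT", "dr", "dsigma")
    return price.value, dict(zip(names, (v.adjoint for v in (s, k, t, rate, vol))))


price, grad = bs_call_with_gradient(100.0, 100.0, 1.0, 0.05, 0.2)
print(round(price, 4), round(grad["dS"], 4), round(grad["dsigma"], 3))
```

One forward pass plus one backward pass returns all five sensitivities; adding more inputs adds tape nodes but no extra sweeps.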

Libraries:

XAD (auto-differentiation.github.io, open source) — operator-overloading C++ AAD; QuantLib-XAD integration
CompatibL QuantLibAdjoint / TapeScript — earlier AAD-in-QuantLib via taping
CoDiPack / Adept / dco/c++ — general C++ AAD engines used as backends
NAG, Quaternion / Acadia ORE FaCT (FASTER), and bank-internal AAD frameworks

2.4 GPU batching and portfolio risk aggregation

Batch shape: thousands of options (strikes × expiries × underlyings) as parallel threads; analytic-Greek kernels are arithmetic-bound and scale near-linearly.
Risk aggregation: portfolio Δ/Γ/ν are linear sums of per-position Greeks, so aggregation is a giant segmented reduction over the position table grouped by underlying for Δ/Γ and by surface bucket (strike×expiry) for vega. The hard part: every position must be revalued against the same surface snapshot, so the surface version is pinned for the whole risk pass.
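A host-side sketch of the aggregation step, with the snapshot check made explicit. The dictionary layout and field names are hypothetical; on a GPU the same grouping would be a segmented reduction over a sorted position table.

```python
from collections import defaultdict


def aggregate_greeks(positions, surface_version):
    """Sum position Greeks by underlying (delta, gamma) and surface bucket (vega)."""
    by_underlying = defaultdict(lambda: {"delta": 0.0, "gamma": 0.0})
    vega_by_bucket = defaultdict(float)
    for p in positions:
        # Every position must have been valued against the pinned snapshot.
        if p["surface_version"] != surface_version:
            raise ValueError("position valued against a different surface snapshot")
        totals = by_underlying[p["underlying"]]
        totals["delta"] += p["qty"] * p["delta"]
        totals["gamma"] += p["qty"] * p["gamma"]
        vega_by_bucket[(p["underlying"], p["expiry"], p["strike"])] += p["qty"] * p["vega"]
    return dict(by_underlying), dict(vega_by_bucket)


book = [
    {"underlying": "XYZ", "expiry": "2025-06", "strike": 100.0, "qty": 10,
     "delta": 0.5, "gamma": 0.25, "vega": 0.125, "surface_version": 7},
    {"underlying": "XYZ", "expiry": "2025-06", "strike": 100.0, "qty": -4,
     "delta": 0.5, "gamma": 0.25, "vega": 0.125, "surface_version": 7},
]
print(aggregate_greeks(book, surface_version=7))
```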

Key insight: Vanilla books = analytic Greeks, vectorized, free given the price. Exotic/XVA books = AAD, because the cost of N Greeks collapses to ~4× one price. Finite difference survives only as a fallback where neither closed form nor a differentiable tape exists.

3. Volatility Surface Construction and Dynamics

3.1 The object

A function σ_imp(K, T) (equivalently total variance w(k,T) = σ²(k,T)·T in log-moneyness k = ln(K/F)). Market makers don't quote off a price model directly — they quote off a fitted, arbitrage-free surface and read theos and Greeks from it. The surface is the central shared state of an options desk.

3.2 SVI and SSVI parameterization

SVI (Stochastic Volatility Inspired) — Gatheral (2004). A single-maturity slice of total variance:

w(k) = a + b·( ρ(k − m) + √((k − m)² + σ²) )      (raw SVI, 5 params)

Linear wings (left slope −b(1−ρ), right slope b(1+ρ)) joined by a hyperbola whose minimum sits at k = m − ρσ/√(1−ρ²), near but not necessarily at the money. Five parameters per expiry; fits an equity smile tightly. The de-facto industry smile parameterization.

SSVI (Surface SVI) — Gatheral & Jacquier (2014). Ties slices together via the ATM total variance curve θ_T and a shape function φ(θ):

w(k,T) = (θ_T/2)·( 1 + ρ·φ(θ_T)·k + √((φ(θ_T)·k + ρ)² + (1 − ρ²)) )

Each smile has effectively 3 parameters, and the whole surface is driven by θ_T plus a correlation/shape pair. SSVI admits explicit conditions for global (butterfly and calendar) no-arbitrage; the price of holding ρ constant across maturities is some fit-quality loss at the wings. eSSVI (Hendriks-Martini, Corbetta et al.) lets ρ vary by maturity and gives no-calendar-spread conditions between adjacent slices, which extend to a continuous, globally arbitrage-free surface.

3.3 The two arbitrage constraints (the hard engineering invariants)

Calendar-spread (no time arbitrage): total variance must be monotone non-decreasing in T at every fixed k — ∂w/∂T ≥ 0. Forward variance can't be negative.
Butterfly (no strike arbitrage / non-negative density): the risk-neutral density implied by the surface must be ≥ 0, i.e. ∂²C/∂K² ≥ 0. Gatheral's g-function g(k) ≥ 0 is the SVI-space test.

Gatheral-Jacquier (arXiv:1204.0646) give explicit, tractable sufficient conditions on SVI parameters that preclude both. A calibrator that ignores these will fit the quotes and then leak P&L to arbitrageurs (or feed negative densities into a Dupire local-vol build). The fit is a constrained optimization: minimize quote error subject to g(k)≥0 and ∂_T w≥0.
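The butterfly test can be sketched as a grid check of g(k) on a raw-SVI slice, using the analytic first and second derivatives of w. The first parameter set is made up and well behaved; the second is, to the best of my recollection, the example reported in the Gatheral-Jacquier paper (attributed there to Axel Vogt) that fits plausibly yet has negative density. Function names and the grid are choices made for this sketch, and a grid check is a numerical screen, not a proof.

```python
import math


def svi_raw(k, a, b, rho, m, sigma):
    """Raw SVI total variance and its first two k-derivatives."""
    d = k - m
    root = math.sqrt(d * d + sigma * sigma)
    w = a + b * (rho * d + root)
    w1 = b * (rho + d / root)
    w2 = b * sigma * sigma / root ** 3
    return w, w1, w2


def g_function(k, params):
    """Gatheral's g(k); the implied density is non-negative iff g(k) >= 0."""
    w, w1, w2 = svi_raw(k, *params)
    return (1.0 - k * w1 / (2.0 * w)) ** 2 - 0.25 * w1 * w1 * (1.0 / w + 0.25) + 0.5 * w2


def butterfly_violations(params, k_lo=-1.5, k_hi=1.5, n=301):
    """Grid points where total variance is non-positive or g(k) < 0."""
    bad = []
    for i in range(n):
        k = k_lo + i * (k_hi - k_lo) / (n - 1)
        if svi_raw(k, *params)[0] <= 0.0 or g_function(k, params) < 0.0:
            bad.append(k)
    return bad


benign = (0.04, 0.1, -0.5, 0.0, 0.2)              # (a, b, rho, m, sigma), illustrative
vogt = (-0.0410, 0.1331, 0.3060, 0.3586, 0.4153)  # (a, b, rho, m, sigma)
print(len(butterfly_violations(benign)), len(butterfly_violations(vogt)) > 0)  # 0 True
```

In a calibrator the same function becomes a constraint or penalty term, evaluated at every optimizer step rather than once after the fit.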

3.4 Interpolation choices

In strike (per slice): SVI/SSVI parametric fit (preferred — globally smooth, controllable no-arb), or cubic spline in log-moneyness of total variance (must check g(k)≥0 post-hoc).
In time: linear interpolation in total variance w between maturities (not in σ), which automatically respects calendar-spread monotonicity if the pillar w's are ordered.
Wings/extrapolation: SVI's linear-in-k wings extrapolate naturally; eSSVI guarantees no-arb extrapolation.
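The time-interpolation rule, sketched for a single fixed log-moneyness. The function name is illustrative, and flat-vol extrapolation outside the pillar range is an assumption of this sketch, not something the text prescribes.

```python
import bisect
import math


def interp_vol_in_time(t, pillar_times, pillar_vols):
    """Implied vol at time t, interpolating linearly in total variance w = vol^2 * T."""
    w = [v * v * T for v, T in zip(pillar_vols, pillar_times)]
    if any(later < earlier for earlier, later in zip(w, w[1:])):
        raise ValueError("pillar total variances decrease: calendar arbitrage")
    if t <= pillar_times[0]:
        return pillar_vols[0]   # flat-vol extrapolation (assumption)
    if t >= pillar_times[-1]:
        return pillar_vols[-1]  # flat-vol extrapolation (assumption)
    i = bisect.bisect_right(pillar_times, t)
    t0, t1 = pillar_times[i - 1], pillar_times[i]
    w_t = w[i - 1] + (w[i] - w[i - 1]) * (t - t0) / (t1 - t0)
    return math.sqrt(w_t / t)


# 30% vol at 3 months, 20% at 1 year: w goes from 0.0225 to 0.04.
print(round(interp_vol_in_time(0.5, [0.25, 1.0], [0.30, 0.20]), 4))  # 0.238
```

Interpolating the vols directly would give about 26.7% at six months; interpolating w gives about 23.8% and keeps total variance increasing between the pillars.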

3.5 Real-time re-fit cost and dynamics

Update cadence: cheap rigid shift (translate the whole surface with spot — sticky strike/sticky delta heuristics) on every tick, and a periodic full re-calibration (solve the constrained SVI/SSVI least-squares) at a slower beat. The full LM fit of an SVI slice is microseconds-to-milliseconds; a whole surface is dozens of slices → well within a human-imperceptible loop, but not per-OPRA-message — hence the shift/refit split.
Regime detection (the macro overlay): VIX level + term structure shape drives parameter priors. VIX futures are in contango ~80–84% of the time and flip to backwardation the remaining ~16–20%, typically under stress. The VIX/VIX3M ratio is the standard contango/backwardation regime flag; backwardation = stressed = wider skew, fatter put wing, and wider quoted spreads.

Key insight: The surface is the desk's shared truth, and its two non-negotiable invariants — monotone variance in time and non-negative density in strike — are engineering constraints enforced inside the calibrator, not afterthoughts. SVI/SSVI win because they make those constraints expressible as closed-form parameter conditions.

4. Market-Making Mechanics

4.1 What an options market maker actually does

Stream continuous two-sided quotes across thousands of strikes × expiries × names, simultaneously, on multiple venues, each with its own protocol, expiry structure, and throttling rules. The book must stay roughly Δ-/Γ-/vega-neutral while capturing the spread. This is a quote-generation + hedging + risk control loop running under hard latency and message-rate constraints.

4.2 Quote generation and spread

The theoretical mid comes from the surface (§3). The bid/ask around it is the sum of:

Model/parameter uncertainty — how confident is the theo (wider in illiquid wings).
Gamma risk — cost of re-hedging as spot moves; high-gamma (near-ATM, near-expiry) → wider.
Inventory skew — the Avellaneda-Stoikov (2008) mechanism: shift the reservation price away from mid against current inventory. Reservation r = mid − q·γ·σ²·(T−t) (q = inventory, γ = risk aversion). Long inventory → quote lower (lean to sell). The options extension — Stoikov & Saglam, "Option market making under inventory risk" (2009) — makes the optimal stock and option quotes depend on the net Δ, Γ, and vega of the inventory and the relative liquidity of the option vs the hedging instrument.
Adverse selection — informed flow picks you off; widen when order-flow imbalance / toxic flow signals fire.
Exchange fees and rebates — maker rebates can tighten quotes; taker fees widen.
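The inventory-skew term can be sketched directly from the reservation-price formula above. The fixed `half_spread` argument stands in for all the other spread components in the list and is an assumption of this sketch, as are the function names and the example numbers.

```python
def reservation_price(mid, inventory, risk_aversion, sigma, time_left):
    """Avellaneda-Stoikov reservation price: mid - q * gamma * sigma^2 * (T - t)."""
    return mid - inventory * risk_aversion * sigma ** 2 * time_left


def skewed_quotes(mid, inventory, risk_aversion, sigma, time_left, half_spread):
    """Centre a symmetric spread on the reservation price instead of the mid."""
    r = reservation_price(mid, inventory, risk_aversion, sigma, time_left)
    return r - half_spread, r + half_spread


# Long 40 contracts: both quotes shift down, so selling becomes more likely.
print(skewed_quotes(2.50, 40, 0.001, 0.5, 0.25, 0.03))
# Short 40 contracts: both quotes shift up.
print(skewed_quotes(2.50, -40, 0.001, 0.5, 0.25, 0.03))
```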

4.3 Inventory and hedging

Delta is hedged continuously with the cheapest liquid instrument — futures or ETFs, not the option leg — because the underlying/future has the tightest spread and deepest book.
Gamma can't be hedged with linear instruments; it's hedged with other options or managed by spread-widening and inventory limits.
Vega is the residual the desk warehouses; managed at the surface-bucket level and hedged with other options / variance products.
Continuous vs periodic hedging: most desks use band/threshold hedging sized by gamma and transaction cost.

4.4 Feed scale, latency, and quote throttling — the systems core

OPRA is the largest market-data feed on earth. Concrete 2024 numbers:

96 multicast lines (expanded from 48 on 5 Feb 2024); ~1.6M instruments
>200 billion messages/day (130–170B/day observed); regional quotes + NBBO updates
1 ms microburst ~doubled from ~40 Gbps to >70 Gbps; per-second peaks ~28 Gbps aggregate
Dual 40 Gbps cross-connects per side (A and B) is the minimum to avoid drops

Feed-handler implementation (Databento's "Beyond 40 Gbps" write-up):

FPGA-based 100G NIC (e.g. Napatech NT200A02) with large on-card buffers; shards 96 channels into 8×12 with multi-GB host buffers per shard
Kernel bypass mandatory: user-space SDK / DPDK-style direct buffer access
Filter early: ~80% of updates that don't move the NBBO are dropped at ingress — the single biggest downstream-load reducer
C++ feed handlers, shared-memory queues to a Rust/C++ distribution gateway; FPGA for parse/normalize/NBBO

Quote throttling is the output-side constraint: exchanges cap quotes-per-second per MPID, so a desk cannot re-quote every strike on every one of 200B daily ticks. Consequences:

Mass-quote / bulk-quote protocols: update a whole strip of strikes in one message
Quote batching + coalescing: collapse multiple intended updates within a throttle window into one; prioritize strikes by edge/risk so the limited budget goes to the most valuable quotes
FPGA decides which quotes to send and encodes the exchange protocol at line rate, within the throttle, in nanoseconds
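A software sketch of coalescing plus edge-prioritized selection within one throttle window. The tuple layout, the priority score, and the instrument names are hypothetical; real systems do this in hardware against venue-specific limits.

```python
import heapq


def select_quotes(pending, budget):
    """Coalesce updates per instrument, then keep the `budget` highest-priority ones.

    pending: iterable of (instrument, bid, ask, priority) in arrival order.
    """
    latest = {}
    for instrument, bid, ask, priority in pending:
        latest[instrument] = (priority, bid, ask)  # a newer update overwrites an older one
    ranked = heapq.nlargest(budget, latest.items(), key=lambda kv: kv[1][0])
    return [(instrument, bid, ask) for instrument, (_, bid, ask) in ranked]


window = [
    ("SPY_C500", 1.00, 1.10, 0.2),
    ("SPY_C505", 0.50, 0.60, 0.9),
    ("SPY_C500", 1.02, 1.12, 0.7),  # supersedes the first SPY_C500 update
    ("SPY_P480", 0.30, 0.40, 0.1),
]
print(select_quotes(window, budget=2))
# [('SPY_C505', 0.5, 0.6), ('SPY_C500', 1.02, 1.12)]
```

Four intended updates collapse to two outbound messages; the dropped low-priority quote stays stale until a later window has budget for it.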

4.5 Pin risk, early exercise, settlement

Pin risk: at expiry, the underlying "pins" near a high-open-interest strike. A MM short options at that strike must guess whether they'll be assigned, leaving a binary, un-hedged delta over the weekend. Dealer delta-hedging itself causes pinning: hedging a long-gamma book buys dips / sells rips toward the strike, mechanically pulling spot to the pin.
Early exercise (American): equity options are American → must model the early-exercise boundary. Standard engines: Cox-Ross-Rubinstein binomial / trinomial trees, PDE with a free boundary, or LSM-MC for high-dimensional cases. Early exercise of calls is driven by discrete dividends; puts by rates.
Settlement / assignment risk: AM vs PM settlement (index options), the exercise-decision cutoff after the close, and the resulting un-hedged overnight delta are operational risks the position system must flag pre-expiry.
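The early-exercise engine named above can be sketched as a Cox-Ross-Rubinstein tree with a continuous dividend yield. Discrete dividends, which drive real call exercise, are left out to keep the sketch short; the function name and step count are illustrative.

```python
import math


def crr_american(S, K, T, r, sigma, steps, is_call=False, q=0.0):
    """American option on a CRR binomial tree with continuous dividend yield q."""
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    p = (math.exp((r - q) * dt) - d) / (u - d)  # risk-neutral up probability
    disc = math.exp(-r * dt)
    sign = 1.0 if is_call else -1.0
    # Terminal payoffs; index j counts up-moves.
    values = [max(sign * (S * u ** (2 * j - steps) - K), 0.0) for j in range(steps + 1)]
    for n in range(steps - 1, -1, -1):
        for j in range(n + 1):
            continuation = disc * (p * values[j + 1] + (1.0 - p) * values[j])
            exercise = sign * (S * u ** (2 * j - n) - K)
            values[j] = max(continuation, exercise)  # early-exercise decision
    return values[0]


put = crr_american(100.0, 100.0, 1.0, 0.05, 0.2, steps=500)
print(round(put, 2))
```

With these inputs the American put comes out above the European value of about 5.57; the difference is the early-exercise premium that a European closed form cannot capture.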

4.6 What the named firms do publicly

Optiver — C++ + FPGA; FPGA used for protocol handling, feed parsing, and real-time algorithmic adjustment; quotes "thousands of instruments at once across venues with unique throttling rules"; named hardware: Solarflare NICs, OPRA feed handlers, bespoke FPGA components.
IMC, Citadel Securities — same low-latency C++/FPGA archetype, large global feed-infrastructure teams for region-optimized latency.
Jane Street — builds trading systems in OCaml (one of the largest OCaml codebases in existence). The bet: a strong static type system catches mispricing/logic bugs at compile time. Grew from ADRs → equity options → ETFs; ~8% of OCC volume historically.
Susquehanna / SIG — the options-market-making progenitor; three of Jane Street's four founders came from SIG. Known for a poker/decision-theory-driven options culture.
DRW, Akuna, Wolverine, Five Rings — same low-latency-C++/FPGA + research stack archetype.

Key insight: The defining systems problem of options market making is fan-out under a throttle: ingest 200B msgs/day at >70 Gbps, keep an arbitrage-free surface and a Greek-neutral book current, and decide which of thousands of possible quote updates to actually emit inside a hard exchange message budget — all in nanoseconds. That is why the winners are FPGA + kernel-bypass + early-filter shops, and why quote selection (edge-prioritized batching) matters as much as quote computation.

5. GPU / FPGA Acceleration

5.1 The fundamental split: which silicon for which math

Computation	Math character	Best hardware	Why
Black-Scholes / Black-76 European	Closed-form (`exp`/`log`/`erf`)	FPGA	Fully pipelineable, no branching, deterministic latency
Greeks of closed-form models	Closed-form derivatives	FPGA / CPU SIMD	Same as above
Vanilla Monte Carlo (path-dependent, exotics)	Embarrassingly parallel over paths	GPU	Thousands of independent paths = thousands of threads
American / Bermudan (LSM regression)	MC + cross-path regression	GPU (harder)	LSM needs a global regression step per timestep
PDE finite-difference (local vol, barrier)	Banded linear solves per timestep, sequential in time	GPU (batched) / CPU	Time-stepping is sequential; parallelism is across the grid and across instruments
Heston / SLV calibration	Iterative optimizer calling a pricer many times	CPU, GPU-accelerated inner pricer	Optimizer is inherently sequential; only the pricing kernel parallelizes

Rule of thumb: FPGA for the hot quoting path (closed-form, deterministic), GPU for the batch/overnight risk and exotic books, CPU for orchestration and iterative calibration that resists vectorization.

5.2 GPU Monte Carlo — concrete numbers

Published throughput:

Tesla V100: 1M paths × 100 timesteps in ~17 ms; exotic: 8.192M paths × 365 steps in ~26.6 ms (single precision)
Tesla K20X: 200K paths × 50 steps in <10 ms
GTX 670 vs. sequential C: ~250× speedup
GTX 560 vs. i3 CPU: ~50× single-precision, ~12× double-precision — consumer cards cripple FP64 (1/24 to 1/32 of FP32 on GeForce; near 1/2 on Tesla/A100/H100)

Random number generation. NVIDIA cuRAND provides PRNGs (XORWOW, MRG32k3a, MTGP Mersenne-Twister, Philox) and QRNGs (Sobol' low-discrepancy sequences with scrambling) — essential for quasi-MC.

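Standard building block: Sobol' + Brownian-bridge path construction. The Brownian bridge spends the leading (best-distributed) Sobol' dimensions on the coarsest features of the path, the terminal value first and then successive midpoints, so most of the payoff variance sits in a handful of low dimensions. That concentration is what lets QMC converge faster than pseudo-random MC on path-dependent payoffs.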
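As a minimal CPU-side sketch of the bridge ordering (the function name `brownian_bridge_path` and the power-of-two step count are illustrative assumptions, not part of any library), the construction below spends the first normal on W(T) and each later one on a midpoint conditional on its two already-fixed neighbours. In a QMC setting the normals would come from inverse-CDF-transformed Sobol' points.

```python
import math

def brownian_bridge_path(z, T=1.0):
    """Map n standard normals to W(dt), W(2dt), ..., W(T); n must be a power of two.

    z[0] fixes the terminal value, z[1] the midpoint, z[2:4] the quarter
    points, and so on, so early inputs control the coarse shape of the path.
    """
    n = len(z)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two step count"
    dt = T / n
    w = [0.0] * (n + 1)              # w[i] = W(i * dt), with W(0) = 0
    w[n] = math.sqrt(T) * z[0]
    k = 1                            # index of the next unused normal
    step = n
    while step > 1:
        half = step // 2
        for left in range(0, n, step):
            right = left + step
            # Midpoint of a bridge: mean is the average of the endpoints,
            # variance is (interval length) / 4 = half * dt / 2.
            mean = 0.5 * (w[left] + w[right])
            std = math.sqrt(half * dt / 2.0)
            w[left + half] = mean + std * z[k]
            k += 1
        step = half
    return w[1:]
```

Feeding `[1, 0, 0, ...]` gives a straight line to the terminal value, which makes the ordering easy to check by eye.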

GPU MC bottlenecks:

Memory bandwidth (not FLOPs) is usually the wall — storing full paths for path-dependent payoffs saturates the bus. Mitigation: keep paths in registers/shared memory, fuse payoff into the path kernel
Warp divergence from barrier/early-exercise branching hurts SIMT execution
Longstaff-Schwartz American options need cross-path least-squares regression at each exercise date — breaks pure per-path parallelism; dedicated GPU papers exist (6th Workshop on High Performance Computational Finance, 2013)

5.3 FPGA — when hardware pricing wins

FPGAs win where the model is closed-form and latency must be deterministic. No cache = no cache-miss tail; statically scheduled = predictable latency every cycle.

Survey reference: "The Role of FPGAs in Modern Option Pricing Techniques: A Survey," Electronics (MDPI) 13(16):3186, 2024. Reported gains: 270× to 5400× faster than CPU, plus large energy-efficiency wins.

Concrete data points:

A pipelined Black-Scholes core sustains ~180 million option valuations/sec after pipeline fill (~208 clock cycles)
AMD/Xilinx Vitis Quantitative Finance Library (open source) ships FPGA engines for European and American options + risk building blocks

FPGA-friendly: Black-Scholes/Black-76 and their Greeks (fixed dataflow of exp/log/N(·)), binomial trees of fixed depth, finite-difference schemes with fixed grids.

FPGA-hostile: anything iterative and data-dependent — Heston/SLV calibration loops (LM convergence count unknown at compile time), adaptive-grid PDEs, large-state MC where on-chip memory is the constraint.

Numeric representation: FPGA designs often use fixed-point or custom floating-point to shrink area and boost clock. Function evaluation (exp, log, N(·)) via CORDIC, polynomial/piecewise approximation, or lookup tables + interpolation.
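To illustrate the polynomial route for N(·), here is a sketch in Python of the Abramowitz & Stegun 26.2.17 approximation, checked against an `erf`-based reference. The function names are illustrative; a hardware design would use fixed-point or custom floating-point arithmetic rather than doubles, but the dataflow (one reciprocal, one `exp`, one degree-5 Horner chain, no data-dependent branching beyond a sign flip) is the part that carries over.

```python
import math

# Abramowitz & Stegun 26.2.17 coefficients (stated absolute error below 7.5e-8).
_P = 0.2316419
_B = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)

def norm_cdf_poly(x):
    """Polynomial approximation of the standard normal CDF N(x)."""
    ax = abs(x)
    t = 1.0 / (1.0 + _P * ax)
    # Horner evaluation of b1*t + b2*t^2 + ... + b5*t^5
    poly = t * (_B[0] + t * (_B[1] + t * (_B[2] + t * (_B[3] + t * _B[4]))))
    pdf = math.exp(-0.5 * ax * ax) / math.sqrt(2.0 * math.pi)
    upper = 1.0 - pdf * poly              # N(|x|)
    return upper if x >= 0.0 else 1.0 - upper

def norm_cdf_exact(x):
    """Reference value via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```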

6. Risk Systems and Real-Time P&L

6.1 P&L attribution — the Greek decomposition

Daily (and intraday) P&L decomposes via a second-order Taylor expansion:

ΔV ≈   Δ·ΔS                  (delta P&L — directional spot move)
     + ½·Γ·(ΔS)²             (gamma P&L — convexity / realized-vol capture)
     + Θ·Δt                  (theta P&L — time decay)
     + ν·Δσ                  (vega P&L — implied-vol change)
     + ½·Volga·(Δσ)²         (vol-of-vol)
     + Vanna·ΔS·Δσ           (spot-vol cross)
     + Rho·Δr  + ...
     + UNEXPLAINED            (residual: higher-order, model error)

Gamma vs. Theta is the daily P&L of a delta-hedged book. Long-gamma books monetize realized spot volatility via daily/intraday delta-hedging (gamma scalping) but bleed theta; short-gamma books collect theta but blow up in fast moves.
Vega P&L captures changes in implied volatility — distinct from gamma's exposure to realized moves.
Unexplained P&L is a first-class monitoring signal: a growing residual means the second-order approximation is breaking down or the pricing model is mis-specified.
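The decomposition can be sketched for a single Black-Scholes call as follows. This is a toy under stated assumptions: no dividends, only the delta, gamma, theta and vega terms, and hypothetical function names (`bs_call`, `pnl_explain`). The residual is computed against a full revaluation, which is how the unexplained line is normally defined.

```python
import math

def _phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _N(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, sigma, T):
    """Black-Scholes call value and Greeks (no dividends)."""
    sq = sigma * math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * sigma * sigma) * T) / sq
    d2 = d1 - sq
    disc_k = K * math.exp(-r * T)
    return {
        "value": S * _N(d1) - disc_k * _N(d2),
        "delta": _N(d1),
        "gamma": _phi(d1) / (S * sq),
        "vega": S * _phi(d1) * math.sqrt(T),
        # theta per year of calendar time passing (dV/dt = -dV/dT)
        "theta": -S * _phi(d1) * sigma / (2.0 * math.sqrt(T)) - r * disc_k * _N(d2),
    }

def pnl_explain(S, K, r, sigma, T, dS, dsigma, dt):
    """Split the full-revaluation P&L into Greek terms plus an unexplained residual."""
    g = bs_call(S, K, r, sigma, T)
    actual = bs_call(S + dS, K, r, sigma + dsigma, T - dt)["value"] - g["value"]
    parts = {
        "delta": g["delta"] * dS,
        "gamma": 0.5 * g["gamma"] * dS * dS,
        "theta": g["theta"] * dt,
        "vega": g["vega"] * dsigma,
    }
    parts["unexplained"] = actual - sum(parts.values())
    parts["actual"] = actual
    return parts
```

For a small daily move the residual is a fraction of a cent per option; for a 10% spot move with a 5-vol-point shift it is a large share of the gamma term, which is the breakdown the monitoring signal is meant to catch.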

6.2 VaR / CVaR for options books — full revaluation is mandatory

For options the payoff is non-linear, so a Taylor/delta-gamma approximation misses tail curvature.

Method	Mechanism	Options suitability
Parametric (delta-gamma)	Analytic from Greeks + covariance matrix	Fast but inaccurate in tails; rough guard only
Historical simulation	Apply N historical factor-return scenarios, fully reprice the book under each, read the quantile	Standard; non-parametric; mandatory full repricing
Monte Carlo VaR	Simulate factor paths (spot and vol surface shifts), reprice	Most flexible; captures vol-of-vol; most expensive

CVaR / Expected Shortfall (Basel FRTB 97.5% ES; OCC 99% ES) is now the regulatory and clearing standard. The unavoidable core is full revaluation: reprice the entire book under each scenario — the canonical GPU batch workload.
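A minimal sketch of the scenario loop and the tail statistic, assuming a caller-supplied `reprice` function that values the whole book in a given market state (that callable, and both function names, are hypothetical). The VaR convention used here, the smallest loss inside the tail, is one of several in use.

```python
import math

def scenario_pnls(reprice, base_state, scenarios):
    """Full revaluation: reprice the book under every shocked state."""
    base = reprice(base_state)
    return [reprice(s) - base for s in scenarios]

def var_and_es(pnls, alpha=0.975):
    """Historical VaR and Expected Shortfall, both as positive loss numbers."""
    losses = sorted((-p for p in pnls), reverse=True)      # worst loss first
    # Number of scenarios in the (1 - alpha) tail; the epsilon guards against
    # floating-point noise in n * (1 - alpha).
    k = max(1, math.ceil(len(losses) * (1.0 - alpha) - 1e-9))
    tail = losses[:k]
    return tail[-1], sum(tail) / k
```

With 200 scenarios and alpha = 0.975 the tail holds the 5 worst outcomes, and ES is simply their average.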

6.3 Greeks ladder — bucketed aggregation

Desk-level risk view: delta/gamma/vega/theta aggregated into buckets by strike and expiry across the whole book (vega bucketed by tenor since 1-month vega and 1-year vega are different risks). Group-by-and-sum over the position table keyed by (underlier, expiry-bucket, strike-bucket) — maintained incrementally in an in-memory store as fills arrive.
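The incremental group-by can be sketched as a dictionary of running sums. The bucket edges, the `GreeksLadder` class and its method names are illustrative assumptions. In a live system the per-contract Greeks also move with the market, so cells would be re-marked on reprice as well; the sketch shows only the fill-driven increment.

```python
from collections import defaultdict

class GreeksLadder:
    """Running per-bucket Greek totals, updated fill by fill."""

    def __init__(self, strike_width=5.0, expiry_edges=(7, 30, 90, 365)):
        self.strike_width = strike_width
        self.expiry_edges = expiry_edges          # bucket upper bounds, in days
        self.cells = defaultdict(lambda: defaultdict(float))

    def _key(self, underlier, days_to_expiry, strike):
        # First expiry edge that contains the option; anything longer is "long".
        exp_bucket = next((e for e in self.expiry_edges if days_to_expiry <= e), "long")
        strike_bucket = int(strike // self.strike_width) * self.strike_width
        return (underlier, exp_bucket, strike_bucket)

    def on_fill(self, underlier, days_to_expiry, strike, signed_qty, unit_greeks):
        """Add signed_qty contracts' worth of per-contract Greeks to one cell."""
        cell = self.cells[self._key(underlier, days_to_expiry, strike)]
        for name, value in unit_greeks.items():
            cell[name] += signed_qty * value

    def total(self, greek, underlier=None):
        """Sum one Greek over all cells, optionally for a single underlier."""
        return sum(c[greek] for k, c in self.cells.items()
                   if underlier is None or k[0] == underlier)
```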

6.4 Margin — SPAN, SPAN 2, TIMS, STANS

CME SPAN (1988) — scenario-based: evaluate portfolio across 16 risk scenarios (price moves × vol moves), take worst-case loss, with inter/intra-commodity spread offsets. Deterministic, array-based, fast, transparent. Global futures/options standard for decades.
CME SPAN 2 — modernization: replaces the rigid 16-scenario array with a historical-VaR / filtered-historical-simulation core plus stress add-ons, unifying futures and options margining.
OCC STANS (2006) — the OCC was the first clearing house to use large-scale Monte Carlo for margin. Base margin = 99% Expected Shortfall over full-portfolio Monte Carlo at a 2-day horizon plus a concentration/dependence stress add-on. Anti-correlated positions reduce total margin (true portfolio margining). Genuinely heavy compute — full revaluation under many thousands of simulated joint scenarios, run daily per account.

Systems takeaway: margin is converging from cheap deterministic scenario arrays (SPAN/TIMS) toward expensive simulation-based ES (SPAN 2 / STANS), pushing clearing-member risk infrastructure toward the same GPU full-revaluation engines used for internal VaR.

6.5 Real-time limit monitoring

Pre-trade gate: every quote/order is checked against position limits, net/gross delta limits, vega limits, gamma limits, and notional limits at the MPID/risk-group level. Running aggregates are updated on each fill, and hard-limit breaches trip a kill switch wired into the order gateway.

7. Exchange Connectivity and Options-Specific Protocols

7.1 OPRA — the firehose

OPRA (Options Price Reporting Authority) is the consolidated US options tape — every quote and trade across all ~17 US options exchanges and ~1.6M+ listed instruments. By message volume, the largest financial data feed in existence.

Numbers (post-2024 upgrade):

~130–145 billion messages/day typical; peak ~150.4B in a single day (Q3 2023); OPRA mandates 400B/day capacity headroom
Microbursts exceed 70 Gbps within 1–10 ms windows; 40/100 GbE minimum required
Feed handlers must absorb ~75 million messages/second peak
NBBO updates are only ~20% of traffic — 80% of exchange updates don't move the NBBO, so on-NIC filtering sheds 80% of downstream load

The 96-line expansion (February 2024):

OPRA went to 96 multicast channels
99th-percentile latency dropped from 543.5 µs → 57.5 µs (89% reduction)
Latency outliers down ~75%

Feed-handler architecture (Databento's published design):

Napatech NT200A02 FPGA-based 100G NIC with large on-NIC buffers; 96 channels sharded into 8 groups of 12, rebalanced to decorrelate bursts
Kernel bypass, core pinning, core isolation, NUMA alignment
Pipeline: receive → A/B line arbitration/dedup → parse → normalize → distribute + archive; feed handler in C++, distribution in Rust via shared-memory queue
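The A/B arbitration/dedup stage can be sketched as first-arrival-wins on a per-channel sequence number. This is a simplified illustration, not Databento's code: it assumes one sequence number per packet, and the `LineArbiter` name is hypothetical. A production arbiter would also bound how long a gap may stay open before requesting recovery.

```python
class LineArbiter:
    """Merge redundant A/B copies of one channel into a single stream."""

    def __init__(self, first_seq=1):
        self.next_seq = first_seq
        self.missing = set()            # sequence numbers skipped so far

    def on_packet(self, seq, payload):
        """Return payload if it is new, or None if it was already delivered."""
        if seq in self.missing:         # late fill from the slower line
            self.missing.discard(seq)
            return payload
        if seq < self.next_seq:
            return None                 # duplicate copy from the other line
        # Anything between next_seq and seq has not been seen on either line yet.
        self.missing.update(range(self.next_seq, seq))
        self.next_seq = seq + 1
        return payload
```

Packets from both lines are fed to the same `on_packet`; whichever copy arrives first is forwarded, and a packet lost on one line is filled from the other when it shows up.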

FPGA OPRA feed handlers (NovaSparks NovaTick):

~746 ns average wire-to-wire latency through the FPGA feed handler
Handles all 96 lines with A/B arbitration, up to 2.2M symbols across 17 exchanges
Raw filtering on-NIC by exchange/symbol/message-type/condition

7.2 Order-entry protocols (options-specific)

Exchange	Protocol	Notes
CME	iLink 3 (binary, SBE-encoded)	Live since Jan 2024; iLink 2 decommissioned Feb 2025. Mass Quote rate governed by separate MPS limit over a 3-second window. OnixS handler: >60k msg/s/core, ~6 µs added latency without persistence
Cboe	BOE (Binary Order Entry), now BOEv3	Latency-equalized connections; flow control: 128 messages in flight, read-stop at 1,024 unacked; dedicated Bulk Quoting Ports
Nasdaq / ISE family	OUCH variants + SQF (Specialized Quote Feed)	SQF is the options market-maker quoting interface

7.3 Mass quote (MQU) — the defining options order type

A Mass Quote message packs many two-sided quotes into a single message, replacing the maker's entire quote set for those series atomically. This is why options connectivity differs fundamentally from equities: the dominant message type is replace-my-whole-chain, not new single order.

7.4 Quote throttling and message-rate budgets

Exchanges throttle quotes-per-second per MPID/port. The maker's quoting engine must implement a client-side token-bucket rate governor that:

Tracks current rate against the per-exchange limit
When at-limit, coalesces pending updates into fewer messages (batch multiple series into one mass quote)
Prioritizes the most important series (most liquid, closest to ATM, most recently moved)

This coalescing/prioritization logic is typically implemented in the FPGA or on a dedicated thread pinned to an isolated CPU core.
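A software sketch of such a governor is below. It assumes the exchange budget is counted in messages (one token per mass quote, regardless of how many series it carries) and that the caller supplies a numeric priority per series; both are assumptions, and the `QuoteGovernor` class is hypothetical.

```python
import heapq

class QuoteGovernor:
    """Token-bucket rate limiter with per-series coalescing and priority."""

    def __init__(self, rate_per_sec, burst, max_quotes_per_msg):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.max_per_msg = max_quotes_per_msg
        self.last = 0.0
        self.pending = {}               # series -> (priority, quote); latest wins

    def submit(self, series, quote, priority):
        # Coalesce: a newer update for the same series overwrites the old one.
        self.pending[series] = (priority, quote)

    def _refill(self, now):
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def drain(self, now):
        """Return the mass-quote messages that may be sent at time `now`."""
        self._refill(now)
        messages = []
        while self.pending and self.tokens >= 1.0:
            # Highest-priority series go into the next message.
            best = heapq.nlargest(self.max_per_msg, self.pending.items(),
                                  key=lambda kv: kv[1][0])
            messages.append([(s, q) for s, (_, q) in best])
            for s, _ in best:
                del self.pending[s]
            self.tokens -= 1.0          # one token per message, not per quote
        return messages
```

Updates that do not fit in the budget stay in `pending`, where a later update for the same series replaces them, so the engine never sends a quote that is already stale.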

7.5 Cancel-on-disconnect, kill switches, and risk gates

Options-specific because a stale two-sided quote in a fast market is catastrophic — you'll be picked off on every series simultaneously.

Cancel-on-Disconnect (COD): the exchange auto-pulls all of an MPID's resting quotes the instant the session drops
Kill switch / purge ports: one message cancels all quotes/orders for the firm
Quote risk monitors / "rapid-fire" / curtain controls (e.g., Cboe): if N contracts or M series execute within a time window, the exchange auto-cancels the remaining quotes
Exchange-side mass cancel complements desk-side limit monitoring — defense in depth

7.6 Combinations: listed strategies vs. synthetics

Multi-leg strategies (spreads, straddles, butterflies, condors) trade two ways:

Exchange-listed complex orders: the strategy is a single tradable instrument with an implied-order engine that crosses the complex order against individual leg books. Guarantees atomic, leg-risk-free execution.
Synthetic: leg into the strategy by sending individual leg orders, accepting leg risk (one leg fills, others move).

8. Numerical Methods and Quantitative Library Internals

8.1 The method hierarchy by model

European, char. fn. known  → Fourier methods (Carr-Madan FFT / COS)   ← fastest
European, Black-Scholes     → closed-form                              ← trivial
Path-dependent / exotic     → Monte Carlo (+ QMC, +AAD for Greeks)
1D local-vol / barrier      → PDE finite difference (Crank-Nicolson)
2D Heston (stoch-vol)       → ADI finite difference
High-dim basket (>3–4)      → Monte Carlo / sparse grids

8.2 Fourier / transform methods — the workhorses for calibration

When a model's characteristic function is known in closed form (Heston, Bates, VG, CGMY, Merton jump-diffusion — all affine/Lévy models), Fourier methods price European options far faster than PDE or MC. This matters enormously because calibration calls the pricer thousands of times.

Carr-Madan FFT (Carr & Madan, J. Computational Finance, 1999):

The call price as a function of log-strike isn't L¹-integrable. Trick: multiply by a damping factor e^{αk} (α=1.5 for the modified call) to make a damped price that is integrable, transform that, divide back out.
The transform has a closed form in terms of the characteristic function; a single FFT yields prices across a whole grid of strikes simultaneously.
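To make the damped transform concrete, the sketch below evaluates the same integral for a single strike by direct trapezoid quadrature instead of an FFT across a strike grid, and uses the Black-Scholes characteristic function so the result can be checked against the closed form. Function names, `v_max` and `n` are illustrative choices; short-dated or low-vol cases need a larger `v_max` because the integrand decays more slowly.

```python
import cmath
import math

def bs_char_fn(u, S0, r, sigma, T):
    """Characteristic function of ln(S_T) under Black-Scholes (u may be complex)."""
    mean = math.log(S0) + (r - 0.5 * sigma * sigma) * T
    return cmath.exp(1j * u * mean - 0.5 * sigma * sigma * T * u * u)

def carr_madan_call(K, S0, r, sigma, T, alpha=1.5, v_max=100.0, n=4000):
    """Call price from the damped-transform integral, by trapezoid quadrature."""
    k = math.log(K)
    dv = v_max / n

    def integrand(v):
        # psi(v): Fourier transform of the damped call price e^{alpha k} C(k)
        num = math.exp(-r * T) * bs_char_fn(v - (alpha + 1.0) * 1j, S0, r, sigma, T)
        den = alpha * alpha + alpha - v * v + 1j * (2.0 * alpha + 1.0) * v
        return (cmath.exp(-1j * v * k) * num / den).real

    total = 0.5 * (integrand(0.0) + integrand(v_max))
    total += sum(integrand(j * dv) for j in range(1, n))
    # Undo the damping: C(k) = e^{-alpha k} / pi * integral
    return math.exp(-alpha * k) / math.pi * total * dv

def bs_call_closed_form(K, S0, r, sigma, T):
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    sq = sigma * math.sqrt(T)
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma * sigma) * T) / sq
    return S0 * N(d1) - K * math.exp(-r * T) * N(d1 - sq)
```

Swapping `bs_char_fn` for a Heston or VG characteristic function changes the model without touching the quadrature, which is the property calibration loops exploit.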

COS method (Fang & Oosterlee, SIAM J. Sci. Comput., 2008):

Reconstructs the density via a Fourier-cosine series, whose coefficients come directly from the characteristic function.
Exponential convergence and linear complexity for smooth densities — typically needs far fewer terms than FFT needs grid points, so usually faster than Carr-Madan for the same accuracy.
Extended to Bermudan, discrete-barrier, and Asian options.

CONV method (Lord, Fang, Bervoets & Oosterlee, 2008): FFT-based convolution for early-exercise products.

These three methods are why a desk can recalibrate a Heston surface in milliseconds rather than minutes.

8.3 PDE finite-difference methods

Space: discretize in log-price on a non-uniform grid concentrating nodes near the strike/barrier.
Time: Crank-Nicolson is 2nd-order and A-stable but not L-stable, so it under-damps the high-frequency error introduced by the non-smooth payoff kink at the strike, producing spurious oscillations in Gamma near expiry.
- Fix — Rannacher smoothing (1984; Giles & Carter, 2006): replace the first 1–2 CN steps with fully-implicit backward-Euler half-steps to damp high-frequency modes, then continue with CN. Near-universal production practice.
2D PDEs (Heston): use ADI (Alternating Direction Implicit) — split each timestep into directional sweeps, each requiring only tridiagonal (Thomas-algorithm) solves. Standard family (in 't Hout & Foulon, 2010): Douglas, Craig-Sneyd, Modified Craig-Sneyd, Hundsdorfer-Verwer (HV) schemes.
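The tridiagonal solve at the heart of each implicit step or ADI sweep is short enough to show in full. This is a plain Thomas-algorithm sketch without pivoting, which assumes a diagonally dominant matrix (typical for implicit diffusion stencils); the argument layout is an illustrative choice.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system in O(n).

    lower[i] multiplies x[i-1] (lower[0] unused), diag[i] multiplies x[i],
    upper[i] multiplies x[i+1] (upper[-1] unused).
    """
    n = len(diag)
    c = [0.0] * n                     # modified upper diagonal
    d = [0.0] * n                     # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):             # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):    # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

Both loops carry a dependency from one row to the next, so on a GPU the parallelism comes from solving many independent systems at once (one per grid line or per instrument) rather than from inside a single solve.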

8.4 Calibration optimizers

Optimizer	Type	Character
Levenberg-Marquardt	Local, gradient	Default for least-squares fits. Exploits closed-form gradients ∂C/∂θ from the Fourier pricing integral — the standard fast Heston calibration (Cui et al. 2017).
L-BFGS-B	Local, quasi-Newton, box-constrained	Cheap memory; good when parameters are bounded.
Differential Evolution	Global, population, derivative-free	Robust to bad starting points; used to seed a local refiner.
CMA-ES	Global, evolution-strategy	Strong on rugged/multimodal landscapes; expensive.
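Neural / amortized calibration	Learned pricing map or inverse map	Train a network offline, either on the forward map params → surface (Horvath et al. 2021) or on the inverse map surface → params; online calibration is then a fast optimization over the network or a single forward pass.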

Production pattern: global seed (DE/CMA-ES) → local polish (LM/L-BFGS), with a Fourier method in the inner loop.

8.5 QuantLib internals — the reference open-source architecture

Core separation — Instrument vs. PricingEngine (Strategy pattern):

An Instrument (e.g., VanillaOption) holds the contract (payoff + exercise) but no pricing logic.
A PricingEngine holds the algorithm. The same option priced by AnalyticEuropeanEngine, BinomialVanillaEngine, MCEuropeanEngine, FdBlackScholesVanillaEngine, etc. — just swap the engine.

PricingEngine
 ├── AnalyticEuropeanEngine          (closed-form Black-Scholes)
 ├── BinomialVanillaEngine<T>        (CRR / Jarrow-Rudd / Tian trees)
 ├── MCEuropeanEngine<RNG,S>         (Monte Carlo)
 └── FdBlackScholesVanillaEngine     (finite difference)

Payoff (PlainVanilla, Barrier, Digital...) + Exercise (European/American/Bermudan)

Term structures (market data layer):

TermStructure
 ├── YieldTermStructure       → FlatForward, PiecewiseYieldCurve,
 │                              FittedBondDiscountCurve
 ├── VolatilityTermStructure  → BlackConstantVol, BlackVarianceSurface
 └── DefaultProbabilityTermStructure → FlatHazardRate, PiecewiseDefaultCurve

Lazy evaluation + Observer pattern (the recompute engine):

LazyObject caches results and only recomputes when an input changed.
Observer/Observable: a SimpleQuote (e.g., spot) change calls notifyObservers(), which invalidates dependent instruments' caches automatically.
Handle<T> / RelinkableHandle<T>: relink a whole curve under a live book and have everything downstream recompute lazily on next request.
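The invalidate-now, recompute-later mechanism can be shown with a toy re-implementation in Python. This is not QuantLib code: the class names only echo the concepts above, and `LazyForward` is a hypothetical stand-in for an instrument.

```python
class Observable:
    def __init__(self):
        self._observers = []

    def register(self, observer):
        self._observers.append(observer)

    def notify_observers(self):
        for obs in self._observers:
            obs.update()

class SimpleQuote(Observable):
    """A mutable market value that notifies dependents when it changes."""
    def __init__(self, value):
        super().__init__()
        self._value = value

    def value(self):
        return self._value

    def set_value(self, value):
        if value != self._value:
            self._value = value
            self.notify_observers()

class LazyForward(Observable):
    """Caches spot * growth and recomputes only after an input has changed."""
    def __init__(self, spot, growth):
        super().__init__()
        self._spot, self._growth = spot, growth
        self._cache, self._valid = None, False
        self.calculations = 0
        spot.register(self)
        growth.register(self)

    def update(self):
        # Invalidate only; the recompute is deferred until someone asks.
        self._valid = False
        self.notify_observers()

    def npv(self):
        if not self._valid:
            self._cache = self._spot.value() * self._growth.value()
            self.calculations += 1
            self._valid = True
        return self._cache
```

Two spot ticks between two `npv()` calls cost one recalculation, not two, which is why the pattern suits a book where inputs tick far more often than results are read.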

9. Interest Rate Derivatives — Models and Technology

9.1 Short-rate models — analytical vs numerical

Model	SDE	Distribution	Analytics	Notes
Vasicek (1977)	`dr = a(b−r)dt + σ dW`	Gaussian	Closed-form bonds, bond options	Rates can go negative; constant vol; no exact fit to initial curve
Hull-White 1F (1990)	`dr = (θ(t)−a r)dt + σ dW`	Gaussian	Closed-form bonds, caps, swaptions (Jamshidian)	"Extended Vasicek"; `θ(t)` fits initial term structure exactly
Hull-White 2F (G2++)	two correlated OU factors	Gaussian	Semi-closed swaptions	Better for spread/CMS products
Black-Karasinski (1991)	`d ln r = (θ(t)−a ln r)dt + σ dW`	Lognormal	No closed form → tree/PDE only	Rates stay positive; harder to calibrate
CIR (1985)	`dr = a(b−r)dt + σ√r dW`	Non-central χ²	Closed-form bonds	Feller condition `2ab ≥ σ²` keeps `r>0`

Hull-White is the workhorse: the time-dependent drift θ(t) gives an exact fit to the initial discount curve, Gaussian tractability gives closed-form bond options and Jamshidian's swaption decomposition, and the trinomial tree is easy to build for Bermudans/callables.

Hull-White trinomial tree (canonical numerical scheme):

Build a symmetric tree for the auxiliary process x with dx = −a x dt + σ dW, x(0)=0. Each node branches up/middle/down with probabilities matching the conditional mean and variance. Near the boundaries the branching geometry switches.
Displace the whole tree by a time-dependent α(t) so the tree reprices the initial discount curve exactly: r = x + α(t), with α solved layer-by-layer by forward induction on Arrow-Debreu prices.

9.2 HJM framework — modeling the whole forward curve

Heath-Jarrow-Morton (HJM, Econometrica 1992) models the instantaneous forward rate f(t,T) directly. The key result: under the risk-neutral measure, the forward-rate drift is fully determined by the volatility structure (the HJM no-arbitrage drift condition):

df(t,T) = σ(t,T) · [∫_t^T σ(t,s) ds] dt  +  σ(t,T) dW(t)

Markovian collapse / Cheyette (quasi-Gaussian) models: choosing a separable volatility σ(t,T) = g(t)·h(T) collapses the dynamics to a low-dimensional Markov system in two state variables (x, y) — HJM-consistent, multi-curve friendly, supports local/stochastic vol extensions, cheap to simulate. The form actually used inside XVA engines.

9.3 LIBOR Market Model (LMM / BGM)

The LMM (Brace-Gatarek-Musiela 1997) models a vector of discrete forward rates L_i(t) as driftless-under-their-own-measure lognormal processes. Not analytically integrable — almost always simulated by Monte Carlo.

Why it dominates exotics: directly calibrates to the market's caplet and swaption vols; lets you specify the full forward-rate correlation matrix and term structure of vol. Standard engine for Bermudan swaptions, ratchet/sticky caps and floors, callable range accruals, and TARNs.

Implementation details that matter:

Calibration: caplets calibrate analytically (each L_i is lognormal under its own forward measure → Black formula). Swaptions use Rebonato's swaption-vol approximation instead of repricing by MC inside the optimizer.
Drift discretization: the predictor-corrector scheme (Hunter-Jäckel-Joshi) reduces bias from the state-dependent drift.
Factor reduction: PCA the instantaneous correlation matrix down to 2–4 driving factors; the first 3 PCs (level/slope/curvature) capture nearly all variance.

9.4 Interest-rate volatility: SABR and Bachelier

SABR β=0 (normal SABR) is the relevant regime in low/negative-rate markets; the formula is quoted in normal (bp) vol.
Negative rates → shifted SABR & Bachelier. From 2014 onward, EUR and then JPY rates went negative. Industry responses: (1) shifted/displaced SABR: model F + s for a shift s (e.g. +3%); (2) Bachelier (normal) model: dF = σ_N dW, now the standard quoting convention for EUR caps/floors/swaptions.
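For reference, the Bachelier price for dF = σ_N dW is short enough to write out. The sketch below assumes a single forward F with an externally supplied discount factor; the function name and signature are illustrative. Nothing in the formula requires F or K to be positive, which is the point of the model.

```python
import math

def bachelier_price(F, K, sigma_n, T, discount=1.0, is_call=True):
    """Bachelier (normal-model) price of a call or put on a forward F."""
    stdev = sigma_n * math.sqrt(T)
    sign = 1.0 if is_call else -1.0
    if stdev == 0.0:
        return discount * max(sign * (F - K), 0.0)
    d = sign * (F - K) / stdev
    pdf = math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))
    return discount * (sign * (F - K) * cdf + stdev * pdf)
```

At the money the price collapses to σ_N·√T/√(2π), which is why normal vols quoted in basis points map almost linearly to ATM premium.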

9.5 The multi-curve framework and SOFR transition

Pre-2008, one LIBOR curve did both forecasting and discounting. The crisis forced the multi-curve paradigm:

Discounting curve = OIS (collateral-funded). On 2020-10-16 LCH and CME switched USD PAI/discounting from Fed Funds (EFFR) to SOFR ("the big bang").
Forecasting curves = one per tenor (1M, 3M, 6M projection curves), because the basis between tenors is now priced.
Circular dependency → a simultaneous multi-curve bootstrap/solver (global Newton over all curve pillars at once). QuantLib does this with GlobalBootstrap.

SOFR-specific numerical headaches:

SOFR is an overnight, backward-looking, daily-compounded rate. Coupons are known only at the end of the period (in-arrears), with conventions like lookback/lockout/payment-delay/observation-shift.
Compounding-in-arrears convexity: the realized compounded rate is a path functional; caps/floors on compounded SOFR need either a term-rate approximation or a model simulating the daily fixings.
SOFR futures convexity: a convexity adjustment (Hull-White-style) is required when bootstrapping the curve off futures.
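The in-arrears coupon itself is simple arithmetic once the fixings are known. The sketch below assumes ACT/360 and ignores lookback, lockout and observation-shift conventions; the function name is illustrative.

```python
def compounded_in_arrears(fixings, day_counts, basis=360.0):
    """Annualized compounded rate over an accrual period.

    fixings[i] is the overnight rate that applies for day_counts[i] calendar
    days (3 over a normal weekend).
    """
    growth = 1.0
    for rate, days in zip(fixings, day_counts):
        growth *= 1.0 + rate * days / basis
    total_days = sum(day_counts)
    return (growth - 1.0) * basis / total_days
```

Because the result depends on every fixing in the period, an option on it is a path-dependent payoff, which is where the pricing difficulty described above comes from.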

9.6 XVA — the valuation-adjustment stack

XVA = the family of adjustments to the "clean" derivative price for counterparty credit, funding, margin, and capital. All are functionals of the future exposure profile, computed by Monte Carlo over the whole netting set / portfolio.

Adjustment	What it prices	Core quantity
CVA	expected loss if counterparty defaults	`LGD · ∫ EE(t) · dPD_cpty(t)` discounted
DVA	expected gain if you default (mirror of CVA)	`LGD · ∫ NEE(t) · dPD_self(t)`
FVA	funding cost/benefit of the uncollateralized exposure	exposure × funding spread
MVA	lifetime funding cost of posted initial margin (IM)	`∫ E[IM(t)] · funding spread · DF`
KVA	cost of holding regulatory capital over the trade's life	`∫ E[capital(t)] · cost-of-capital · DF`

The computational engine:

Simulate risk factors (rates, FX, credit, equity) forward on a grid of ~50–200 dates out to 30–50y, typically 5k–100k paths.
Reprice the entire portfolio on every path × every date → the exposure cube V(path, date). This O(paths × dates × trades) repricing is the dominant cost. Path-wise repricing of callable/Bermudan trades needs American Monte Carlo (Longstaff-Schwartz) to get a conditional value — you cannot afford a nested MC per node.
Aggregate to EPE/ENE, apply netting + CSA and integrate against default probabilities.
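The aggregation step can be sketched for unilateral CVA as a sum over the date grid. The sketch assumes the exposure cube is already netted and collateral-adjusted, takes discount and survival curves as plain callables, and uses hypothetical function names.

```python
import math

def expected_exposure(cube):
    """cube[p][d] = netting-set value on path p at date d; EE(d) = mean of max(V, 0)."""
    n_paths = len(cube)
    n_dates = len(cube[0])
    return [sum(max(cube[p][d], 0.0) for p in range(n_paths)) / n_paths
            for d in range(n_dates)]

def cva(ee, times, discount, survival, lgd=0.6):
    """LGD * sum_i EE(t_i) * DF(t_i) * [Q(t_{i-1}) - Q(t_i)], Q = survival probability."""
    total, prev_q = 0.0, 1.0
    for e, t in zip(ee, times):
        q = survival(t)
        total += e * discount(t) * (prev_q - q)    # default in (t_{i-1}, t_i]
        prev_q = q
    return lgd * total
```

This step is cheap; the cost sits almost entirely in filling `cube`, which is the O(paths × dates × trades) repricing described above.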

MVA is the killer. MVA requires projecting Dynamic Initial Margin (DIM) — under ISDA SIMM, IM is a function of portfolio sensitivities (Greeks) at each future date. You must compute Greeks inside the simulation, at every future node, on every path → naively a nested simulation = billions of inner valuations. Production solutions:

Regression / LSAC to approximate future IM from state variables (Green-Kenyon, arXiv 1405.0508)
Deep learning DIM (Hoencamp et al., arXiv 2407.16435, 2024)
AAD for XVA Greeks — computes the full gradient at ~4–10× the cost of one valuation, independent of input count. The enabling technology for real-time counterparty-risk sensitivities.

10. Exotic and Path-Dependent Option Pricing

10.1 Barrier options

Knock-out/knock-in (up-and-out, down-and-in, …). The payoff depends on whether a continuous barrier was touched.

Closed form exists under Black-Scholes (Merton 1973; Reiner-Rubinstein 1991) via the reflection principle — use it whenever the model is plain BS.
PDE: natural fit — the barrier is just a Dirichlet boundary condition (V=0 or rebate on the barrier).
Monte Carlo + Brownian-bridge correction: the central trap is discrete monitoring bias — naive MC under-counts knock-outs and over-prices KO options. Fix: between each pair of nodes, compute the probability the barrier was breached within the step using the known distribution of the bridging Brownian motion maximum/minimum. Lifts convergence order of the hitting-time error from O(√Δt) to O(Δt).
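For an up-barrier the bridge correction reduces to one exponential per step. The sketch below assumes constant volatility within each step and works in log-price; the function names are illustrative. Multiplying the payoff by the survival weight, rather than sampling a knock-out indicator, also removes the branch that causes warp divergence on a GPU.

```python
import math

def up_barrier_hit_prob(x0, x1, log_barrier, sigma, dt):
    """P(max of the log-price over the step >= barrier | endpoints x0, x1).

    With constant volatility the log-price is a Brownian motion with drift,
    and the bridge law between two fixed endpoints does not depend on the drift.
    """
    if x0 >= log_barrier or x1 >= log_barrier:
        return 1.0
    return math.exp(-2.0 * (log_barrier - x0) * (log_barrier - x1) / (sigma * sigma * dt))

def survival_weight(log_path, log_barrier, sigma, dt):
    """Probability that an up-and-out option survived the whole simulated path."""
    w = 1.0
    for x0, x1 in zip(log_path, log_path[1:]):
        w *= 1.0 - up_barrier_hit_prob(x0, x1, log_barrier, sigma, dt)
    return w
```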

10.2 Asian options

Payoff on the average price over a window (arithmetic or geometric).

Geometric-average Asian = closed form (the geometric mean is the exponential of an average of log-prices, which are jointly normal, so it is itself lognormal; Kemna-Vorst 1990).
Arithmetic-average Asian = no closed form (sum of lognormals isn't lognormal). Best method: Monte Carlo with the geometric Asian as a control variate — the geometric price is known exactly and is highly correlated with the arithmetic payoff, slashing variance by 1–2 orders of magnitude.
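A small pure-Python sketch of the control-variate estimator follows. It assumes Black-Scholes dynamics, equally spaced discrete monitoring at T/n, ..., T, and a control coefficient fixed at 1 rather than estimated; the function names are illustrative.

```python
import math
import random

def geometric_asian_call(S0, K, r, sigma, T, n):
    """Closed-form discrete geometric-average call (monitoring at T/n, ..., T)."""
    m = math.log(S0) + (r - 0.5 * sigma**2) * T * (n + 1) / (2.0 * n)   # mean of ln G
    v = sigma**2 * T * (n + 1) * (2 * n + 1) / (6.0 * n * n)            # variance of ln G
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    d1 = (m - math.log(K) + v) / math.sqrt(v)
    return math.exp(-r * T) * (math.exp(m + 0.5 * v) * N(d1) - K * N(d1 - math.sqrt(v)))

def asian_call_mc(S0, K, r, sigma, T, n, n_paths, seed=0):
    """Return (plain estimate, plain std error, CV estimate, CV std error)."""
    rng = random.Random(seed)
    dt = T / n
    drift = (r - 0.5 * sigma**2) * dt
    vol = sigma * math.sqrt(dt)
    disc = math.exp(-r * T)
    arith, diff = [], []
    for _ in range(n_paths):
        log_s, sum_s, sum_log = math.log(S0), 0.0, 0.0
        for _ in range(n):
            log_s += drift + vol * rng.gauss(0.0, 1.0)
            sum_s += math.exp(log_s)
            sum_log += log_s
        a = disc * max(sum_s / n - K, 0.0)                  # arithmetic payoff
        g = disc * max(math.exp(sum_log / n) - K, 0.0)      # geometric payoff, same path
        arith.append(a)
        diff.append(a - g)

    def mean_se(xs):
        mu = sum(xs) / len(xs)
        var = sum((x - mu) ** 2 for x in xs) / (len(xs) - 1)
        return mu, math.sqrt(var / len(xs))

    plain, plain_se = mean_se(arith)
    d, cv_se = mean_se(diff)
    # E[A] is estimated as E[A - G] (simulated) + E[G] (known exactly).
    return plain, plain_se, d + geometric_asian_call(S0, K, r, sigma, T, n), cv_se
```

Only the small difference between the two payoffs is left to simulate, so the standard error falls sharply for the same path count.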

10.3 American / Bermudan options — early exercise

The hard problem: at each date, holder chooses exercise vs continue, so you need the continuation value E[ V(t+Δ) | F_t ] everywhere.

Longstaff-Schwartz (LSM / LSMC), RFS 2001 — the dominant method. Backward induction on simulated paths: at each exercise date, regress the discounted future cashflows on a basis of functions of the current state (orthogonal bases — Laguerre/Hermite/Chebyshev); exercise where immediate payoff > continuation. Makes early exercise tractable in high dimension. GPU: <10 ms on a Tesla K20X for 200k paths × 50 steps.
Barone-Adesi-Whaley (1987) quadratic approximation: fast closed-form-ish American price for vanilla calls/puts; great for screening / mass repricing.
Binomial/trinomial trees (Cox-Ross-Rubinstein 1979): exact early-exercise handling by backward induction; O(N²) nodes; smoothing (Broadie-Detemple) and Richardson extrapolation improve convergence.
PDE with free boundary: American = a linear complementarity problem (LCP). Solve with PSOR (Projected SOR) or Brennan-Schwartz on the finite-difference grid. Best Greeks of any method.
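Of the methods above, the tree is the easiest to show end to end. The sketch below is a plain CRR put with no smoothing or extrapolation; the function name and the `american` flag are illustrative.

```python
import math

def crr_put(S0, K, r, sigma, T, steps, american=True):
    """Cox-Ross-Rubinstein binomial put by backward induction."""
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    disc = math.exp(-r * dt)
    p = (math.exp(r * dt) - d) / (u - d)          # risk-neutral up probability
    # Terminal payoffs; node j has j up-moves.
    values = [max(K - S0 * u**j * d**(steps - j), 0.0) for j in range(steps + 1)]
    for i in range(steps - 1, -1, -1):
        for j in range(i + 1):
            cont = disc * (p * values[j + 1] + (1.0 - p) * values[j])
            if american:
                # Early exercise: take the larger of continuing and exercising now.
                cont = max(cont, K - S0 * u**j * d**(i - j))
            values[j] = cont
    return values[0]
```

The single `max` inside the loop is the whole early-exercise treatment; setting `american=False` recovers the European value and gives a quick check against Black-Scholes.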

10.4 Variance and volatility swaps

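Variance swap — model-free static replication. The fair variance strike is proportional to the value of a log contract, which is replicated by a strip of OTM puts and calls weighted ∝ 1/K² (K_var is in variance units; the correction term vanishes when the strip is split exactly at the forward F and appears only when a listed strike K₀ ≠ F is used as the split point):
```
K_var ≈ (2/T) · e^{rT} · [ ∫₀^F P(K)/K² dK + ∫_F^∞ C(K)/K² dK ]  −  correction
```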
Carr-Madan (1998) is the canonical reference; Demeterfi-Derman-Kamal-Zou (Goldman Sachs, 1999) is the trader's bible. This same 1/K² portfolio is exactly the CBOE VIX construction (2003 methodology).
Volatility swap — no clean model-free replication. vol = √var, and by Jensen's inequality E[√var] < √E[var], so a vol swap is below the square root of the variance strike by a convexity correction that depends on vol-of-vol.
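The 1/K² strip can be checked numerically: with option prices generated from a flat Black-76 vol, the replicated variance should come back as σ². The sketch below uses a uniform strike grid and the trapezoid rule, both illustrative choices; a listed strip is coarser and truncated, which is where real-world replication error comes from.

```python
import math

def bs_otm_price(K, F, sigma, T, disc):
    """Black-76 price of the out-of-the-money option at strike K (put below F, call above)."""
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    sq = sigma * math.sqrt(T)
    d1 = (math.log(F / K) + 0.5 * sq * sq) / sq
    d2 = d1 - sq
    if K < F:
        return disc * (K * N(-d2) - F * N(-d1))
    return disc * (F * N(d1) - K * N(d2))

def replicated_variance(strikes, otm_prices, T, disc):
    """(2/T) * e^{rT} * integral of OTM price / K^2 over the strike grid."""
    total = 0.0
    for i in range(len(strikes) - 1):
        f0 = otm_prices[i] / strikes[i] ** 2
        f1 = otm_prices[i + 1] / strikes[i + 1] ** 2
        total += 0.5 * (f0 + f1) * (strikes[i + 1] - strikes[i])
    return 2.0 / T * total / disc      # dividing by disc applies e^{rT}
```

Taking the square root of the result gives an upper bound on the vol-swap strike, consistent with the Jensen argument above.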

10.5 Autocallables and structured products

Autocallables embed early-termination triggers: on each observation date, if the underlying is above an autocall barrier, the note redeems early with a coupon.

Monte Carlo with early-termination logic is standard: simulate paths under local-vol or LSV, check trigger conditions on each observation date.
Greeks are notoriously unstable near the barriers (discontinuous payoff at the autocall level) → use payoff smoothing, AAD or likelihood-ratio/Malliavin methods, and the conditional-expectation trick at barriers.
Vega/correlation risk dominates the book; dealers run large LSV calibrations nightly.

11. Machine Learning in Pricing and Volatility Modeling

ML in derivatives splits into two distinct goals: (a) surrogate/acceleration — learn a fast approximation of a slow but trusted pricer — and (b) model replacement — learn the hedge or the dynamics directly from data.

11.1 Deep Hedging (Buehler, Gonon, Teichmann, Wood — JPMorgan, Quantitative Finance 2019)

The landmark. Reframe hedging as a direct optimization of a terminal-wealth objective (a convex risk measure / CVaR utility) over a neural-network hedging policy, trained by gradient descent on simulated paths — no analytic Greeks, no model-implied hedge ratios. The network maps current state (price, position, time, signals) → next hedge, and transaction costs and market frictions are baked directly into the objective.

Essentially policy-gradient RL (the simplest instance is Monte-Carlo policy gradient over the path).
Architecture refinements: the No-Transaction-Band Network (Imaki et al., arXiv 2103.01775) bakes the known no-trade-band structure into the net for sample efficiency.
Open source: pfhedge (PyTorch deep-hedging library).

11.2 Neural SDEs

Parameterize the drift and diffusion of an SDE by neural networks and fit the SDE to market prices by backpropagating through an SDE solver (adjoint sensitivity for SDEs, à la Neural ODEs). Gives a flexible, arbitrage-aware generative model of the underlying that can be simulated for exotic pricing while matching the vanilla smile.

11.3 Differential Machine Learning (Huge & Savine, arXiv 2005.02347, 2020)

A surrogate-training breakthrough from Danske Bank. Train a neural network to approximate a pricer on both prices and their pathwise differentials (Greeks) simultaneously — the AAD-computed pathwise derivatives become additional training labels, and a differential regularization term forces the network's gradient to match the AAD gradient. Result: dramatically better sample efficiency and accurate network Greeks for free. This is the standard recipe for fast, accurate pricing/risk surrogates and for the regression step inside LSM-style XVA.

11.4 Surrogate models / emulators

Train a NN (or Gaussian process) to emulate a slow pricer — Heston/rough-Heston MC, LSV exotic prices, the SABR/Bergomi smile map — so calibration and real-time risk become a single forward pass.

Deep calibration: Horvath, Muguruza, Tomas, "Deep Learning Volatility" (Quantitative Finance 2021) — a NN learns the map (model params → implied-vol surface) for rough volatility models, turning otherwise-intractable rough-Bergomi calibration into milliseconds.

11.5 Physics-Informed Neural Networks (PINNs) for pricing PDEs

Solve Black-Scholes / Heston / fractional-BS PDEs by training a NN whose loss is the PDE residual + boundary/terminal-condition penalties (Raissi-Perdikaris-Karniadakis 2019 framework applied to finance). Mesh-free, naturally high-dimensional, gives a smooth global surrogate with autodiff Greeks. American options handled via obstacle/LCP relaxation. Caveats: training is finicky, accuracy on steep payoffs lags finite differences.

Practical verdict: the winning, deployed uses are surrogate acceleration (deep calibration, differential ML for XVA regression) and deep hedging for cost-aware hedging. PINNs and pure neural-SDE pricing are promising but not yet displacing finite-difference/MC on production exotic desks.

12. Options-Specific Market Microstructure

Options microstructure is not equities microstructure: ~2 million listed option series vs ~8,000 stocks/ETPs, quote-driven market making (vs order-driven equities), no off-exchange/dark trading (all options print on-exchange), and a multi-leg, nonlinear risk surface.

12.1 Adverse selection is structurally different

In equities, informed flow is directional (someone knows the stock will move). In options there is a large "natural" hedging flow that is not information-driven: corporates and asset managers buying downside puts, overwriters systematically selling calls, vol-control and risk-parity funds rolling exposure. A market maker's adverse-selection problem is therefore about distinguishing toxic informed vol/direction flow from benign hedging flow — and pricing the vega/gamma the trade adds to inventory, not just the directional risk.

12.2 The market-maker's risk decomposition

A delta-hedged option MM is left holding gamma, vega, and theta:

Gamma scalping (the long-gamma P&L identity). Per unit time the hedging P&L is approximately
```
dP&L ≈ ½ · Γ · S² · (σ²_realized − σ²_implied) · dt
```
Long gamma + delta-hedge ⇒ you "buy low, sell high" on each rehedge and profit if the stock realizes more volatility than the implied vol you paid for. Rehedging frequency trades off discretization error vs transaction costs/slippage.
Vega risk. Net exposure to the level of implied vol. MMs manage net vega across strikes and expiries, hedging with other options since the underlying has zero vega.

12.3 The volatility risk premium

Implied vol systematically exceeds subsequent realized vol (the VRP): option sellers earn a premium for bearing volatility/jump risk and providing crash insurance. This is why a delta-hedged short-option book is, on average, profitable — and why it blows up in tail events (short gamma + short vega convexity). MMs position for the VRP but must size for the fat left tail; the premium is compensation, not free money.

12.4 0DTE and dealer gamma feedback

The explosion of 0-days-to-expiry options (SPX 0DTE ≈ 50–60% of SPX option volume by 2024–2025) sharpened the dealer-hedging feedback channel. When market makers are net short gamma, delta-rehedging is destabilizing: they buy as the market rises and sell as it falls, amplifying intraday moves. When they are net long gamma, rehedging runs the other way (sell rallies, buy dips) and damps moves toward mean reversion. Practitioner analytics (SpotGamma, MenthorQ) estimate aggregate Gamma Exposure (GEX) from OPRA open interest, which requires an assumption about which side of each contract dealers hold.
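A GEX estimate of this kind can be sketched as follows. This is a simplified illustration, not any vendor's methodology: it assumes the common naive sign convention that dealers are long calls and short puts (customers overwrite calls and buy puts), a contract multiplier of 100, Black-Scholes gamma with zero rates, and it reports dollar gamma per 1% move in the underlying. The `SeriesOI` record and the function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SeriesOI:
    strike: float
    tau: float          # time to expiry in years
    iv: float           # implied vol used for the gamma
    is_call: bool
    open_interest: int  # contracts


def bs_gamma(S, K, tau, sigma):
    """Black-Scholes gamma with zero rates and dividends."""
    vol_sqrt = sigma * math.sqrt(tau)
    d1 = (math.log(S / K) + 0.5 * sigma * sigma * tau) / vol_sqrt
    pdf = math.exp(-0.5 * d1 * d1) / math.sqrt(2.0 * math.pi)
    return pdf / (S * vol_sqrt)


def gex_per_1pct(S, chain, multiplier=100):
    """Signed dealer dollar gamma per 1% spot move.

    ASSUMPTION: dealers are long every call and short every put."""
    total = 0.0
    for o in chain:
        gamma = bs_gamma(S, o.strike, o.tau, o.iv)
        dollar_gamma = gamma * o.open_interest * multiplier * S * S * 0.01
        total += dollar_gamma if o.is_call else -dollar_gamma
    return total


if __name__ == "__main__":
    chain = [
        SeriesOI(5000.0, 1 / 252, 0.15, True, 20_000),
        SeriesOI(5000.0, 1 / 252, 0.15, False, 35_000),
        SeriesOI(5050.0, 1 / 252, 0.14, True, 15_000),
    ]
    print(gex_per_1pct(5010.0, chain))
```

A positive total corresponds to the stabilizing (dealer long gamma) regime described above, a negative total to the destabilizing one. The sign is only as good as the positioning assumption, which open interest alone cannot confirm.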

12.5 Options-specific Reg NMS / linkage

OPRA is the single SIP consolidating last-sale + NBBO + per-exchange quotes across all ~18 US options exchanges — a serious feed-handler and bandwidth engineering problem.
Trade-through / linkage. The Options Order Protection and Locked/Crossed Markets Plan prohibits trading through a better-priced protected quote on another options exchange, enforced by routing or by Intermarket Sweep Orders (ISOs) that simultaneously clear all better-priced protected quotes.
Structural contrasts: options are quote-driven, fully lit (no ATS/dark prints), have payment-for-order-flow and price-improvement auctions (e.g., complex-order/PRIME/AIM auctions), and a vastly larger symbol space.
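The NBBO and trade-through logic above can be sketched for a single option series. This is a deliberately simplified model: it ignores the Plan's exceptions (non-firm quotes, crossed markets, complex orders and so on), prices are integer cents, and the `Quote` record, the exchange codes and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    exchange: str
    bid: int       # price in cents
    bid_size: int  # contracts
    ask: int       # price in cents
    ask_size: int  # contracts


def nbbo(quotes):
    """Best bid and best offer across all exchanges with displayed size."""
    best_bid = max(q.bid for q in quotes if q.bid_size > 0)
    best_ask = min(q.ask for q in quotes if q.ask_size > 0)
    return best_bid, best_ask


def is_trade_through(side, price, exec_exchange, quotes):
    """True if executing at `price` on `exec_exchange` would ignore a
    better-priced displayed quote on another exchange."""
    away = [q for q in quotes if q.exchange != exec_exchange]
    if side == "buy":
        return any(q.ask_size > 0 and q.ask < price for q in away)
    return any(q.bid_size > 0 and q.bid > price for q in away)


def iso_children(side, price, exec_exchange, quotes):
    """Child orders (exchange, price, size) an ISO would send to clear every
    better-priced away quote before executing at `price`."""
    away = [q for q in quotes if q.exchange != exec_exchange]
    if side == "buy":
        hits = [(q.exchange, q.ask, q.ask_size) for q in away
                if q.ask_size > 0 and q.ask < price]
        return sorted(hits, key=lambda h: h[1])
    hits = [(q.exchange, q.bid, q.bid_size) for q in away
            if q.bid_size > 0 and q.bid > price]
    return sorted(hits, key=lambda h: -h[1])


if __name__ == "__main__":
    book = [
        Quote("X1", 120, 10, 125, 10),
        Quote("X2", 121, 5, 124, 7),
        Quote("X3", 119, 3, 126, 2),
    ]
    print(nbbo(book))
    print(is_trade_through("buy", 125, "X1", book))
    print(iso_children("buy", 125, "X1", book))
```

In a real feed handler the same comparison runs per series across every OPRA line, which is where the ~2 million series and ~18 venues turn a trivial max/min into a memory-layout and bandwidth problem.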

13. Key References

Pricing models

Black, Scholes, "The Pricing of Options and Corporate Liabilities," J. Political Economy, 1973.
Abramowitz & Stegun, Handbook of Mathematical Functions, 1964 (eq. 26.2.17, normal CDF).
Hart, Computer Approximations, 1968; Cody, "Rational Chebyshev approximation for the error function," Math. Comp., 1969.
Jäckel, "Let's Be Rational," Wilmott, 2015; "By Implication," 2006 — fast IV inversion.
Heston, "A Closed-Form Solution for Options with Stochastic Volatility," Rev. Financial Studies, 1993.
Carr, Madan, "Option Valuation Using the Fast Fourier Transform," J. Computational Finance, 1999.
Lewis, "A Simple Option Formula for General Jump-Diffusion and Other Exponential Lévy Processes," 2001.
Fang, Oosterlee, "A Novel Pricing Method... Based on Fourier-Cosine Series Expansions (COS)," SIAM J. Sci. Comput., 2008.
Lord, Fang, Bervoets & Oosterlee, "A fast and accurate FFT-based method for pricing early-exercise options" (CONV), SIAM J. Sci. Comput., 2008.
Cui, del Baño Rollin, Germano, Ortiz-Gracia, "Full and fast calibration of the Heston stochastic volatility model," EJOR, 2017 (arXiv:1511.08718).
Albrecher et al., "The little Heston trap," Wilmott, 2007.
Hagan, Kumar, Lesniewski, Woodward, "Managing Smile Risk," Wilmott, 2002; "Arbitrage-free SABR," Wilmott, 2014.
Dupire, "Pricing with a Smile," Risk, 1994.
Glasserman, Monte Carlo Methods in Financial Engineering, Springer, 2003.

Greeks / AAD

Giles, Glasserman, "Smoking Adjoints: fast Monte Carlo Greeks," Risk, 2006.
Capriotti, "Fast Greeks by Algorithmic Differentiation," J. Computational Finance, 2011.
Capriotti & Giles, "Fast Correlation Greeks by Adjoint Algorithmic Differentiation," arXiv:1004.1855, 2010.
Capriotti et al., "15 years of Adjoint Algorithmic Differentiation (AAD) in finance," Quantitative Finance 24(9), 2024.
Capriotti, Jiang, Macrina, "AAD and least-square Monte Carlo: Fast Bermudan-style options and XVA Greeks," Algorithmic Finance, 2017.

Volatility surface

Gatheral, The Volatility Surface, Wiley, 2006.
Gatheral, Jacquier, "Arbitrage-free SVI volatility surfaces," Quantitative Finance, 2014 (arXiv:1204.0646).
Corbetta et al. / Hendriks, Martini, eSSVI (arXiv:2204.00312, 1804.04924).

Market making

Avellaneda, Stoikov, "High-frequency trading in a limit order book," Quantitative Finance, 2008.
Stoikov, Saglam, "Option market making under inventory risk," Rev. Derivatives Research, 2009.
Databento, "Beyond 40 Gbps: Processing OPRA in real-time," 2024.
Pico, "OPRA 96-line Expansion," 2024; Optiver engineering blog ("FPGA Hardware at Optiver").

GPU / FPGA acceleration

"The Role of FPGAs in Modern Option Pricing Techniques: A Survey," Electronics (MDPI) 13(16):3186, 2024.
"Pricing American options with LSM on GPUs," 6th Workshop on High Performance Computational Finance, 2013.
NVIDIA GPU Gems 2, Ch. 45, "Options Pricing on the GPU," 2005.
AMD/Xilinx, Vitis Quantitative Finance Library.

Risk systems

"Estimating risks of option books using neural-SDE market models," arXiv:2202.07148, 2022.
CME Group, SPAN documentation; OCC, STANS methodology documentation.

Numerical methods / calibration

in 't Hout & Foulon, "ADI finite difference schemes for option pricing in the Heston model with correlation," Int. J. Numer. Anal. Model., 2010 (arXiv:0811.3427).
Giles & Carter, "Convergence analysis of Crank-Nicolson and Rannacher time-marching," J. Computational Finance, 2006.

Interest-rate models and curves

Heath, Jarrow, Morton, "Bond Pricing and the Term Structure of Interest Rates," Econometrica, 1992.
Hull, White, "Pricing Interest-Rate-Derivative Securities," Rev. Financial Studies, 1990; "Numerical Procedures for Implementing Term Structure Models I & II," J. Derivatives, 1994.
Brace, Gatarek, Musiela, "The Market Model of Interest Rate Dynamics," Mathematical Finance, 1997.
Cheyette, "Markov Representation of the Heath-Jarrow-Morton Model," 1992.
Rebonato, McKay, White, The SABR/LIBOR Market Model, Wiley, 2009.
Lyashenko, Mercurio, "Looking Forward to Backward-Looking Rates" (RFR/SOFR forward-market model), 2019.

XVA / exposure

Green, Kenyon, "MVA: Initial Margin Valuation Adjustment by Replication and Regression," arXiv:1405.0508, 2014.
Abbas-Turki, Crépey et al., "XVA Principles, Nested Monte Carlo Strategies, and GPU Optimizations," IJTAF, 2018.
Hoencamp et al., "On Deep Learning for computing the Dynamic Initial Margin and MVA," arXiv:2407.16435, 2024.
Gregory, The xVA Challenge (Wiley); Green, XVA (Wiley); Andersen-Piterbarg, Interest Rate Modeling (3 vols).

Exotics / numerics

Longstaff, Schwartz, "Valuing American Options by Simulation: A Simple Least-Squares Approach," Rev. Financial Studies, 2001.
Barone-Adesi, Whaley, "Efficient Analytic Approximation of American Option Values," J. Finance, 1987.
Cox, Ross, Rubinstein, "Option Pricing: A Simplified Approach," J. Financial Economics, 1979.
Kemna, Vorst, "A Pricing Method for Options Based on Average Asset Values," J. Banking & Finance, 1990.
Carr, Madan, "Towards a Theory of Volatility Trading," 1998; Demeterfi, Derman, Kamal, Zou, "More Than You Ever Wanted to Know About Volatility Swaps," Goldman Sachs, 1999.

Machine learning

Buehler, Gonon, Teichmann, Wood, "Deep Hedging," Quantitative Finance, 2019.
Huge, Savine, "Differential Machine Learning," arXiv:2005.02347, 2020.
Horvath, Muguruza, Tomas, "Deep Learning Volatility," Quantitative Finance, 2021.
Raissi, Perdikaris, Karniadakis, "Physics-Informed Neural Networks," J. Computational Physics, 2019.
Imaki et al., "No-Transaction Band Network," arXiv:2103.01775, 2021.

Microstructure

Dim, Eraker, Vilkov, "0DTEs: Trading, Gamma Risk and Volatility Propagation," SSRN, 2024.
Barbon, Buraschi, "Gamma Fragility" (dealer gamma feedback).
Bakshi, Kapadia, "Delta-Hedged Gains and the Negative Market Volatility Risk Premium," Rev. Financial Studies, 2003.

Libraries / systems

QuantLib (C++/Python); ORE / Open Source Risk Engine — exposure & XVA; pfhedge — deep hedging; CoDiPack / Adept / dco/c++ — AAD. Commercial: Murex MX.3, Numerix, FIS Adaptiv, Quantifi, S&P Global xVA, Bloomberg MARS.
QuantLib architecture guide — risk-quant-haun.github.io/quantlib/architecture; Ballabio, Implementing QuantLib.

See also: hardware/low_latency_trading.md — OPRA feed handling, Reg NMS/SOR, FPGA pricing, tick-to-trade latency; the execution layer beneath this pricing/risk layer.