Glossary

Acronyms and technical terms used across research docs.

Term	Definition	Covered In
BTB	Branch Target Buffer — hardware cache mapping instruction PCs to predicted branch targets; hierarchical (L0 µBTB / L1 / L2)	Superscalar OoO CPU
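ACS	Access Control Services — PCIe capability that controls (redirects upstream or blocks) peer-to-peer transactions between devices so the IOMMU can isolate them; without it, devices are forced into a shared IOMMU group	VFIO Internals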
AES-NI	Advanced Encryption Standard New Instructions — x86 hardware-accelerated AES encryption	ISA Critical Instructions
AMQP	Advanced Message Queuing Protocol — wire-level messaging protocol implemented by RabbitMQ	RabbitMQ Internals
AQE	Adaptive Query Execution — Spark 3.0+ runtime re-optimization based on actual data statistics	Database Systems
ARIES	Algorithm for Recovery and Isolation Exploiting Semantics — foundational WAL recovery protocol using LSN-based redo/undo	WAL & Torn Pages
ART	Adaptive Radix Tree — cache-friendly trie with variable node sizes (Node4/16/48/256), developed at TUM	Database Systems
AVX-512	Advanced Vector Extensions 512-bit — x86 SIMD instruction set operating on 512-bit registers (16 single-precision floats or 8 doubles per instruction)	ISA Critical Instructions
BAR	Base Address Register — PCI configuration space register defining device memory-mapped I/O regions	VFIO Internals
BFT	Byzantine Fault Tolerance — ability to reach consensus despite arbitrary (malicious) node failures	Distributed Consensus
BMI	Bit Manipulation Instructions — x86 extensions for bit-level operations (BMI1: ANDN/BLSR/TZCNT; BMI2: PEXT/PDEP/BZHI)	ISA Critical Instructions
BNLJ	Block Nested Loop Join — join variant reading outer relation in memory-sized blocks to reduce I/O	Join Algorithms
BRIN	Block Range Index — PostgreSQL lightweight index storing min/max per physical block range	Database Systems
BTI	Branch Target Identification — ARM control-flow integrity mechanism marking valid indirect branch targets	ISA Critical Instructions
Bw-Tree	Lock-free B+ tree using delta records and CAS operations, developed for Hekaton/Azure Cosmos DB	Data Structures
CAP	Consistency, Availability, Partition tolerance — Brewer/Gilbert-Lynch theorem on distributed system trade-offs	Distributed Consensus
CAS	Compare-And-Swap — atomic instruction for lock-free programming; LOCK CMPXCHG on x86, CAS (ARMv8.1 LSE) or an LDXR/STXR loop on ARM	ISA Critical Instructions
CBO	Cost-Based Optimization — query optimization using table/column statistics for plan selection	Database Systems
CMS	Count-Min Sketch (Cormode & Muthukrishnan 2005) — d×w counter matrix for streaming frequency estimation with O(ε⁻¹ log δ⁻¹) space	Database Statistics
CDC	Change Data Capture — technique for streaming database changes as events, often via Debezium or logical replication	WAL Incremental Conversion
CDNA	Compute DNA — AMD's compute-focused GPU microarchitecture (MI-series accelerators)	GPU/TPU Accelerator Design
CET	Control-flow Enforcement Technology — Intel's shadow stack + ENDBR indirect branch tracking for CFI	ISA Critical Instructions
CLOG	Commit Log — PostgreSQL structure tracking transaction commit/abort status for MVCC visibility	Arrow PostgreSQL Integration
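Coordinated omission	Benchmarking error (Tene) where a closed-loop load generator waits out a stall instead of sending the requests it had scheduled, hiding the worst latencies; corrected by measuring latency from each request's intended (scheduled) send time rather than its actual send time	Low-Latency Trading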
COW	Copy-On-Write — technique where data is shared until modified; used in WiredTiger B-trees, Neon branching, btrfs	MongoDB/WiredTiger Internals
C-state	CPU idle power state (C0 active … C6 deep sleep); deep states have µs wakeup latency, a top HFT jitter source; capped via idle=poll / processor.max_cstate=1 / PM-QoS	Low-Latency Trading
CoWoS	Chip-on-Wafer-on-Substrate — TSMC advanced packaging for multi-die integration (used in H100/A100)	GPU/TPU Accelerator Design
CPL	Consistency Point LSN — Aurora log record marking the end of a mini-transaction (MTR); the Volume Durable LSN (VDL) is the highest CPL at or below the Volume Complete LSN (VCL)	Disaggregated Storage
CQE	Completion Queue Entry — io_uring kernel-to-user result structure for completed I/O operations	io_uring Internals
CRDT	Conflict-free Replicated Data Type — data structure achieving eventual consistency without coordination	Distributed Consensus
Calcite	Apache Calcite — embeddable Java SQL parser + validator + relational-algebra optimizer (no storage/execution engine); powers Flink, Hive, Druid, Kylin, Beam, Phoenix; Begoli SIGMOD 2018	Calcite Internals
Cascades	Top-down memoizing query-optimization framework (Graefe 1995) using rule-driven exploration of a memo of equivalence groups with branch-and-bound; basis of Calcite's VolcanoPlanner and CockroachDB's optimizer	Calcite Internals, CockroachDB Optimizer Rules
Convention	Calcite trait identifying the execution engine / calling convention of a RelNode (NONE = logical, ENUMERABLE = built-in linq4j, BINDABLE = interpreted, JDBC/Druid/etc.); inputs must match or be bridged by a converter	Calcite Internals
RelNode	Calcite relational expression (Project/Filter/Join/Aggregate/Scan…) carrying a RelTraitSet; logical forms in rel.logical, physical forms per convention; memoized by digest in the VolcanoPlanner	Calcite Internals
RexNode	Calcite row/scalar expression (RexInputRef `$n` by ordinal, RexLiteral, RexCall, RexOver window, RexSubQuery, RexCorrelVariable); built by RexBuilder, simplified by RexSimplify	Calcite Internals
RelSubset	A RelSet (equivalence class of semantically identical RelNodes) restricted to one RelTraitSet; caches the cheapest member (`best`/`bestCost`); the unit of Calcite's Volcano memo and cost propagation	Calcite Internals
RexUnknownAs	Calcite three-valued-logic mode (FALSE/TRUE/UNKNOWN) telling RexSimplify how NULL may be folded in a given syntactic position (e.g. UNKNOWN→FALSE under WHERE)	Calcite Internals
CXL	Compute Express Link — cache-coherent interconnect for CPU-to-device memory sharing over PCIe physical layer	Disaggregated Storage
DBSP	Database Stream Processor — formal mathematical framework for incremental computation over Z-sets (Feldera)	Database Systems
DDMTD	Digital Dual-Mixer Time Difference — sub-picosecond phase-measurement technique used by White Rabbit for <1 ns clock sync	Low-Latency Trading
Disruptor	LMAX's lock-free, allocation-free ring-buffer inter-thread pipeline (sequence-claimed entries, padded cursors, busy-spin); the canonical single-writer pipeline pattern in trading systems	Low-Latency Trading, Data Structures
DMA	Direct Memory Access — hardware capability for devices to read/write main memory without CPU involvement	VFIO Internals
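DMB	Data Memory Barrier — ARM instruction ensuring memory accesses before the barrier are observed before those after it; unlike DSB, it does not wait for their completion or block non-memory instructions	ISA Critical Instructions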
DecodedVector	Velox helper that normalizes any vector encoding (DICT/CONST/BIAS/SEQ) to flat base + indices for safe element access	Velox Internals
DPccp	Dynamic Programming connected complement pairs — join enumeration algorithm for bushy plans	Join Algorithms
DPhyp	Dynamic Programming on hypergraphs — join enumeration supporting multi-relation predicates	Join Algorithms, DuckDB Internals
DPDK	Data Plane Development Kit — userspace framework for kernel-bypass packet processing (~30-40 Mpps), typically binding the NIC through VFIO (vfio-pci) or UIO	VFIO Internals, Low-Latency Trading
DSB	Data Synchronization Barrier — ARM instruction that stalls execution until all prior memory accesses complete	ISA Critical Instructions
DST	Deterministic Simulation Testing — technique running distributed systems in a single-threaded deterministic simulator	Deterministic Simulation Testing
EBR	Epoch-Based Reclamation — memory reclamation scheme for lock-free data structures using global epoch tracking	Data Structures
ef_vi	Solarflare/AMD low-level layer-2 NIC API giving user space direct access to RX/TX descriptor rings + DMA buffers (poll via ef_eventq_poll); ~250 ns NIC-to-user; the HFT kernel-bypass data path	Low-Latency Trading
EPT	Extended Page Tables — Intel VT-x hardware-assisted two-level address translation for VM memory	ISA Critical Instructions
EVEX	Extended VEX — x86 instruction prefix encoding for AVX-512/APX supporting 32 vector registers and masking	ISA Critical Instructions
FDW	Foreign Data Wrapper — PostgreSQL mechanism for querying external data sources as local tables	Database Systems
FLP	Fischer-Lynch-Paterson impossibility — proof that deterministic asynchronous consensus is impossible with even one crash	Distributed Consensus
FMA	Fused Multiply-Add — single instruction computing a*b+c with one rounding; used in Tensor Cores and CPU SIMD	GPU/TPU Accelerator Design
FOR	Frame-of-Reference — lightweight compression storing a per-segment base (min) and bitpacked offsets; common for dates/timestamps	DuckDB Internals
FSST	Fast Static Symbol Table — string compression assigning 1-byte codes to frequent substrings, allowing random access without full decompression	DuckDB Internals
FP8	8-bit floating point — low-precision format (E4M3/E5M2) for LLM training/inference on Hopper+ GPUs	GPU/TPU Accelerator Design
FPGA NIC	Network card with on-board FPGA fabric (Alveo UL3524/X3/ExaNIC) that parses market data and fires orders in a fixed-latency dataflow pipeline; HFT tick-to-trade 30–100 ns (STAC-T0 record 13.9 ns, Exegy/AMD 2024) with near-zero jitter (~200 ps)	Low-Latency Trading
FPW	Full Page Writes — PostgreSQL technique writing a complete page image to WAL on the first modification of each page after a checkpoint, so redo can repair torn pages	WAL & Torn Pages
FTL	Flash Translation Layer — SSD firmware mapping logical block addresses to physical NAND pages	WAL & Torn Pages
GAA	Gate-All-Around — transistor architecture replacing FinFET at the 3nm/2nm nodes (Samsung 3nm, TSMC N2) in which the gate wraps the channel on all sides	GPU/TPU Accelerator Design
GDSII	Graphic Data System II — standard file format for IC layout data sent to semiconductor foundries for fabrication	GPU/TPU Accelerator Design
GEQO	Genetic Query Optimizer — PostgreSQL's genetic-algorithm join-ordering strategy, used when a query has at least geqo_threshold (default 12) FROM items	Join Algorithms
GIN	Generalized Inverted Index — PostgreSQL index for composite values (arrays, JSONB, full-text tsvector)	Database Systems
GiST	Generalized Search Tree — PostgreSQL extensible indexing framework for complex data types (geometric, range)	Database Systems
GOO	Greedy Operator Ordering — O(n^3) heuristic join ordering algorithm used as fallback for large queries	Join Algorithms
GST	Global Stabilization Time — the (unknown) point after which network timing bounds hold in partial synchrony models	Distributed Consensus
GTID	Global Transaction Identifier — MySQL identifier simplifying replication topology management and failover	Database Systems
HAMT	Hash Array Mapped Trie — persistent data structure with near-O(1) operations via structural sharing (Clojure, Scala)	Data Structures
HashStringAllocator	Velox arena allocator used inside hash tables and aggregation: 4-byte Header per block (kFree/kContinued/kPreviousFree flags) + CompactDoubleList free list; streaming write via newWrite/finishWrite	Velox Internals
HBM	High Bandwidth Memory — stacked DRAM (HBM2/HBM3) giving GPUs/accelerators >1 TB/s aggregate bandwidth across multiple stacks	GPU/TPU Accelerator Design
HLC	Hybrid Logical Clock — clock combining wall-clock time with a logical counter for causal ordering (CockroachDB)	Database Systems
HLL	HyperLogLog (Flajolet et al. 2007) — probabilistic NDV sketch: 2^b registers, harmonic mean estimate, 1.04/√(2^b) relative error; merges via elementwise max	Database Statistics, Data Structures
HOT	Heap-Only Tuple — PostgreSQL optimization where, if no indexed column changes and the new version fits on the same heap page, the update adds no new index entries	Arrow PostgreSQL Integration
HTAP	Hybrid Transactional/Analytical Processing — system handling both OLTP and OLAP workloads (HyPer, TiDB)	HyPer/Umbra/CedarDB
IOMMU	I/O Memory Management Unit — hardware translating device DMA addresses (IOVA) to physical addresses	VFIO Internals
IOMMUFD	IOMMU File Descriptor — newer Linux interface replacing VFIO container/group model with fd-centric API	VFIO Internals
IOTLB	I/O Translation Lookaside Buffer — IOMMU's cache for IOVA-to-PA translations; hugepages reduce miss rate dramatically	VFIO Internals
IOVA	I/O Virtual Address — the address space a device sees through the IOMMU, analogous to virtual addresses for CPUs	VFIO Internals
ILP	Instruction-Level Parallelism — overlap of independent instructions from a single thread exploited by OoO execution	Superscalar OoO CPU
IPC	Instructions Per Cycle — microarchitectural efficiency metric; PrediCache achieves 0.55 IPC vs 0.31 for a traditional buffer manager	Buffer Management
IQ	Issue Queue / Reservation Stations — buffer holding dispatched µops waiting for operands before execution; unified or distributed	Superscalar OoO CPU
isolcpus	Linux boot param removing CPUs from the scheduler's load balancing so only explicitly pinned threads run there; HFT core-shielding primitive (paired with nohz_full/rcu_nocbs)	Low-Latency Trading
ITCH	Nasdaq's binary market-data protocol (fixed-length, big-endian, sequenced over MoldUDP64); add/execute/cancel/delete order messages, designed for branch-light parsing	Low-Latency Trading
ISR	In-Sync Replicas — Kafka replicas caught up with the partition leader, eligible for leader election	Kafka Internals
Janino	Lightweight in-process Java source-to-bytecode compiler; Calcite uses it to compile generated linq4j Enumerable code, RexExecutor constant folding, and the metadata-handler dispatcher	Calcite Internals
JoinBridge	Velox synchronization primitive between HashBuild and HashProbe pipelines: build sets a folly::Promise with the completed HashTable; probe returns a folly::SemiFuture from isBlocked() until build completes	Velox Internals
JIT	Just-In-Time compilation — compiling code at runtime; used by HyPer (LLVM), Umbra (asmJIT), PostgreSQL 11+	HyPer/Umbra/CedarDB
JOB	Join Order Benchmark (Leis et al. VLDB 2015) — 113 IMDB queries with 3–16 joins each; standard benchmark for cardinality estimation accuracy (Q-error)	Database Statistics
KPTI	Kernel Page Table Isolation — Meltdown mitigation separating user/kernel page tables; adds ~5-30% overhead on syscall-heavy workloads	Superscalar OoO CPU
KLL	KLL Sketch (Karnin-Lang-Liberty 2016) — near-optimal mergeable quantile sketch; O(ε⁻¹ log log 1/δ) space; better than GK sketch for distributed merge	Database Statistics, Data Structures
KMV	K-Minimum Values sketch — keeps the k smallest distinct hash values (hashes normalized to (0,1]); NDV ≈ (k-1)/h_k, where h_k is the k-th smallest kept hash; merges by union + take k smallest	Database Statistics
KRaft	Kafka Raft — Kafka's built-in Raft-based consensus replacing ZooKeeper for metadata management	Kafka Internals
kTLS	Kernel TLS — Linux kernel offload of TLS record encryption/decryption for sockets, enabling sendfile()/zero-copy with TLS and optional NIC crypto offload	Linux Expert Syscalls
LDAR	Load-Acquire Register — ARM instruction providing acquire semantics (no subsequent access reordered before it)	ISA Critical Instructions
LDAPR	Load-Acquire RCpc Register — ARM weaker acquire load (ARMv8.3-RCPC); sufficient for C++ memory_order_acquire, but unlike LDAR it may be reordered before an earlier STLR to a different address	ISA Critical Instructions
LIPAH	Logical-ID Pointer Augmented Hinting — buffer manager using fat pointers (PID + hint address); limited to 32-bit PIDs	Buffer Management
LL/SC	Load-Linked/Store-Conditional — ARM/RISC-V atomic primitive pair (LDXR/STXR, LR/SC) for lock-free operations	ISA Critical Instructions
LMUL	Length Multiplier — RISC-V Vector extension register grouping factor controlling effective vector length	ISA Critical Instructions
LQ	Load Queue — per-core buffer tracking in-flight loads so stores and coherence snoops can detect memory-ordering violations (STLF itself searches the store queue)	Superscalar OoO CPU
LPS	Log Processing Service — AlloyDB component that receives WAL and materializes data blocks asynchronously	Disaggregated Storage
LSE	Large System Extensions — ARMv8.1 atomic instructions (CAS, LDADD, SWP) replacing LL/SC for better scalability	ISA Critical Instructions
Lattice	Calcite OLAP construct modeling a star/snowflake schema as a virtual fact-table join; dimensions and measures define candidate aggregate "tiles" auto-selected via the HRU (Harinarayan SIGMOD 1996) cube-lattice greedy algorithm for materialized-view acceleration	Calcite Internals
linq4j	Calcite's Java port of .NET LINQ — Enumerable/Enumerator data model plus an expression-tree AST (org.apache.calcite.linq4j.tree) that the Enumerable convention emits and Janino compiles	Calcite Internals
LSM	Log-Structured Merge tree — write-optimized structure that buffers writes in memory and flushes them as sorted runs, turning random writes into sequential I/O; runs are merged by leveled or tiered compaction	LSM Trees
LSN	Log Sequence Number — monotonically increasing identifier for WAL records, used for recovery and page versioning	WAL & Torn Pages
MergeTree	ClickHouse's core storage engine family: each INSERT writes an immutable PK-sorted part; background merges fold parts together (LSM-like)	ClickHouse Internals
Granule	ClickHouse unit of index addressing — default 8192 rows (capped by index_granularity_bytes); the smallest data block the sparse index can select	ClickHouse Internals
Mark	ClickHouse mark (.mrk3/.cmrk3) — 24-byte record mapping a granule to (offset_in_compressed_file, offset_in_decompressed_block, rows_in_granule)	ClickHouse Internals
Sparse primary index	Index storing the PK tuple only at each granule boundary (primary.idx) — lossy zone-map-style pruning, not per-row	ClickHouse Internals
DoubleDelta	Codec storing second-order differences (delta-of-deltas) with Gorilla varint framing; near-free for fixed-stride sequences like timestamps	ClickHouse Internals
T64	ClickHouse codec transposing 64 integers into bit-planes after range subtraction, storing only the needed planes; for low-range/low-cardinality ints	ClickHouse Internals
Gorilla	XOR-based float compression encoding leading/trailing zero runs of consecutive-value XORs (Pelkonen VLDB 2015); a ClickHouse codec	ClickHouse Internals
Volnitsky	Bigram-hash substring search algorithm (Boyer-Moore-Horspool variant) used in ClickHouse string/LIKE matching	ClickHouse Internals
NuRaft	C++ Raft consensus library underpinning ClickHouse Keeper, the ZooKeeper-compatible coordination service	ClickHouse Internals
Projection	ClickHouse alternate physical layout stored inside each part (different sort order and/or pre-aggregation), auto-maintained through merges	ClickHouse Internals
MESIF	Modified/Exclusive/Shared/Invalid/Forward — Intel's extension of MESI with a Forward state for peer-to-peer cache supply	Superscalar OoO CPU
MESI	Modified/Exclusive/Shared/Invalid — CPU cache coherence protocol tracking cache line states across cores	ISA Critical Instructions
MCV	Most Common Values — per-column list of (value, frequency) pairs stored in pg_statistic with stakind=1; gives direct (sample-based) selectivity for high-frequency values instead of a uniformity assumption	Database Statistics
NDV	Number of Distinct Values — column statistic driving join selectivity (1/max(NDV_R, NDV_S)); estimated via HLL or the Haas-Stokes estimator over a row sample	Database Statistics
MLP	Memory-Level Parallelism — number of simultaneous outstanding cache misses a core can sustain; bounded by ROB size and MSHR count	Superscalar OoO CPU
MOESI	Modified/Owned/Exclusive/Shared/Invalid — AMD's extension of MESI with Owned state for dirty-line sharing without writeback	Superscalar OoO CPU
MPKI	Misses Per Kilo-Instructions — branch or cache misses per 1000 retired instructions; TAGE achieves under 3 branch MPKI on SPEC CPU 2006	Superscalar OoO CPU
MSHR	Miss Status Holding Register — tracks one outstanding cache miss and coalesces later accesses to the same line; the MSHR count caps achievable MLP	Superscalar OoO CPU
MMA	Matrix Multiply-Accumulate — Tensor Core operation computing D = A * B + C on small matrix tiles	GPU/TPU Accelerator Design
MMIO	Memory-Mapped I/O — mapping device registers into CPU address space for direct read/write access	VFIO Internals
Morsel	A small chunk of a source operator's input handed to a worker thread; unit of morsel-driven parallelism and work stealing	DuckDB Internals
MPSM	Massively Parallel Sort-Merge — NUMA-aware join algorithm in which each worker sorts its local runs, then merge-joins them against the other workers' sorted runs without a global merge	Join Algorithms
MSI-X	Message Signaled Interrupts Extended — PCIe interrupt delivery via memory writes, supporting per-queue interrupt vectors	VFIO Internals
MTE	Memory Tagging Extension — ARM hardware feature for detecting memory safety bugs (use-after-free, buffer overflow)	Linux Expert Syscalls
MoldUDP64	Nasdaq's lightweight UDP-multicast transport carrying ITCH messages with a sequence number + message-count header for gap detection and A/B line arbitration	Low-Latency Trading
MVCC	Multi-Version Concurrency Control — concurrency scheme where readers see snapshots and writers create new versions	Database Systems, DuckDB Internals
NoC	Network-on-Chip — on-die interconnect (ring/mesh/torus) routing traffic between cores, caches, and memory controllers	Superscalar OoO CPU, GPU/TPU Accelerator Design
nohz_full	Linux full-dynticks boot param stopping the periodic scheduler tick on a CPU running exactly one task; removes the biggest recurring jitter source on an isolated HFT core (needs rcu_nocbs)	Low-Latency Trading
NLJ	Nested Loop Join — simplest join algorithm, scanning the inner relation once per outer tuple; costs B(R) + |R| * B(S) page I/Os, i.e. O(|R| * B(S))	Join Algorithms
NUMA	Non-Uniform Memory Access — multi-socket architecture where memory access latency depends on which socket owns the memory	HyPer/Umbra/CedarDB
NVIC	Nested Vectored Interrupt Controller — ARM Cortex-M interrupt controller with priority-based preemption	Timer Interrupts STM32
NVLink	NVIDIA proprietary high-bandwidth GPU-to-GPU interconnect (NVLink5: 1.8 TB/s bidirectional per GPU)	GPU/TPU Accelerator Design
OCC	Optimistic Concurrency Control — transaction scheme allowing concurrent execution, validating at commit time	Disaggregated Storage
OID	Object Identifier — PostgreSQL's internal numeric identifier for database objects (types, relations, functions)	Arrow PostgreSQL Integration
OpenOnload	Solarflare/AMD LD_PRELOAD user-space TCP/IP stack intercepting BSD socket calls for kernel bypass with zero app change; spinning mode busy-polls the NIC (no IRQ/syscall), ~1–3 µs RX	Low-Latency Trading
OLAP	Online Analytical Processing — workload pattern of complex read-heavy aggregation queries (DuckDB, ClickHouse)	Database Systems
OLTP	Online Transaction Processing — workload pattern of high-throughput short read-write transactions (PostgreSQL, MySQL)	Database Systems
OoO	Out-of-Order execution — CPU technique executing instructions as soon as their operands are ready rather than in program order, hiding latency while still retiring in order	Superscalar OoO CPU
PAC	Pointer Authentication Code — ARM cryptographic signature embedded in pointer unused bits for control-flow integrity	ISA Critical Instructions
PACELC	Partition-Availability-Consistency / Else Latency-Consistency — extension of CAP capturing normal-operation trade-offs	Distributed Consensus
PASID	Process Address Space ID — IOMMU feature enabling per-process DMA address translation for shared virtual addressing	VFIO Internals
PAX	Partition Attributes Across — hybrid row/column page layout storing columns within each page (Umbra)	Database Systems
PBFT	Practical Byzantine Fault Tolerance — first practical BFT protocol tolerating f Byzantine faults with 3f+1 replicas	Distributed Consensus
PEBS	Precise Event-Based Sampling — Intel hardware profiling capturing exact instruction pointer on performance counter overflow	ISA Critical Instructions
PHC	PTP Hardware Clock — the NIC's on-board clock that timestamps packets at the MAC (ns resolution); basis for hardware timestamping and PTP sync (disciplined by ptp4l/phc2sys)	Low-Latency Trading
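PRF	Physical Register File — storage for all speculative and architectural register values; separate INT and FP files, each needing at most (ROB entries + architectural registers) physical registers so rename never stalls on an empty free list	Superscalar OoO CPU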
PG	Protection Group — Aurora's 10 GB storage segment replicated 6 ways across 3 AZs	Disaggregated Storage
PID	Page ID — logical identifier for a database page, translated to a buffer frame address by the buffer manager	Buffer Management
Pipeline Breaker	Operator that must fully consume its input before producing output (hash-join build, aggregate, sort); materializes into pipeline-local state and acts as the source of a downstream pipeline	DuckDB Internals
PITR	Point-In-Time Recovery — restoring a database to any past moment by replaying WAL to a target LSN/timestamp	Database Systems
PLL	Phase-Locked Loop — clock generation circuit multiplying a reference crystal frequency for the system clock	Timer Interrupts STM32
PMU	Performance Monitoring Unit — hardware counters (cycles, cache misses, branch mispredictions) for CPU profiling	Cycle Counters & Energy
PREEMPT_RT	Linux real-time patchset (mainlined 6.12, 2024) making nearly all kernel code preemptible (sleeping spinlocks, threaded IRQs, priority inheritance); lowers worst-case scheduling latency measured by cyclictest	Low-Latency Trading
PTP	Precision Time Protocol (IEEE 1588) — Ethernet clock sync to sub-µs (tens of ns with hardware timestamping + transparent/boundary clocks); Sync/Follow_Up/Delay_Req/Delay_Resp exchange	Low-Latency Trading
RBPEX	Resilient Buffer Pool Extension — local SSD cache in Azure SQL Hyperscale surviving process restarts	Disaggregated Storage
RowContainer	Velox row-major slab storing group keys and accumulators: fields ordered as (normalized key, null bits, fixed 8-byte slots, variable-width section, accumulators, probed flag)	Velox Internals
Q-error	Cardinality estimation accuracy metric: max(est/actual, actual/est) ≥ 1; Q-error=1 is perfect; JOB benchmark shows PostgreSQL p95 ≈ 12×	Database Statistics
RAS	Return Address Stack — hardware stack that pushes the return address on each call and pops it to predict the target of the matching return	Superscalar OoO CPU
RAT	Register Alias Table — maps architectural register names to physical register IDs during OoO rename stage	Superscalar OoO CPU
RCU	Read-Copy-Update — Linux kernel synchronization allowing lock-free reads with deferred reclamation of old data	Data Structures
RDMA	Remote Direct Memory Access — network hardware reading/writing remote memory without involving the remote CPU (a few µs latency)	Disaggregated Storage
RDTSC	Read Time-Stamp Counter — x86 instruction reading the 64-bit cycle counter; the RDTSCP variant waits until all prior instructions have executed (later ones may still start early); LFENCE;RDTSC keeps the read from running ahead of earlier instructions when timing a tight segment	Cycle Counters & Energy, Low-Latency Trading
ROB	Reorder Buffer — circular buffer holding all in-flight µops; enables in-order retirement and precise exception handling	Superscalar OoO CPU
RLE	Run-Length Encoding — compression encoding consecutive identical values as (value, count) pairs	Database Systems
RMI	Recursive Model Index — learned index structure using a hierarchy of ML models to predict key positions	LSM Trees
RTL	Register Transfer Level — hardware description abstraction (Verilog/VHDL) defining logic in terms of registers and operations	GPU/TPU Accelerator Design
RUM	Read, Update, Memory conjecture — an access method can optimize at most two of read, update, and memory (space) overhead, at the expense of the third	LSM Trees
RVWMO	RISC-V Weak Memory Ordering — RISC-V's relaxed memory model: absent FENCEs or acquire/release annotations, only same-address ordering and certain syntactic dependencies are preserved	ISA Critical Instructions
RVV	RISC-V Vector extension — scalable vector ISA with LMUL register grouping and vector-length agnostic (VLA) programming	ISA Critical Instructions
SQ	Store Queue — buffer holding in-flight stores (speculative, then retired) until they drain to the L1D cache; searched by younger loads for STLF and used for memory ordering	Superscalar OoO CPU
SBE	Simple Binary Encoding — FIX's fixed-offset binary message encoding used by CME MDP 3.0 market data; no field-parsing branches, and decode is a fixed-offset load (MDP 3.0 is little-endian, so no byte swap on x86)	Low-Latency Trading
SCL	Segment Complete LSN — per-Protection-Group completeness tracker in Aurora's storage layer	Disaggregated Storage
seccomp	Secure Computing Mode — Linux syscall filtering mechanism using BPF programs for sandboxing (used in Neon WAL redo)	Linux Expert Syscalls
Selection Vector	Array of indices into a vector selecting surviving rows after a filter; threaded downstream so filtered data is not compacted until materialization (DuckDB vectorized engine)	DuckDB Internals
SelectivityVector	Velox bitmask (uint64_t words, 64 rows/word) of active rows passed between operators and into expression eval; applyToSelected() iterates via __builtin_ctzll	Velox Internals
SharedArbitrator	Velox MemoryArbitrator implementation for global fair memory sharing across queries; 3-pass reclaim: free capacity → spill largest → abort victim	Velox Internals
StringView	Velox 16-byte string representation: [size:4][inline:12] for ≤12 chars, or [size:4][prefix:4][ptr:8] for longer; enables fail-fast comparison and zero-copy substr	Velox Internals
SFU	Special Function Unit — GPU hardware computing transcendentals (sin, cos, rsqrt, log) at reduced throughput	GPU/TPU Accelerator Design
SIMT	Single Instruction, Multiple Thread — GPU execution model where warps of 32 threads execute in lockstep	GPU/TPU Accelerator Design
SM	Streaming Multiprocessor — fundamental GPU compute unit containing CUDA cores, Tensor Cores, register file, and shared memory	GPU/TPU Accelerator Design
SME	Scalable Matrix Extension — ARM extension for matrix operations using a 2D tile register (ZA) for GEMM acceleration	ISA Critical Instructions
SMJ	Sort-Merge Join — join algorithm sorting both relations then merging; optimal when inputs are pre-sorted	Join Algorithms
SMI	System Management Interrupt — firmware/BIOS interrupt invisible to the OS (thermal, USB legacy, ECC scrub); the worst HFT tail-latency spike (tens of µs); mitigated in BIOS, detected via turbostat/PMU	Low-Latency Trading
SMMU	System Memory Management Unit — ARM's IOMMU implementation (SMMUv3) for DMA address translation and device isolation	VFIO Internals
STLF	Store-to-Load Forwarding — hardware mechanism supplying load data directly from the store queue, bypassing cache (~4-5 cycles)	Superscalar OoO CPU
SPDK	Storage Performance Development Kit — userspace NVMe driver framework using VFIO for millions of IOPS per core	VFIO Internals
SPSC	Single Producer Single Consumer — lock-free queue variant with one writer and one reader thread; wait-free with no CAS (acquire/release on head/tail, each on its own cache line); the HFT workhorse queue	Data Structures, Low-Latency Trading
SQE	Submission Queue Entry — io_uring user-to-kernel I/O request structure (opcode, fd, buffer, offset)	io_uring Internals
SQPOLL	Submission Queue Polling — io_uring mode where a kernel thread polls the SQ, eliminating submission syscalls while the poller is awake; combined with fixed buffers and zero-copy RX (IORING_OP_RECV_ZC), it makes io_uring a kernel-bypass-adjacent networking datapath (~4.2 µs p50, between sockets and AF_XDP)	io_uring Internals, Low-Latency Trading
STAC-T0	Securities Technology Analysis Center benchmark for tick-to-trade network-I/O latency (receive trigger → emit order); the public ULL reference. Record 13.9 ns (Exegy/AMD Alveo UL3524, Jun 2024), down from 24.2 ns, jitter ~200 ps	Low-Latency Trading
SR-IOV	Single Root I/O Virtualization — PCIe spec creating lightweight virtual functions from one physical device	VFIO Internals
SSI	Serializable Snapshot Isolation — PostgreSQL's true serializable isolation via predicate locking and conflict detection	Database Systems
SSTable	Sorted String Table — immutable, sorted on-disk file in LSM trees containing key-value pairs with index/bloom filter	LSM Trees
STLR	Store-Release Register — ARM instruction providing release semantics (no preceding access reordered after it)	ISA Critical Instructions
SVA	Shared Virtual Addressing — IOMMU feature letting devices use the same virtual addresses as the CPU process	VFIO Internals
SVE	Scalable Vector Extension — ARM vector ISA with hardware-defined vector length (128-2048 bits) for portable SIMD	ISA Critical Instructions
TAGE	Tagged Geometric History Length Branch Predictor — state-of-the-art predictor using multiple tagged components indexed by geometric history lengths (Seznec 2006)	Superscalar OoO CPU
THP	Transparent Huge Pages — Linux kernel feature automatically promoting 4KB page allocations to 2MB pages to reduce TLB pressure	Superscalar OoO CPU
TF32	TensorFloat-32 — NVIDIA 19-bit format (8-bit exponent, 10-bit mantissa) for Tensor Core GEMM on Ampere+	GPU/TPU Accelerator Design
Tick-to-trade	HFT latency from an inbound market-data update ("tick") to the outbound order it triggers; properly measured wire-to-wire via an external optical tap. Typical figures: ~0.8–5 µs in software (best cases <2 µs with kernel bypass), 30–100 ns in pure FPGA, and a STAC-T0 record of 13.9 ns (2024)	Low-Latency Trading
TLB	Translation Lookaside Buffer — CPU/IOMMU cache for virtual-to-physical address translations	Buffer Management
TOAST	The Oversized-Attribute Storage Technique — PostgreSQL mechanism compressing/storing large field values out-of-line	Database Systems
TrueTime	Google's globally synchronized clock API returning bounded time intervals using GPS + atomic clocks (Spanner)	Disaggregated Storage
TSC	Time Stamp Counter — x86 hardware counter read via RDTSC/RDTSCP; on CPUs with invariant TSC it increments at a fixed reference frequency regardless of P-state or C-state changes	Cycle Counters & Energy
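TSO	Total Store Order — x86 memory model where the only permitted reordering is a later load passing an earlier store to a different address (Store-Load); acquire/release code needs no fences, and only sequentially consistent stores need MFENCE or a LOCK-prefixed instruction	ISA Critical Instructions, Low-Latency Trading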
TSX	Transactional Synchronization Extensions — Intel hardware transactional memory (XBEGIN/XEND), deprecated due to security issues	ISA Critical Instructions
UCIe	Universal Chiplet Interconnect Express — open standard for die-to-die communication in chiplet-based designs	GPU/TPU Accelerator Design
UIO	Userspace I/O — early Linux framework for userspace device drivers; no DMA isolation (predecessor to VFIO)	VFIO Internals
userfaultfd	User Fault File Descriptor — Linux syscall letting userspace handle page faults (used for live migration, lazy restore)	Linux Expert Syscalls
Velox	Meta's open-source C++ vectorized execution engine library — embeds into Presto (Prestissimo), Spark (Gluten), and other engines so they share one vectorized execution layer	Velox Internals
VectorEncoding	Velox encoding taxonomy for BaseVector subclasses: FLAT, CONSTANT, DICTIONARY, BIASED, SEQUENCE, LAZY, ROW, MAP, ARRAY	Velox Internals
VectorLoader	Velox callback object wrapped by LazyVector; called to decode a column on first access (late materialization)	Velox Internals
VCL	Volume Complete LSN — Aurora's highest LSN for which the storage service can guarantee that all prior log records have reached a write quorum (4 of 6 segment replicas)	Disaggregated Storage
VDL	Volume Durable LSN — Aurora's effective recovery point: the highest CPL (Consistency Point LSN, a mini-transaction boundary) ≤ VCL	Disaggregated Storage
VFIO	Virtual Function I/O — Linux kernel framework for safe userspace device drivers using IOMMU DMA isolation	VFIO Internals
VIPT	Virtually Indexed Physically Tagged — L1 cache design using virtual-address bits for the set index (so lookup overlaps the TLB access) and the physical tag for correctness; no aliasing if the index bits lie within the page offset	Superscalar OoO CPU
VLA	Vector-Length Agnostic — programming model where code adapts to hardware vector width at runtime (ARM SVE, RISC-V RVV)	ISA Critical Instructions
VR	Viewstamped Replication — consensus protocol by Oki & Liskov (1988) using views and viewstamps; developed independently of Paxos and closely analogous to Multi-Paxos	Distributed Consensus
VT-d	Virtualization Technology for Directed I/O — Intel's IOMMU implementation for DMA remapping and device isolation	VFIO Internals
WAL	Write-Ahead Log — durability mechanism requiring all changes to be logged before being written to data files	WAL & Torn Pages
WATT	Write-Aware Timestamp Tracking — eviction policy tracking write timestamps for better page replacement decisions	Buffer Management
WCOJ	Worst-Case Optimal Join — join algorithm (e.g., LeapfrogTrieJoin) matching the AGM bound for cyclic queries	Join Algorithms
White Rabbit	CERN sub-nanosecond time sync (now IEEE 1588-2019 High Accuracy profile): PTP + Synchronous Ethernet (SyncE) frequency lock + DDMTD phase measurement over fiber; <1 ns cross-site	Low-Latency Trading
WiredTiger	MongoDB's default B-tree storage engine using copy-on-write, MVCC, and hazard pointers for concurrency	MongoDB/WiredTiger Internals
XDP	eXpress Data Path — Linux eBPF hook running programs in the NIC driver's receive path, before skb allocation and the kernel network stack	Linux Expert Syscalls
Z-set	Generalized multiset with integer weights (positive=insert, negative=delete) — core data model of DBSP/Feldera	Database Systems
Ztso	RISC-V TSO extension — provides Total Store Order semantics for x86 binary translation compatibility	ISA Critical Instructions
ACORN	Predicate-agnostic filtered ANN search over HNSW; expands each node's neighbor list during construction (factor γ) so the subgraph of nodes passing the filter stays navigable, and search traverses only that subgraph	Text & Vector Search
ADC	Asymmetric Distance Computation — ANN technique precomputing query-to-codebook distances into a lookup table; O(M) distance vs O(d)	Text & Vector Search
AMS	AMS Sketch (Alon-Matias-Szegedy 1999) — randomized sketch estimating the second frequency moment F₂ = Σfᵢ²; basis for join size estimation	Database Statistics
ANN	Approximate Nearest Neighbor — find vector within (1+ε) × optimal distance; trades recall for speed; graph/IVF/quantization methods	Text & Vector Search
BEIR	Benchmark for heterogeneous zero-shot IR evaluation — 18 datasets (web/bio/legal/sci); reveals generalization gap of dense models vs BM25	Text & Vector Search
BKD-tree	Disk-friendly k-d tree variant used in Lucene for numeric and geo range queries; leaf blocks of 512–1024 points	Text & Vector Search
BM25	Best Match 25 — probabilistic term-weighting ranking function (Robertson et al. 1994); de-facto standard for keyword search	Text & Vector Search
BMW	Block-Max WAND — extends WAND with per-block max scores for finer-grained postings skipping (Ding & Suel SIGIR 2011)	Text & Vector Search
CAGRA	CUDA ANNS GRAph-based — NVIDIA GPU-native graph ANN algorithm; 33–77× faster than CPU HNSW for batch search	Text & Vector Search
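ColBERT	Contextualized Late Interaction over BERT — per-token embeddings scored by MaxSim (each query token's maximum similarity over document tokens, summed); higher retrieval quality than single-vector bi-encoders at the cost of much larger index storage	Text & Vector Search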
DiskANN	Microsoft disk-resident ANN system using Vamana graph; 1B vectors on 64GB RAM + NVMe; >95% recall@1 at <5ms (NeurIPS 2019)	Text & Vector Search
DPR	Dense Passage Retrieval — bi-encoder dense retrieval (Karpukhin et al. EMNLP 2020); trained with in-batch + BM25 hard negatives	Text & Vector Search
HNSW	Hierarchical Navigable Small World — multi-layer proximity graph for ANN; O(ef × log n) search; dominant algorithm on ann-benchmarks	Text & Vector Search
IVF	Inverted File Index — k-means partition ANN; scan only nprobe nearest centroid lists; base of FAISS IVFPQ	Text & Vector Search
LSH	Locality Sensitive Hashing — hash families whose collision probability increases with similarity; p-stable random projections for L2, SimHash (random hyperplanes) for cosine	Text & Vector Search
MaxScore	Early termination algorithm splitting postings into essential/non-essential lists; rank-safe top-K (Turtle & Flood 1995)	Text & Vector Search
MIPS	Maximum Inner Product Search — variant of ANN for inner product similarity; used in recommendation and dense retrieval	Text & Vector Search
MRL	Matryoshka Representation Learning — embeddings meaningful at all prefix lengths [8..2048]; truncate at inference (Kusupati NeurIPS 2022)	Text & Vector Search
MTEB	Massive Text Embedding Benchmark — 56 datasets spanning 8 task types; standard leaderboard for sentence/passage embedding models	Text & Vector Search
PLAID	Performance-optimized Late Interaction Driver — centroid-interaction pre-filter for ColBERTv2; up to 45× faster on CPU than vanilla ColBERTv2 search (CIKM 2022)	Text & Vector Search
PQ	Product Quantization — split a d-dim vector into M subspaces of d/M dims each and quantize each independently with its own codebook; M bytes per vector with 256-centroid (8-bit) codebooks (Jégou 2011)	Text & Vector Search
RaBitQ	Rotation + 1-bit quantization — apply random rotation before binary quantization; tight theoretical error bound (Gao SIGMOD 2024)	Text & Vector Search
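RRF	Reciprocal Rank Fusion — score(d) = Σ_r 1/(k + rank_r(d)) over input rankings r; fuses multiple ranked lists without score normalization or training, using a single constant k (60 in the original paper); Cormack et al. SIGIR 2009	Text & Vector Search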
ScaNN	Scalable Nearest Neighbor — Google ANN library using anisotropic quantization; 2× faster than competitors on ann-benchmarks (ICML 2020)	Text & Vector Search
SPLADE	Sparse Lexical and Expansion — BERT MLM head → 30K sparse vector with term expansion + weighting; served via inverted index (SIGIR 2021)	Text & Vector Search
WAND	Weak AND — pivot-based postings-skipping algorithm for top-K; rank-safe, 10–25× faster than exhaustive DAAT scoring (Broder et al. CIKM 2003)	Text & Vector Search
ACE	AXI Coherency Extensions — ARM extension adding snoop channels (AC/CR/CD) to AXI for cache-coherent masters; ACE-Lite for non-cached coherent agents (DMA, accelerators)	Interconnects
AIB	Advanced Interface Bus — Intel-originated open chiplet D2D standard (1024 wires/channel); used in EMIB-based Sapphire Rapids/Ponte Vecchio; largely subsumed by UCIe Advanced	Interconnects
AXI	Advanced eXtensible Interface — Arm AMBA bus standard; AXI4 has 5 independent channels (AW/W/B/AR/R); AXI5 adds atomics and unique-ID interleave	Interconnects
BoW	Bunch of Wires — OCP/OIF chiplet D2D parallel-wire standard targeting < 2 mm; up to 16 GT/s/wire; largely subsumed by UCIe	Interconnects
CHI	Coherent Hub Interface — Arm AMBA packet-based mesh fabric; scales to 256-core server chips (Neoverse N2/V2 CMN-700); supports snoopy + directory coherence	Interconnects
CPO	Co-Packaged Optics — placing optical engines directly on switch ASIC substrate to eliminate PCB trace loss at 1.6T+; Broadcom Tomahawk 5/6, NVIDIA Quantum-X Photonics	Interconnects
CQ	Completion Queue — RDMA structure where NIC writes a CQE per completed Work Request; polled or interrupt-driven	Interconnects
DCB	Data Center Bridging — IEEE 802.1 extensions (PFC + ETS + QCN + DCBX) enabling lossless Ethernet for RoCE/FCoE	Interconnects
DCBX	Data Center Bridging Exchange — LLDP-based protocol exchanging DCB capabilities/config between switch and endpoint	Interconnects
DCQCN	Datacenter QCN — RoCEv2 congestion control combining switch ECN marking, CNP feedback, and rate adjustment at endpoint (Zhu SIGCOMM 2015)	Interconnects
DCT	Dynamically Connected Transport — Mellanox/NVIDIA InfiniBand QP type using a shared pool of QPs dynamically retargeted per peer; bounds QP memory at 10k+ rank scale	Interconnects
DCTCP	Datacenter TCP (Alizadeh SIGCOMM 2010) — TCP variant using ECN with fractional marking + α-smoothing for low-latency DC	Interconnects
ECMP	Equal-Cost Multi-Path — routing technique spreading flows across equal-cost paths by hashing packet header fields (e.g. the 5-tuple); under skew, hash collisions can pile several large flows onto one path	Interconnects
ETS	Enhanced Transmission Selection (IEEE 802.1Qaz) — DCB feature for proportional bandwidth allocation across 8 traffic class groups	Interconnects
FCP	Fibre Channel Protocol — SCSI-over-FC mapping; the original SAN protocol; largely replaced by NVMe-oF/FC for new deployments	Interconnects
FEC	Forward Error Correction — channel coding used in 25/50/100+ GbE to recover from bit errors: RS(528,514) ("KR4") and RS(544,514) ("KP4"); mandatory for PAM4 lanes (50G and above)	Interconnects
GFAM	Global Fabric Attached Memory — CXL 3.0+ pooled coherent memory accessible by any host in a CXL fabric; sub-µs latency at TB scale	Interconnects
GMI	Global Memory Interconnect — AMD on-package coherent interconnect linking CCDs to the IOD on EPYC; GMI3 at 36 GT/s	Interconnects
HDM-DB	Host-managed Device Memory, Device-coherent with Back-Invalidate — CXL 3.0+ mode where the device tracks host-cached lines and issues back-invalidations; enables fabric-attached coherent memory pools >1 TB	Interconnects
HPCC	High Precision Congestion Control (Li SIGCOMM 2019) — in-band-telemetry-based CC for RDMA; per-hop queue + utilization embedded in packets	Interconnects
IBA	InfiniBand Architecture — IBTA's full layered spec; covers physical, link, network, transport, and management layers	Interconnects
ICI	Inter-Chip Interconnect — Google's TPU pod fabric; 3D torus with OCS reconfiguration in v4+ (Jouppi et al. ISCA 2023)	Interconnects
IDE	Integrity and Data Encryption — CXL link-layer AES-GCM encryption per FLIT; selectable per virtual channel	Interconnects
IFIS	Infinity Fabric Inter-Socket — AMD inter-socket coherent interconnect (xGMI variant); 32 GT/s at Zen 4	Interconnects
IFOP	Infinity Fabric On-Package — AMD on-package coherent link between CCD and IOD; 32-36 GT/s at Zen 4/5	Interconnects
MACsec	Media Access Control Security (IEEE 802.1AE) — L2 line-rate AES-GCM-128/256 encryption between Ethernet hops; standard on enterprise/DC NICs	Interconnects
MR	Memory Region — RDMA registered+pinned+IOMMU-mapped buffer; has lkey (local) and rkey (remote) tokens; expensive to register (10s of ms per GB)	Interconnects
MTU	Maximum Transmission Unit — largest L3 packet (frame payload) a link carries; 1500 B default on Ethernet, 9000 B "jumbo" common in DCs; matters for PFC headroom sizing and RoCE (whose path MTU tops out at 4096 B)	Interconnects
MZM	Mach-Zehnder Modulator — silicon-photonics modulator that splits light into two arms, applies an electrical phase shift on one, and recombines; output intensity ∝ cos²(Δφ/2) (field amplitude ∝ cos(Δφ/2))	Interconnects
NCCL	NVIDIA Collective Communications Library — GPU-native AllReduce/AllGather/Broadcast library; uses NVLink/IB/RoCE; supports NVLS in-network reduction	Interconnects
NeuronLink	AWS Trainium proprietary interconnect; NeuronLink-v3 at ~12 Tbps aggregate per chip on Trainium2	Interconnects
NIXL	NVIDIA Inference Transfer Library (2024-2025) — disaggregated KV-cache transport for LLM serving; integrates Dynamo/vLLM	Interconnects
NPIV	N_Port ID Virtualization — Fibre Channel feature letting multiple virtual ports share one HBA; required for VM passthrough on FC SANs	Interconnects
NRZ	Non-Return-to-Zero — binary signaling (1 bit/symbol); used in PCIe 1.0–5.0 and Ethernet lanes up to 25 Gb/s; superseded by PAM4 for 50 Gb/s-and-faster lanes	Interconnects
NVL72	NVIDIA NVLink 72 — rack-scale architecture with 72 B200 GPUs in single coherent NVLink domain; 9 NVSwitch trays, 130 TB/s aggregate, copper backplane	Interconnects
NVLS	NVLink SHARP — in-switch reduction on NVSwitch 3.0+; halves AllReduce bandwidth requirement vs ring	Interconnects
NVMe-oF	NVMe over Fabrics — NVMe wire protocol over RDMA (RoCE/IB), TCP, or FC; replaces iSCSI/FC for SSD-class storage networking	Interconnects
OCS	Optical Circuit Switch — switch routing entirely in optical domain (MEMS mirrors or AWG); slow reconfig (ms), but very high BW/power efficiency once configured	Interconnects
ODP	On-Demand Paging — RDMA NIC feature replacing MR page-pinning with on-the-fly page faults via PCIe ATS+PRI; ~5-10 µs fault penalty	Interconnects
OFI	OpenFabrics Interfaces — libfabric API and provider framework (verbs/EFA/psm3/cxi/tcp); alternative to UCX, preferred by AWS/Cray/Intel stacks	Interconnects
OpenHBI	OCP High Bandwidth Interface — chiplet D2D spec targeting HBM-class memory interconnect; largely overlapped by HBM PHY and UCIe	Interconnects
PAM4	Pulse Amplitude Modulation 4-level — 2 bits/symbol signaling; doubles the bit rate vs NRZ at the same baud rate, at the cost of lower SNR (≈9.5 dB eye penalty); standard for 50G+ per-lane Ethernet and PCIe 6.0+	Interconnects
PFC	Priority-based Flow Control (IEEE 802.1Qbb) — pauses each of a port's 8 traffic classes independently instead of the whole link; required for lossless Ethernet (RoCEv2, FCoE)	Interconnects
QCN	Quantized Congestion Notification (IEEE 802.1Qau) — DCB explicit-feedback CC; largely superseded by ECN-based protocols	Interconnects
QP	Queue Pair — RDMA endpoint pair (send queue + receive queue); types: RC, UC, UD, XRC, DCT	Interconnects
RNR	Receiver Not Ready — RDMA NAK indicating receiver had no posted RECV when SEND arrived; triggers sender backoff + retry	Interconnects
RoCE	RDMA over Converged Ethernet — verbs over Ethernet (v1 L2-only, dead) or UDP/IP (v2, port 4791, dominant)	Interconnects
RoCEv2	RoCE version 2 — RDMA verbs encapsulated in UDP/IP (UDP port 4791); routable; traditionally deployed on a lossless fabric (PFC) with ECN-based CC (DCQCN)	Interconnects
SerDes	Serializer/Deserializer — high-speed parallel-to-serial signaling IP; the fundamental scaling unit (per-lane signaling) of all modern interconnects	Interconnects
SHARP	Scalable Hierarchical Aggregation and Reduction Protocol — Mellanox in-switch reduction for IB; halves AllReduce bandwidth requirement	Interconnects
TDISP	TEE Device Interface Security Protocol — PCIe spec (adopted by CXL) for attesting confidential devices; required for confidential CXL/PCIe accelerator workloads	Interconnects
TileLink	Open RISC-V coherent chip protocol (UC Berkeley); three tiers TL-UL/TL-UH/TL-C; used in SiFive/BOOM/Chipyard	Interconnects
UALink	Ultra Accelerator Link — 2024 open consortium (AMD/Broadcom/Cisco/Google/Intel/Meta/MS/HPE); coherent NVLink alternative; targets 1024-GPU domains via Ethernet PHY + custom protocol	Interconnects
UEC	Ultra Ethernet Consortium — 2023-2025 Linux Foundation project; UEC 1.0 spec (Jun 2025) defines RUD/RUDI transport with packet spraying + modern CC for AI on commodity Ethernet	Interconnects
UPI	Ultra Path Interconnect — Intel inter-socket/inter-die coherent fabric (MESIF protocol); 10.4 GT/s (SKL) → 24 GT/s (GNR)	Interconnects
WR	Work Request — RDMA element posted to a QP's send or receive queue describing an I/O (opcode, sg_list, remote_addr/rkey, etc.)	Interconnects
xGMI	Inter-Socket Global Memory Interconnect — AMD coherent link between EPYC sockets (and between MI300 GPUs); 32 GT/s at gen4-5	Interconnects
ZR	Coherent optical pluggable family — 400ZR/800ZR for metro/DCI distances (~80–120 km over amplified DWDM links) using DP-16QAM with integrated DSP	Interconnects
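
The RRF entry in the table above can be made concrete with a short Python sketch. It assumes 1-based ranks and k = 60 (the constant used by Cormack et al.); the function name `reciprocal_rank_fusion` and the toy document IDs are hypothetical, chosen only to show a BM25 list and a dense-retrieval list being fused.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    score(d) = sum over lists r of 1 / (k + rank_r(d)), using 1-based ranks.
    A document absent from a list contributes nothing for that list.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_hits = ["d3", "d1", "d7"]   # hypothetical keyword-search ranking
dense_hits = ["d1", "d9", "d3"]  # hypothetical dense-retrieval ranking
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print([doc for doc, _ in fused])  # ['d1', 'd3', 'd9', 'd7']
```

Because RRF looks only at ranks, the retrievers' raw scores (BM25 weights vs cosine similarities) never have to be normalized onto a common scale; d1 wins because it ranks well in both lists, even though neither list is trusted more than the other.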

GPU Programming Libraries

Term	Definition	Covered In
AMX	Apple Matrix coprocessor — undocumented CPU-side matrix unit (one per CPU cluster), reachable only via Accelerate (BLAS/vDSP), not from Metal; specs are third-party reverse-engineered	GPU Programming Libraries
ANE	Apple Neural Engine — dedicated power-efficient NPU on Apple Silicon; reachable only through Core ML, not Metal or MLX (a closed ecosystem wall)	GPU Programming Libraries
Arithmetic intensity	FLOPs per byte moved; the x-axis of the roofline model — determines whether a kernel is memory-bound (low AI) or compute-bound (high AI)	GPU Programming Libraries, GPU/TPU Accelerator Design
Coalescing	Merging a warp's 32 global-memory accesses that fall in one 128 B line into a single transaction; the largest global-memory perf lever on NVIDIA GPUs	GPU Programming Libraries
Cooperative matrix	`VK_KHR_cooperative_matrix` (2023) — Vulkan subgroup-scoped tensor-core/matrix-core GEMM primitive; the portable path to tensor cores outside CUDA	GPU Programming Libraries
cp.async	Ampere (sm_80+) async copy global→shared bypassing the register file; overlaps load with compute; pre-Hopper software-pipelining primitive	GPU Programming Libraries
CUTLASS	NVIDIA open-source C++ template GEMM/conv library and reference for programming Tensor Cores; CuTe (3.x) is its `Layout = Shape ⊗ Stride` algebra	GPU Programming Libraries
DSMEM	Distributed Shared Memory — Hopper thread-block-cluster feature letting blocks on one GPC access each other's SMEM over an SM-to-SM network (~7× faster than global round-trip, vendor-stated)	GPU Programming Libraries
fatbin	CUDA fat binary — container bundling multiple arch-specific cubins plus PTX for JIT fallback	GPU Programming Libraries
gfx target	AMD GPU arch identifier (e.g. gfx90a=MI200/CDNA2, gfx942=MI300/CDNA3); binaries are gfx-specific — the ROCm analogue of `sm_XX`	GPU Programming Libraries
HIP	Heterogeneous-compute Interface for Portability — AMD's near-1:1 CUDA-clone C++ runtime + kernel language; `hipcc` targets AMD (ROCclr/HSA) or NVIDIA (thin CUDA shim)	GPU Programming Libraries
HIPIFY	ROCm CUDA→HIP source translator; hipify-perl (regex, shallow) vs hipify-clang (AST-based, accurate)	GPU Programming Libraries
LDS	Local Data Share — AMD's on-chip per-CU scratchpad (~64 KB, 32 banks); the ROCm equivalent of CUDA shared memory / Metal threadgroup memory	GPU Programming Libraries
MFMA	Matrix Fused Multiply-Add — AMD CDNA matrix-core instructions (incl. FP64 matrix); the ROCm analogue of NVIDIA Tensor-Core `mma`, with AMD-specific fragment layouts	GPU Programming Libraries
MLX	Apple's open-source NumPy/JAX-like array framework; unified-memory native, lazy-eval (fusion at `mx.eval()`), maps to Metal kernels; mlx-lm for local LLMs	GPU Programming Libraries
MSL	Metal Shading Language — Apple's C++14-based GPU kernel language	GPU Programming Libraries
Occupancy	Active warps ÷ max warps per SM; limited by registers/shared-mem/block count; max occupancy ≠ max throughput (Volkov: ILP can hide latency at low occupancy)	GPU Programming Libraries
PTX	Parallel Thread eXecution — NVIDIA's forward-compatible virtual ISA; JIT-compiled to SASS by the driver at load (the CUDA forward-compat mechanism)	GPU Programming Libraries
ptxas	NVIDIA PTX→SASS optimizing assembler; where register allocation and scheduling happen (`-v` prints reg/smem/spill usage)	GPU Programming Libraries
RCCL	ROCm Communication Collectives Library — AMD's NCCL-compatible multi-GPU collectives	GPU Programming Libraries
Roofline	Williams et al. (CACM 2009) performance model: attainable FLOP/s = min(peak compute, arithmetic_intensity × peak bandwidth); ridge point separates memory- vs compute-bound	GPU Programming Libraries, GPU/TPU Accelerator Design
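s_waitcnt	AMD ISA instruction stalling until the outstanding vector-memory (vmcnt), LDS/scalar-memory (lgkmcnt), or export (expcnt) counters drop to a given value; the hardware does not track memory-op dependencies, so the compiler (or hand-written assembly) must insert these waits	GPU Programming Libraries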
SASS	Streaming ASSembler — NVIDIA's actual per-architecture machine ISA (undocumented, changes each generation); ptxas emits it, `cuobjdump -sass` disassembles it	GPU Programming Libraries
SIMD-group	Apple Metal's lockstep lane group (32-wide on Apple Silicon); the Metal analogue of a CUDA warp / AMD wavefront	GPU Programming Libraries
simdgroup_matrix	Metal's on-GPU cooperative matrix-multiply primitive (8×8 tiles across a SIMD-group); Apple's tensor-core analogue	GPU Programming Libraries
SPIR-V	Khronos binary intermediate representation consumed by Vulkan/OpenCL/SYCL; SPIRV-Cross transpiles it to GLSL/HLSL/MSL (how MoltenVK retargets Metal; wgpu uses its own Naga translator for the same role)	GPU Programming Libraries
Sub-group	SYCL/Vulkan warp-equivalent lane group within a work-group; supports shuffles/reductions	GPU Programming Libraries
SYCL	Khronos single-source C++17 heterogeneous-compute standard; buffer/accessor or USM memory models; DPC++ and AdaptiveCpp implementations over CUDA/HIP/Level-Zero/OpenCL backends	GPU Programming Libraries
TBDR	Tile-Based Deferred Rendering — Apple GPU architecture splitting the frame into on-chip tiles; imageblocks and tile shaders exploit it for on-chip compute	GPU Programming Libraries
tcgen05	Blackwell (sm_100a) 5th-gen tensor-core MMA (UMMA); FP4/FP6 with block scaling, accumulators in dedicated Tensor Memory (TMEM); can span 2 SMs	GPU Programming Libraries
TMA	Tensor Memory Accelerator — Hopper (sm_90+) hardware DMA engine for bulk multidimensional tiled global↔shared copies via a `CUtensorMap` descriptor, signaling an mbarrier on completion	GPU Programming Libraries
UMA	Unified Memory Architecture — Apple Silicon's single physical LPDDR pool (LPDDR4X/5/5X depending on generation) shared zero-copy by CPU/GPU/ANE (no discrete VRAM, no PCIe copies); AMD MI300A is an HBM-based UMA APU	GPU Programming Libraries
USM	Unified Shared Memory — SYCL's raw-pointer memory model (`malloc_device`/`malloc_shared`/`malloc_host`); the CUDA-like alternative to the buffer/accessor model	GPU Programming Libraries
Wavefront	AMD's lockstep lane group: 64 threads (GCN/CDNA) or 32 (RDNA wave32); the ROCm analogue of a CUDA warp — a top portability gotcha	GPU Programming Libraries
wgmma	Hopper (sm_90a) warpgroup (4-warp/128-thread) async tensor-core MMA; operand B in SMEM; reaches higher peak than legacy `mma.sync`	GPU Programming Libraries
WGSL	WebGPU Shading Language — the shader language for wgpu/WebGPU compute pipelines	GPU Programming Libraries
WMMA	Warp/Wave Matrix Multiply-Accumulate — NVIDIA's portable warp-level tensor-core fragment API, and (separately) AMD RDNA3+'s consumer matrix instruction (16×16×16, no FP64)	GPU Programming Libraries
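
The Roofline entry's formula, attainable FLOP/s = min(peak compute, arithmetic_intensity × peak bandwidth), can be checked with a few lines of Python. The machine figures (10,000 GFLOP/s, 1,000 GB/s) are hypothetical round numbers rather than any real GPU, and the SAXPY byte count assumes FP32 operands with no cache reuse.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Roofline bound: min(peak compute, AI [FLOP/byte] x peak bandwidth [GB/s])."""
    return min(peak_gflops, arithmetic_intensity * peak_bandwidth_gbs)


def ridge_point(peak_gflops, peak_bandwidth_gbs):
    """Arithmetic intensity (FLOP/byte) where a kernel stops being memory-bound."""
    return peak_gflops / peak_bandwidth_gbs


# Hypothetical machine: 10,000 GFLOP/s peak compute, 1,000 GB/s memory bandwidth.
PEAK, BW = 10_000.0, 1_000.0

# FP32 SAXPY (y = a*x + y): 2 FLOPs per element; 12 bytes moved (load x, load y, store y).
saxpy_ai = 2 / 12

print(ridge_point(PEAK, BW))                  # 10.0 FLOP/byte
print(attainable_gflops(saxpy_ai, PEAK, BW))  # ~166.7 GFLOP/s: memory-bound
print(attainable_gflops(50.0, PEAK, BW))      # 10000.0: compute-bound
```

SAXPY sits far to the left of the ridge point, so faster ALUs would not help it; only more bandwidth, or fusing it with neighboring kernels to raise FLOPs per byte, moves it up the roof.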

Cache Eviction, Admission & Prefetching

Term	Definition	Covered In
ARC	Adaptive Replacement Cache — self-tuning O(1) policy with T1 (recency) + T2 (frequency) lists and B1/B2 ghost lists; adaptive partition point p updated on ghost hits; Megiddo & Modha FAST 2003; used in ZFS, DB2	Cache Algorithms
CLOCK-Pro	Approximation of LIRS using three clock hands (hot/cold/test); scan-resistant without explicit ghost lists; lower memory overhead than ARC; Jiang et al. ATC 2005; used in NetBSD VM	Cache Algorithms
GDSF	Greedy Dual Size Frequency — priority-queue eviction using key = (freq/size) + clock-inflation L; scan-resistant, size-aware; web proxy caching; Cherkasova HPL 1998; used in Squid	Cache Algorithms
GL-Cache	Group-level Learned Cache — ML model ranks groups of objects for batch eviction rather than individual objects; amortises inference cost; Yang et al. FAST 2023	Cache Algorithms
LeCaR	Learning Cache Replacement — online RL mixture of LRU + LFU experts via multiplicative-weights update; regret-minimising; Vietri et al. HotStorage 2018	Cache Algorithms
LHD	Least Hit Density — evicts object with lowest estimated hits-per-byte-per-time-unit; sampled from random candidates; class-based histogram; Beckmann, Chen & Cidon NSDI 2018	Cache Algorithms
LIRS	Low Inter-reference Recency Set — recency stack with HIR (high IRR) / LIR (low IRR) classification; promotes repeatedly-hit items; scan-resistant; Jiang & Zhang SIGMETRICS 2002; used in H2 DB, Caffeine (historical)	Cache Algorithms
LRB	Learning Relaxed Belady — GBDT model trained online over a sliding window of recent requests to predict next reuse time; evicts objects predicted to be re-requested beyond a "Belady boundary", approximating Belady's OPT; Song et al. NSDI 2020; prototyped in Apache Traffic Server	Cache Algorithms
LRU-K	LRU variant tracking the K most-recent access timestamps per object; evicts the object whose K-th most recent reference is oldest, so one-hit wonders (fewer than K references) are evicted first; O'Neil et al. SIGMOD 1993	Cache Algorithms
MGLRU	Multi-Generational LRU — Linux 6.1+ page reclaim that sorts pages into generations and ages them by scanning hardware-set page-table accessed (young) bits; replaces the active/inactive two-list LRU for anonymous + file pages; Yu Zhao (Google), LKML 2022	Cache Algorithms
MRC	Miss Ratio Curve — function mapping cache size → miss ratio; computed via reuse-distance analysis (exact) or SHARDS sampling (approximate); essential for cache sizing decisions	Cache Algorithms
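QD-LP	Quick Demotion + Lazy Promotion — small probationary FIFO quickly demotes (evicts) one-hit wonders (QD) in front of a main cache that promotes lazily, only at eviction time (LP, e.g. FIFO-Reinsertion/CLOCK); Yang et al. HotOS 2023	Cache Algorithms
S3-FIFO	Simple, Scalable caching with three Static FIFO queues — S (10% capacity) + M (90%) + G (ghost, keys only); objects re-accessed while in S move to M when they reach S's tail, others are evicted to G, and a ghost hit inserts directly into M; simple, scan-resistant, low metadata overhead; Yang et al. SOSP 2023	Cache Algorithms
SHARDS	Spatially Hashed Approximate Reuse Distance Sampling — approximate MRC construction that samples by key hash (a reference is kept iff hash(key) mod P < T, so every access to a sampled key is tracked); fixed-size variant uses constant memory; 1% sampling rate with <1% miss ratio error; Waldspurger et al. FAST 2015	Cache Algorithms
SIEVE	SIEVE eviction — single FIFO queue + hand pointer; a hit only sets the object's visited bit (no movement); the hand sweeps from tail toward head, clearing visited bits and evicting the first unvisited object (lazy promotion + quick demotion); simpler than LRU, competitive hit rate on web/CDN workloads; Zhang et al. NSDI 2024	Cache Algorithms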
SLRU	Segmented LRU — two LRU segments: probationary (new entries) + protected (second-hit promoted); objects demoted from protected → probationary on eviction pressure; used as main cache in W-TinyLFU	Cache Algorithms
TinyLFU	Tiny LFU admission filter — 4-bit Count-Min Sketch frequency estimator + doorkeeper Bloom filter; admits new object only if frequency ≥ eviction candidate; reset-based aging; Einziger et al. 2017	Cache Algorithms
W-TinyLFU	Window-TinyLFU — 1% window LRU + 99% SLRU main cache + TinyLFU admission gate + hill-climbing window-size tuner; production standard; used in Caffeine, Cassandra, Kafka, Solr, HBase, Neo4j	Cache Algorithms
Aeron	Real Logic's reliable UDP unicast/multicast + shared-memory IPC transport; lock-free; ~18 µs latency on hardware; pairs with SBE encoding for low-latency messaging	Low-Latency Trading
Alpha decay	Degradation of a trading signal's predictive power over time; ~5–10%/yr under normal conditions; a latency disadvantage estimated to cut returns ~5.6% (US) / ~10% (EU)	Low-Latency Trading
AOC	Active Optical Cable — fiber cable with E/O/E conversion at each end; ~5 ns/m propagation + ~5–10 ns per-end conversion latency; used for across-row links in colo	Low-Latency Trading
CQI	Crumbling Quote Indicator — IEX's model over sequential away-exchange quote updates that predicts an imminent NBBO move; fires for ~2 ms; triggers D-Peg/D-Limit re-pricing by 1 MPV	Low-Latency Trading
DAC	Direct Attach Copper — passive twinaxial cable; ~5.2 ns/m, no SerDes retimer, ≤~7 m; lowest-latency inside-rack NIC-to-NIC connection option	Low-Latency Trading
eASIC	Embedded/structured ASIC — a midpoint between FPGA and full-custom silicon; hardened datapath logic with a reconfigurable region for protocol changes; near-ASIC speed + FPGA flexibility	Low-Latency Trading
FAST	FIX Adapted for Streaming — template-based binary encoding with field operators (copy/delta/increment/constant) + PMAP presence map; compresses bandwidth-heavy feeds (OPRA options); stateful decoding trades CPU/latency for wire size; superseded by SBE for latency-critical feeds	Low-Latency Trading
ISO	Intermarket Sweep Order — Reg NMS order type marked to simultaneously take out protected quotes on multiple venues; shifts trade-through compliance onto sender; enables parallel multi-venue routing	Low-Latency Trading
MSG_ZEROCOPY	Linux send-side zero-copy (`SO_ZEROCOPY` + `MSG_ZEROCOPY`); pins user pages into kernel skb, avoiding the copy; requires async completion via `MSG_ERRQUEUE`; only beneficial for writes ≥~10 KB; counter-productive for small trading messages	Low-Latency Trading
OxCaml	Jane Street's open-source OCaml branch; adds locality modes for stack allocation (`local`/`global`/`exclave`) and Rust-style data-race-free parallelism ("Oxidizing OCaml"); eliminates heap allocation and GC pressure on hot paths	Low-Latency Trading
Reg NMS	Regulation National Market System — SEC rules governing US equity market structure; Rule 611 (order protection/trade-through rule) + Rule 610 (access fees) + Rule 612 (minimum pricing increment) drive smart order routing	Low-Latency Trading
Rule 15c3-5	SEC Market Access Rule (2010) — requires broker-dealers to apply pre-trade risk controls (fat finger, position limits, rate throttle) under their direct and exclusive control; banned naked sponsored access; Knight Capital $460M loss (2012) is the enforcement anchor	Low-Latency Trading
SOR	Smart Order Router — software (or FPGA) that routes orders across multiple trading venues while satisfying Reg NMS order-protection and best-execution obligations; selects venue by price, size, fill probability, toxicity, fees, and RTT	Low-Latency Trading
THOR	Tactical Hybrid Order Router (RBC) — SOR that staggers order send times so slices arrive simultaneously at all venues, defeating cross-venue latency arbitrage; US patents 9,280,791; 10,896,466; 12,154,173	Low-Latency Trading
vDSO	Virtual Dynamic Shared Object — kernel-mapped user-space library accelerating `clock_gettime`/`gettimeofday`/`time`/`getcpu` without a syscall; reads the TSC directly and the kernel-maintained scaling parameters from the vvar page; ~10–30 ns vs ~100+ ns for a syscall; falls back to a real syscall if the TSC is marked unreliable (VM migration, hotplug)	Low-Latency Trading
VPIN	Volume-Synchronized Probability of Informed Trading — rolling average absolute order imbalance per equal-volume bucket; measures flow toxicity; Easley/López de Prado/O'Hara RFS 2012; predictive power contested by Andersen-Bondarenko 2014	Low-Latency Trading