Glossary

Acronyms and technical terms used across research docs.

TermDefinitionCovered In
BTBBranch Target Buffer — hardware cache mapping instruction PCs to predicted branch targets; hierarchical (L0 µBTB / L1 / L2)Superscalar OoO CPU
ACSAccess Control Services — PCIe capability ensuring peer-to-peer isolation for IOMMU groupsVFIO Internals
AES-NIAdvanced Encryption Standard New Instructions — x86 hardware-accelerated AES encryptionISA Critical Instructions
AMQPAdvanced Message Queuing Protocol — wire-level messaging protocol implemented by RabbitMQRabbitMQ Internals
AQEAdaptive Query Execution — Spark 3.0+ runtime re-optimization based on actual data statisticsDatabase Systems
ARIESAlgorithm for Recovery and Isolation Exploiting Semantics — foundational WAL recovery protocol using LSN-based redo/undoWAL & Torn Pages
ARTAdaptive Radix Tree — cache-friendly trie with variable node sizes (Node4/16/48/256), developed at TUMDatabase Systems
AVX-512Advanced Vector Extensions 512-bit — x86 SIMD instruction set processing 16 floats per instructionISA Critical Instructions
BARBase Address Register — PCI configuration space register defining device memory-mapped I/O regionsVFIO Internals
BFTByzantine Fault Tolerance — ability to reach consensus despite arbitrary (malicious) node failuresDistributed Consensus
BMIBit Manipulation Instructions — x86 extension for PEXT/PDEP/BLSR and other bit-level operationsISA Critical Instructions
BNLJBlock Nested Loop Join — join variant reading outer relation in memory-sized blocks to reduce I/OJoin Algorithms
BRINBlock Range Index — PostgreSQL lightweight index storing min/max per physical block rangeDatabase Systems
BTIBranch Target Identification — ARM control-flow integrity mechanism marking valid indirect branch targetsISA Critical Instructions
Bw-TreeLock-free B+ tree using delta records and CAS operations, developed for Hekaton/Azure Cosmos DBData Structures
CAPConsistency, Availability, Partition tolerance — Brewer/Gilbert-Lynch theorem on distributed system trade-offsDistributed Consensus
CASCompare-And-Swap — atomic instruction for lock-free programming; CMPXCHG on x86, CAS/LDXR-STXR on ARMISA Critical Instructions
CBOCost-Based Optimization — query optimization using table/column statistics for plan selectionDatabase Systems
CMSCount-Min Sketch (Cormode & Muthukrishnan 2005) — d×w counter matrix for streaming frequency estimation with O(ε⁻¹ log δ⁻¹) spaceDatabase Statistics
CDCChange Data Capture — technique for streaming database changes as events, often via Debezium or logical replicationWAL Incremental Conversion
CDNACompute DNA — AMD's compute-focused GPU microarchitecture (MI-series accelerators)GPU/TPU Accelerator Design
CETControl-flow Enforcement Technology — Intel's shadow stack + ENDBR indirect branch tracking for CFIISA Critical Instructions
CLOGCommit Log — PostgreSQL structure tracking transaction commit/abort status for MVCC visibilityArrow PostgreSQL Integration
COWCopy-On-Write — technique where data is shared until modified; used in WiredTiger B-trees, Neon branching, btrfsMongoDB/WiredTiger Internals
CoWoSChip-on-Wafer-on-Substrate — TSMC advanced packaging for multi-die integration (used in H100/A100)GPU/TPU Accelerator Design
CPLConsistency Point LSN — Aurora's highest LSN representing a transaction-consistent boundaryDisaggregated Storage
CQECompletion Queue Entry — io_uring kernel-to-user result structure for completed I/O operationsio_uring Internals
CRDTConflict-free Replicated Data Type — data structure achieving eventual consistency without coordinationDistributed Consensus
CalciteApache Calcite — embeddable Java SQL parser + validator + relational-algebra optimizer (no storage/execution engine); powers Flink, Hive, Druid, Kylin, Beam, Phoenix; Begoli SIGMOD 2018Calcite Internals
CascadesTop-down memoizing query-optimization framework (Graefe 1995) using rule-driven exploration of a memo of equivalence groups with branch-and-bound; basis of Calcite's VolcanoPlanner and CockroachDB's optimizerCalcite Internals, CockroachDB Optimizer Rules
ConventionCalcite trait identifying the execution engine / calling convention of a RelNode (NONE = logical, ENUMERABLE = built-in linq4j, BINDABLE = interpreted, JDBC/Druid/etc.); inputs must match or be bridged by a converterCalcite Internals
RelNodeCalcite relational expression (Project/Filter/Join/Aggregate/Scan…) carrying a RelTraitSet; logical forms in rel.logical, physical forms per convention; memoized by digest in the VolcanoPlannerCalcite Internals
RexNodeCalcite row/scalar expression (RexInputRef $n by ordinal, RexLiteral, RexCall, RexOver window, RexSubQuery, RexCorrelVariable); built by RexBuilder, simplified by RexSimplifyCalcite Internals
RelSubsetA RelSet (equivalence class of semantically identical RelNodes) restricted to one RelTraitSet; caches the cheapest member (best/bestCost); the unit of Calcite's Volcano memo and cost propagationCalcite Internals
RexUnknownAsCalcite three-valued-logic mode (FALSE/TRUE/UNKNOWN) telling RexSimplify how NULL may be folded in a given syntactic position (e.g. UNKNOWN→FALSE under WHERE)Calcite Internals
CXLCompute Express Link — cache-coherent interconnect for CPU-to-device memory sharing over PCIe physical layerDisaggregated Storage
DBSPDatabase Stream Processor — formal mathematical framework for incremental computation over Z-sets (Feldera)Database Systems
DMADirect Memory Access — hardware capability for devices to read/write main memory without CPU involvementVFIO Internals
DMBData Memory Barrier — ARM instruction ordering memory accesses without stalling executionISA Critical Instructions
DecodedVectorVelox helper that normalizes any vector encoding (DICT/CONST/BIAS/SEQ) to flat base + indices for safe element accessVelox Internals
DPccpDynamic Programming connected complement pairs — join enumeration algorithm for bushy plansJoin Algorithms
DPhypDynamic Programming on hypergraphs — join enumeration supporting multi-relation predicatesJoin Algorithms, DuckDB Internals
DPDKData Plane Development Kit — userspace networking framework using VFIO for kernel-bypass packet processing (~30-40 Mpps)VFIO Internals
DSBData Synchronization Barrier — ARM instruction that stalls execution until all prior memory accesses completeISA Critical Instructions
DSTDeterministic Simulation Testing — technique running distributed systems in a single-threaded deterministic simulatorDeterministic Simulation Testing
EBREpoch-Based Reclamation — memory reclamation scheme for lock-free data structures using global epoch trackingData Structures
EPTExtended Page Tables — Intel VT-x hardware-assisted two-level address translation for VM memoryISA Critical Instructions
EVEXExtended VEX — x86 instruction prefix encoding for AVX-512/APX supporting 32 vector registers and maskingISA Critical Instructions
FDWForeign Data Wrapper — PostgreSQL mechanism for querying external data sources as local tablesDatabase Systems
FLPFischer-Lynch-Paterson impossibility — proof that deterministic asynchronous consensus is impossible with even one crashDistributed Consensus
FMAFused Multiply-Add — single instruction computing a*b+c with one rounding; used in Tensor Cores and CPU SIMDGPU/TPU Accelerator Design
FORFrame-of-Reference — lightweight compression storing a per-segment base (min) and bitpacked offsets; common for dates/timestampsDuckDB Internals
FSSTFast Static Symbol Table — string compression assigning 1-byte codes to frequent substrings, allowing random access without full decompressionDuckDB Internals
FP88-bit floating point — low-precision format (E4M3/E5M2) for LLM training/inference on Hopper+ GPUsGPU/TPU Accelerator Design
FPWFull Page Writes — PostgreSQL technique writing complete page images to WAL after checkpoint to prevent torn pagesWAL & Torn Pages
FTLFlash Translation Layer — SSD firmware mapping logical block addresses to physical NAND pagesWAL & Torn Pages
GAAGate-All-Around — transistor architecture (replacing FinFET at 2nm) where gate wraps channel on all sidesGPU/TPU Accelerator Design
GDSIIGraphic Data System II — standard file format for IC layout data sent to semiconductor foundries for fabricationGPU/TPU Accelerator Design
GEQOGenetic Query Optimizer — PostgreSQL's join ordering strategy using genetic algorithms for queries with >= 12 tablesJoin Algorithms
GINGeneralized Inverted Index — PostgreSQL index for composite values (arrays, JSONB, full-text tsvector)Database Systems
GiSTGeneralized Search Tree — PostgreSQL extensible indexing framework for complex data types (geometric, range)Database Systems
GOOGreedy Operator Ordering — O(n^3) heuristic join ordering algorithm used as fallback for large queriesJoin Algorithms
GSTGlobal Stabilization Time — the (unknown) point after which network timing bounds hold in partial synchrony modelsDistributed Consensus
GTIDGlobal Transaction Identifier — MySQL identifier simplifying replication topology management and failoverDatabase Systems
HAMTHash Array Mapped Trie — persistent data structure with near-O(1) operations via structural sharing (Clojure, Scala)Data Structures
HashStringAllocatorVelox arena allocator used inside hash tables and aggregation: 4-byte Header per block (kFree/kContinued/kPreviousFree flags) + CompactDoubleList free list; streaming write via newWrite/finishWriteVelox Internals
HBMHigh Bandwidth Memory — stacked DRAM (HBM2/HBM3) providing >1 TB/s bandwidth for GPU/accelerator designsGPU/TPU Accelerator Design
HLCHybrid Logical Clock — clock combining wall-clock time with a logical counter for causal ordering (CockroachDB)Database Systems
HLLHyperLogLog (Flajolet et al. 2007) — probabilistic NDV sketch: 2^b registers, harmonic mean estimate, 1.04/√(2^b) relative error; merges via elementwise maxDatabase Statistics, Data Structures
HOTHeap-Only Tuple — PostgreSQL optimization where updated tuples stay on the same page, avoiding index updatesArrow PostgreSQL Integration
HTAPHybrid Transactional/Analytical Processing — system handling both OLTP and OLAP workloads (HyPer, TiDB)HyPer/Umbra/CedarDB
IOMMUI/O Memory Management Unit — hardware translating device DMA addresses (IOVA) to physical addressesVFIO Internals
IOMMUFDIOMMU File Descriptor — newer Linux interface replacing VFIO container/group model with fd-centric APIVFIO Internals
IOTLBI/O Translation Lookaside Buffer — IOMMU's cache for IOVA-to-PA translations; hugepages reduce miss rate dramaticallyVFIO Internals
IOVAI/O Virtual Address — the address space a device sees through the IOMMU, analogous to virtual addresses for CPUsVFIO Internals
ILPInstruction-Level Parallelism — overlap of independent instructions from a single thread exploited by OoO executionSuperscalar OoO CPU
IPCInstructions Per Cycle — microarchitectural efficiency metric; PrediCache achieves 0.55 IPC vs 0.31 for traditionalBuffer Management
IQIssue Queue / Reservation Stations — buffer holding dispatched µops waiting for operands before execution; unified or distributedSuperscalar OoO CPU
ISRIn-Sync Replicas — Kafka replicas caught up with the partition leader, eligible for leader electionKafka Internals
JaninoLightweight in-process Java source-to-bytecode compiler; Calcite uses it to compile generated linq4j Enumerable code, RexExecutor constant folding, and the metadata-handler dispatcherCalcite Internals
JoinBridgeVelox synchronization primitive between HashBuild and HashProbe pipelines: build sets a folly::Promise with the completed HashTable; probe returns a folly::SemiFuture from isBlocked() until build completesVelox Internals
JITJust-In-Time compilation — compiling code at runtime; used by HyPer (LLVM), Umbra (asmJIT), PostgreSQL 11+HyPer/Umbra/CedarDB
JOBJoin Order Benchmark (Leis et al. VLDB 2015) — 113 IMDB queries, 3–16-way joins; standard benchmark for cardinality estimation accuracy (Q-error)Database Statistics
KPTIKernel Page Table Isolation — Meltdown mitigation separating user/kernel page tables; adds ~5-30% overhead on syscall-heavy workloadsSuperscalar OoO CPU
KLLKLL Sketch (Karnin-Lang-Liberty 2016) — near-optimal mergeable quantile sketch; O(ε⁻¹ log log 1/δ) space; better than GK sketch for distributed mergeDatabase Statistics, Data Structures
KMVK-Minimum Values sketch — maintains k smallest hash values; NDV ≈ (k-1)/max(kth_smallest); merges by union + take k smallestDatabase Statistics
KRaftKafka Raft — Kafka's built-in Raft-based consensus replacing ZooKeeper for metadata managementKafka Internals
kTLSKernel TLS — Linux kernel offload of TLS encryption/decryption for socket I/O, reducing context switchesLinux Expert Syscalls
LDARLoad-Acquire Register — ARM instruction providing acquire semantics (no subsequent access reordered before it)ISA Critical Instructions
LDAPRLoad-Acquire RCpc Register — ARM weaker acquire load (ARMv8.3-RCPC) matching C++ memory_order_consume-like behaviorISA Critical Instructions
LIPAHLogical-ID Pointer Augmented Hinting — buffer manager using fat pointers (PID + hint address); limited to 32-bit PIDsBuffer Management
LL/SCLoad-Linked/Store-Conditional — ARM/RISC-V atomic primitive pair (LDXR/STXR, LR/SC) for lock-free operationsISA Critical Instructions
LMULLength Multiplier — RISC-V Vector extension register grouping factor controlling effective vector lengthISA Critical Instructions
LQLoad Queue — per-core buffer tracking all in-flight loads for STLF lookup and memory ordering violation detectionSuperscalar OoO CPU
LPSLog Processing Service — AlloyDB component that receives WAL and materializes data blocks asynchronouslyDisaggregated Storage
LSELarge System Extensions — ARMv8.1 atomic instructions (CAS, LDADD, SWP) replacing LL/SC for better scalabilityISA Critical Instructions
LatticeCalcite OLAP construct modeling a star/snowflake schema as a virtual fact-table join; dimensions and measures define candidate aggregate "tiles" auto-selected via the HRU (Harinarayan SIGMOD 1996) cube-lattice greedy algorithm for materialized-view accelerationCalcite Internals
linq4jCalcite's Java port of .NET LINQ — Enumerable/Enumerator data model plus an expression-tree AST (org.apache.calcite.linq4j.tree) that the Enumerable convention emits and Janino compilesCalcite Internals
LSMLog-Structured Merge tree — write-optimized structure converting random writes to sequential via leveled compactionLSM Trees
LSNLog Sequence Number — monotonically increasing identifier for WAL records, used for recovery and page versioningWAL & Torn Pages
MergeTreeClickHouse's core storage engine family: each INSERT writes an immutable PK-sorted part; background merges fold parts together (LSM-like)ClickHouse Internals
GranuleClickHouse unit of index addressing — default 8192 rows (capped by index_granularity_bytes); the smallest data block the sparse index can selectClickHouse Internals
MarkClickHouse mark (.mrk3/.cmrk3) — 24-byte record mapping a granule to (offset_in_compressed_file, offset_in_decompressed_block, rows_in_granule)ClickHouse Internals
Sparse primary indexIndex storing the PK tuple only at each granule boundary (primary.idx) — lossy zone-map-style pruning, not per-rowClickHouse Internals
DoubleDeltaCodec storing second-order differences (delta-of-deltas) with Gorilla varint framing; near-free for fixed-stride sequences like timestampsClickHouse Internals
T64ClickHouse codec transposing 64 integers into bit-planes after range subtraction, storing only the needed planes; for low-range/low-cardinality intsClickHouse Internals
GorillaXOR-based float compression encoding leading/trailing zero runs of consecutive-value XORs (Pelkonen VLDB 2015); a ClickHouse codecClickHouse Internals
VolnitskyBigram-hash substring search algorithm (Boyer-Moore-Horspool variant) used in ClickHouse string/LIKE matchingClickHouse Internals
NuRaftC++ Raft consensus library underpinning ClickHouse Keeper, the ZooKeeper-compatible coordination serviceClickHouse Internals
ProjectionClickHouse alternate physical layout stored inside each part (different sort order and/or pre-aggregation), auto-maintained through mergesClickHouse Internals
MESIFModified/Exclusive/Shared/Invalid/Forward — Intel's extension of MESI with a Forward state for peer-to-peer cache supplySuperscalar OoO CPU
MESIModified/Exclusive/Shared/Invalid — CPU cache coherence protocol tracking cache line states across coresISA Critical Instructions
MCVMost Common Values — per-column list of (value, frequency) pairs stored in pg_statistic stakind=1; used for exact selectivity on high-frequency valuesDatabase Statistics
NDVNumber of Distinct Values — column statistic driving join selectivity (1/max(NDV_R, NDV_S)); estimated via HLL or Haas-Stokes samplerDatabase Statistics
MLPMemory-Level Parallelism — number of simultaneous outstanding cache misses a core can sustain; bounded by ROB size and MSHR countSuperscalar OoO CPU
MOESIModified/Owned/Exclusive/Shared/Invalid — AMD's extension of MESI with Owned state for dirty-line sharing without writebackSuperscalar OoO CPU
MPKIMisses Per Kilo-Instructions — branch or cache miss rate metric; TAGE achieves <3% branch MPKI on SPEC CPU 2006Superscalar OoO CPU
MSHRMiss Status Holding Register — tracks outstanding cache misses and coalesces accesses to the same line; count ≈ MLPSuperscalar OoO CPU
MMAMatrix Multiply-Accumulate — Tensor Core operation computing D = A * B + C on small matrix tilesGPU/TPU Accelerator Design
MMIOMemory-Mapped I/O — mapping device registers into CPU address space for direct read/write accessVFIO Internals
MorselA small chunk of a source operator's input handed to a worker thread; unit of morsel-driven parallelism and work stealingDuckDB Internals
MPSMMassively Parallel Sort-Merge — NUMA-aware join algorithm with local sort + parallel merge across nodesJoin Algorithms
MSI-XMessage Signaled Interrupts Extended — PCIe interrupt delivery via memory writes, supporting per-queue interrupt vectorsVFIO Internals
MTEMemory Tagging Extension — ARM hardware feature for detecting memory safety bugs (use-after-free, buffer overflow)Linux Expert Syscalls
MVCCMulti-Version Concurrency Control — concurrency scheme where readers see snapshots and writers create new versionsDatabase Systems, DuckDB Internals
NoCNetwork-on-Chip — on-die interconnect (ring/mesh/torus) routing traffic between cores, caches, and memory controllersSuperscalar OoO CPU, GPU/TPU Accelerator Design
NLJNested Loop Join — simplest join algorithm scanning inner relation for each outer tuple; O(|R| * B(S)) I/OJoin Algorithms
NUMANon-Uniform Memory Access — multi-socket architecture where memory access latency depends on which socket owns the memoryHyPer/Umbra/CedarDB
NVICNested Vectored Interrupt Controller — ARM Cortex-M interrupt controller with priority-based preemptionTimer Interrupts STM32
NVLinkNVIDIA proprietary high-bandwidth GPU-to-GPU interconnect (NVLink5: 1.8 TB/s bidirectional)GPU/TPU Accelerator Design
OCCOptimistic Concurrency Control — transaction scheme allowing concurrent execution, validating at commit timeDisaggregated Storage
OIDObject Identifier — PostgreSQL's internal numeric identifier for database objects (types, relations, functions)Arrow PostgreSQL Integration
OLAPOnline Analytical Processing — workload pattern of complex read-heavy aggregation queries (DuckDB, ClickHouse)Database Systems
OLTPOnline Transaction Processing — workload pattern of high-throughput short read-write transactions (PostgreSQL, MySQL)Database Systems
OoOOut-of-Order execution — CPU technique issuing instructions in data-dependency order rather than program order to hide latencySuperscalar OoO CPU
PACPointer Authentication Code — ARM cryptographic signature embedded in pointer unused bits for control-flow integrityISA Critical Instructions
PACELCPartition-Availability-Consistency / Else Latency-Consistency — extension of CAP capturing normal-operation trade-offsDistributed Consensus
PASIDProcess Address Space ID — IOMMU feature enabling per-process DMA address translation for shared virtual addressingVFIO Internals
PAXPartition Attributes Across — hybrid row/column page layout storing columns within each page (Umbra)Database Systems
PBFTPractical Byzantine Fault Tolerance — first practical BFT protocol tolerating f Byzantine faults with 3f+1 replicasDistributed Consensus
PEBSPrecise Event-Based Sampling — Intel hardware profiling capturing exact instruction pointer on performance counter overflowISA Critical Instructions
PRFPhysical Register File — centralized storage for all in-flight register values; separate INT and FP files sized at ROB + arch_regsSuperscalar OoO CPU
PGProtection Group — Aurora's 10 GB storage segment replicated 6 ways across 3 AZsDisaggregated Storage
PIDPage ID — logical identifier for a database page, translated to a buffer frame address by the buffer managerBuffer Management
Pipeline BreakerOperator that must fully consume its input before producing output (hash-join build, aggregate, sort); materializes into pipeline-local state and acts as the source of a downstream pipelineDuckDB Internals
PITRPoint-In-Time Recovery — restoring a database to any past moment by replaying WAL to a target LSN/timestampDatabase Systems
PLLPhase-Locked Loop — clock generation circuit multiplying a reference crystal frequency for the system clockTimer Interrupts STM32
PMUPerformance Monitoring Unit — hardware counters (cycles, cache misses, branch mispredictions) for CPU profilingCycle Counters & Energy
RBPEXResilient Buffer Pool Extension — local SSD cache in Azure SQL Hyperscale surviving process restartsDisaggregated Storage
RowContainerVelox row-major slab storing group keys and accumulators: fields ordered as (normalized key, null bits, fixed 8-byte slots, variable-width section, accumulators, probed flag)Velox Internals
Q-errorCardinality estimation accuracy metric: max(est/actual, actual/est) ≥ 1; Q-error=1 is perfect; JOB benchmark shows PostgreSQL p95 ≈ 12×Database Statistics
RASReturn Address Stack — hardware stack that speculatively captures call targets to predict return addressesSuperscalar OoO CPU
RATRegister Alias Table — maps architectural register names to physical register IDs during OoO rename stageSuperscalar OoO CPU
RCURead-Copy-Update — Linux kernel synchronization allowing lock-free reads with deferred reclamation of old dataData Structures
RDMARemote Direct Memory Access — network hardware reading/writing remote memory without CPU involvement (~10 us latency)Disaggregated Storage
RDTSCRead Time-Stamp Counter — x86 instruction reading the 64-bit cycle counter; RDTSCP variant serializes prior instructionsCycle Counters & Energy
ROBReorder Buffer — circular buffer holding all in-flight µops; enables in-order retirement and precise exception handlingSuperscalar OoO CPU
RLERun-Length Encoding — compression encoding consecutive identical values as (value, count) pairsDatabase Systems
RMIRecursive Model Index — learned index structure using a hierarchy of ML models to predict key positionsLSM Trees
RTLRegister Transfer Level — hardware description abstraction (Verilog/VHDL) defining logic in terms of registers and operationsGPU/TPU Accelerator Design
RUMRead, Update, Memory conjecture — states you can optimize at most two of read/write/space overhead in an indexLSM Trees
RVWMORISC-V Weak Memory Ordering — RISC-V's relaxed memory model preserving only data dependencies and same-address orderingISA Critical Instructions
RVVRISC-V Vector extension — scalable vector ISA with LMUL register grouping and vector-length agnostic (VLA) programmingISA Critical Instructions
SQStore Queue — buffer holding committed stores until they drain to the L1D cache; used for STLF and memory orderingSuperscalar OoO CPU
SCLSegment Complete LSN — per-Protection-Group completeness tracker in Aurora's storage layerDisaggregated Storage
seccompSecure Computing Mode — Linux syscall filtering mechanism using BPF programs for sandboxing (used in Neon WAL redo)Linux Expert Syscalls
Selection VectorArray of indices into a vector selecting surviving rows after a filter; threaded downstream so filtered data is not compacted until materialization (DuckDB vectorized engine)DuckDB Internals
SelectivityVectorVelox bitmask (uint64_t words, 64 rows/word) of active rows passed between operators and into expression eval; applyToSelected() iterates via __builtin_ctzllVelox Internals
SharedArbitratorVelox MemoryArbitrator implementation for global fair memory sharing across queries; 3-pass reclaim: free capacity → spill largest → abort victimVelox Internals
StringViewVelox 16-byte string representation: [size:4][inline:12] for ≤12 chars, or [size:4][prefix:4][ptr:8] for longer; enables fail-fast comparison and zero-copy substrVelox Internals
SFUSpecial Function Unit — GPU hardware computing transcendentals (sin, cos, rsqrt, log) at reduced throughputGPU/TPU Accelerator Design
SIMTSingle Instruction, Multiple Thread — GPU execution model where warps of 32 threads execute in lockstepGPU/TPU Accelerator Design
SMStreaming Multiprocessor — fundamental GPU compute unit containing CUDA cores, Tensor Cores, register file, and shared memoryGPU/TPU Accelerator Design
SMEScalable Matrix Extension — ARM extension for matrix operations using a 2D tile register (ZA) for GEMM accelerationISA Critical Instructions
SMJSort-Merge Join — join algorithm sorting both relations then merging; optimal when inputs are pre-sortedJoin Algorithms
SMMUSystem Memory Management Unit — ARM's IOMMU implementation (SMMUv3) for DMA address translation and device isolationVFIO Internals
STLFStore-to-Load Forwarding — hardware mechanism supplying load data directly from the store queue, bypassing cache (~4-5 cycles)Superscalar OoO CPU
SPDKStorage Performance Development Kit — userspace NVMe driver framework using VFIO for millions of IOPS per coreVFIO Internals
SPSCSingle Producer Single Consumer — lock-free queue variant with one writer and one reader threadData Structures
SQESubmission Queue Entry — io_uring user-to-kernel I/O request structure (opcode, fd, buffer, offset)io_uring Internals
SQPOLLSubmission Queue Polling — io_uring mode where a kernel thread polls the SQ, eliminating syscalls entirelyio_uring Internals
SR-IOVSingle Root I/O Virtualization — PCIe spec creating lightweight virtual functions from one physical deviceVFIO Internals
SSISerializable Snapshot Isolation — PostgreSQL's true serializable isolation via predicate locking and conflict detectionDatabase Systems
SSTableSorted String Table — immutable, sorted on-disk file in LSM trees containing key-value pairs with index/bloom filterLSM Trees
STLRStore-Release Register — ARM instruction providing release semantics (no preceding access reordered after it)ISA Critical Instructions
SVAShared Virtual Addressing — IOMMU feature letting devices use the same virtual addresses as the CPU processVFIO Internals
SVEScalable Vector Extension — ARM vector ISA with hardware-defined vector length (128-2048 bits) for portable SIMDISA Critical Instructions
TAGETagged Geometric History Length Branch Predictor — state-of-the-art predictor using multiple tagged components indexed by geometric history lengths (Seznec 2006)Superscalar OoO CPU
THPTransparent Huge Pages — Linux kernel feature automatically promoting 4KB page allocations to 2MB pages to reduce TLB pressureSuperscalar OoO CPU
TF32TensorFloat-32 — NVIDIA 19-bit format (8-bit exponent, 10-bit mantissa) for Tensor Core GEMM on Ampere+GPU/TPU Accelerator Design
TLBTranslation Lookaside Buffer — CPU/IOMMU cache for virtual-to-physical address translationsBuffer Management
TOASTThe Oversized-Attribute Storage Technique — PostgreSQL mechanism compressing/storing large field values out-of-lineDatabase Systems
TrueTimeGoogle's globally-synchronized clock API returning bounded time intervals using GPS + atomic clocks (Spanner)Disaggregated Storage
TSCTime Stamp Counter — x86 hardware counter incrementing at a fixed reference frequency, read via RDTSC/RDTSCPCycle Counters & Energy
TSOTotal Store Order — x86 memory model where only Store-Load reordering is permitted; most lock-free code "just works"ISA Critical Instructions
TSXTransactional Synchronization Extensions — Intel hardware transactional memory (XBEGIN/XEND), deprecated due to security issuesISA Critical Instructions
UCIeUniversal Chiplet Interconnect Express — open standard for die-to-die communication in chiplet-based designsGPU/TPU Accelerator Design
UIOUserspace I/O — early Linux framework for userspace device drivers; no DMA isolation (predecessor to VFIO)VFIO Internals
userfaultfdUser Fault File Descriptor — Linux syscall letting userspace handle page faults (used for live migration, lazy restore)Linux Expert Syscalls
VeloxMeta's open-source C++ vectorized execution engine library — embeds into Presto (Prestissimo), Spark (Gluten), and other engines to share one high-quality vectorized kernelVelox Internals
VectorEncodingVelox encoding taxonomy for BaseVector subclasses: FLAT, CONSTANT, DICTIONARY, BIASED, SEQUENCE, LAZY, ROW, MAP, ARRAYVelox Internals
VectorLoaderVelox callback object wrapped by LazyVector; called to decode a column on first access (late materialization)Velox Internals
VIPTVirtually Indexed Physically Tagged — I-cache design using virtual bits for set index (fast) and physical tag for correctness (no aliasing if index bits lie within page offset)Superscalar OoO CPU
VCLVolume Complete LSN — Aurora's highest LSN for which all prior log records reached all storage quorum nodesDisaggregated Storage
VDLVolume Durable LSN — Aurora's effective recovery point: highest CPL <= VCLDisaggregated Storage
VFIOVirtual Function I/O — Linux kernel framework for safe userspace device drivers using IOMMU DMA isolationVFIO Internals
VLAVector-Length Agnostic — programming model where code adapts to hardware vector width at runtime (ARM SVE, RISC-V RVV)ISA Critical Instructions
VRViewstamped Replication — consensus protocol by Oki/Liskov using views and viewstamps, equivalent to Multi-PaxosDistributed Consensus
VT-dVirtualization Technology for Directed I/O — Intel's IOMMU implementation for DMA remapping and device isolationVFIO Internals
WALWrite-Ahead Log — durability mechanism requiring all changes to be logged before being written to data filesWAL & Torn Pages
WATTWrite-Aware Timestamp Tracking — eviction policy tracking write timestamps for better page replacement decisionsBuffer Management
WCOJWorst-Case Optimal Join — join algorithm (e.g., LeapfrogTrieJoin) matching the AGM bound for cyclic queriesJoin Algorithms
WiredTigerMongoDB's default B-tree storage engine using copy-on-write, MVCC, and hazard pointers for concurrencyMongoDB/WiredTiger Internals
XDPeXpress Data Path — Linux eBPF-based programmable network processing at the NIC driver level before kernel stackLinux Expert Syscalls
Z-setGeneralized multiset with integer weights (positive=insert, negative=delete) — core data model of DBSP/FelderaDatabase Systems
ZtsoRISC-V TSO extension — provides Total Store Order semantics for x86 binary translation compatibilityISA Critical Instructions
AMSAMS Sketch (Alon-Matias-Szegedy 1999) — randomized sketch estimating second frequency moment F₂ = Σfᵢ²; basis for join size estimationDatabase Statistics
ACORNApproximate search framework supporting predicate-agnostic filtered ANN by expanding beam width to compensate for filtered nodes in HNSW graphText & Vector Search
ADCAsymmetric Distance Computation — ANN technique precomputing query-to-codebook distances into lookup table; O(M) distance vs O(d)Text & Vector Search
ANNApproximate Nearest Neighbor — find vector within (1+ε) × optimal distance; trades recall for speed; graph/IVF/quantization methodsText & Vector Search
BEIRBenchmark for heterogeneous zero-shot IR evaluation — 18 datasets (web/bio/legal/sci); reveals generalization gap of dense models vs BM25Text & Vector Search
BKD-treeDisk-friendly k-d tree variant used in Lucene for numeric and geo range queries; leaf blocks of 512–1024 pointsText & Vector Search
BM25Best Match 25 — probabilistic term-weighting ranking function (Robertson et al. 1994); de-facto standard for keyword searchText & Vector Search
BMWBlock-Max WAND — extends WAND with per-block max scores for finer-grained postings skipping (Ding & Suel SIGIR 2011)Text & Vector Search
CAGRACUDA ANNS GRAph-based — NVIDIA GPU-native graph ANN algorithm; 33–77× faster than CPU HNSW for batch searchText & Vector Search
ColBERTContextualized Late Interaction over BERT — per-token embeddings + MaxSim aggregation; stronger quality than bi-encoder, more storageText & Vector Search
DiskANNMicrosoft disk-resident ANN system using Vamana graph; 1B vectors on 64GB RAM + NVMe; >95% recall@1 at <5ms (NeurIPS 2019)Text & Vector Search
DPRDense Passage Retrieval — bi-encoder dense retrieval (Karpukhin et al. EMNLP 2020); trained with in-batch + BM25 hard negativesText & Vector Search
HNSWHierarchical Navigable Small World — multi-layer proximity graph for ANN; O(ef × log n) search; dominant algorithm on ann-benchmarksText & Vector Search
IVFInverted File Index — k-means partition ANN; scan only nprobe nearest centroid lists; base of FAISS IVFPQText & Vector Search
LSHLocality Sensitive Hashing — hash collision probability increases with similarity; random projections for L2, SimHash for cosineText & Vector Search
MaxScoreEarly termination algorithm splitting postings into essential/non-essential lists; rank-safe top-K (Turtle & Flood 1995)Text & Vector Search
MIPSMaximum Inner Product Search — variant of ANN for inner product similarity; used in recommendation and dense retrievalText & Vector Search
MRLMatryoshka Representation Learning — embeddings meaningful at all prefix lengths [8..2048]; truncate at inference (Kusupati NeurIPS 2022)Text & Vector Search
MTEBMassive Text Embedding Benchmark — 56 English datasets across 8 task types; standard leaderboard for sentence/passage embedding modelsText & Vector Search
PQProduct Quantization — split d-dim vector into M subspaces of d/M dims each, quantize independently; M bytes per vector (Jégou 2011)Text & Vector Search
PLAIDPerformance-optimized Late Interaction Driver — centroid interaction pre-filter for ColBERT; 45× faster on CPU (CIKM 2022)Text & Vector Search
RaBitQRotation + 1-bit quantization — apply random rotation before binary quantization; tight theoretical error bound (Gao SIGMOD 2024)Text & Vector Search
RRFReciprocal Rank Fusion — score = Σ 1/(k + rank_r), with smoothing constant k (commonly 60); fuses multiple ranked lists from ranks alone, with no score normalization or training (Cormack SIGIR 2009)Text & Vector Search
ScaNNScalable Nearest Neighbor — Google ANN library using anisotropic quantization; 2× faster than competitors on ann-benchmarks (ICML 2020)Text & Vector Search
SPLADESparse Lexical and Expansion — BERT MLM head → 30K sparse vector with term expansion + weighting; served via inverted index (SIGIR 2021)Text & Vector Search
WANDWeak AND — pivot-based postings skip algorithm for top-K; rank-safe, 10–25× faster than DAAT (Broder et al. CIKM 2003)Text & Vector Search
ACEAXI Coherency Extensions — ARM extension adding snoop channels (AC/CR/CD) to AXI for cache-coherent masters; ACE-Lite for non-cached coherent agents (DMA, accelerators)Interconnects
AIBAdvanced Interface Bus — Intel-originated open chiplet D2D standard (1024 wires/channel); used in EMIB-based Sapphire Rapids/Ponte Vecchio; largely subsumed by UCIe AdvancedInterconnects
AXIAdvanced eXtensible Interface — Arm AMBA bus standard; AXI4 has 5 independent channels (AW/W/B/AR/R); AXI5 adds atomics and unique-ID interleaveInterconnects
BoWBunch of Wires — OCP/OIF chiplet D2D parallel-wire standard targeting < 2 mm; up to 16 GT/s/wire; largely subsumed by UCIeInterconnects
CHICoherent Hub Interface — Arm AMBA packet-based mesh fabric; scales to 256-core server chips (Neoverse N2/V2 CMN-700); supports snoopy + directory coherenceInterconnects
CPOCo-Packaged Optics — placing optical engines directly on switch ASIC substrate to eliminate PCB trace loss at 1.6T+; Broadcom Tomahawk 5/6, NVIDIA Quantum-X PhotonicsInterconnects
CQCompletion Queue — RDMA structure where NIC writes a CQE per completed Work Request; polled or interrupt-drivenInterconnects
DCBData Center Bridging — IEEE 802.1 extensions (PFC + ETS + QCN + DCBX) enabling lossless Ethernet for RoCE/FCoEInterconnects
DCBXData Center Bridging Exchange — LLDP-based protocol exchanging DCB capabilities/config between switch and endpointInterconnects
DCQCNDatacenter QCN — RoCEv2 congestion control combining switch ECN marking, CNP feedback, and rate adjustment at endpoint (Zhu SIGCOMM 2015)Interconnects
DCTDynamically Connected Transport — InfiniBand QP type using a shared pool of QPs dynamically retargeted per peer; required for 10k+ rank scaleInterconnects
DCTCPDatacenter TCP (Alizadeh SIGCOMM 2010) — TCP variant using ECN with fractional marking + α-smoothing for low-latency DCInterconnects
ECMPEqual-Cost Multi-Path — routing technique distributing flows across multiple equal-cost paths via hash of packet fields; suffers hash collision under skewInterconnects
ETSEnhanced Transmission Selection (IEEE 802.1Qaz) — DCB feature for proportional bandwidth allocation across 8 traffic class groupsInterconnects
FCPFibre Channel Protocol — SCSI-over-FC mapping; the original SAN protocol; largely replaced by NVMe-oF/FC for new deploymentsInterconnects
FECForward Error Correction — channel coding (RS(528,514) "KR4", RS(544,514) "KP4") used in 25/50/100+ GbE to recover from bit errors; mandatory for PAM4 lanes at 50G and aboveInterconnects
GFAMGlobal Fabric Attached Memory — CXL 3.0+ pooled coherent memory accessible by any host in a CXL fabric; sub-µs latency at TB scaleInterconnects
GMIGlobal Memory Interconnect — AMD on-package coherent interconnect linking CCDs to the IOD on EPYC; GMI3 at 36 GT/sInterconnects
HDM-DBHost-managed Device Memory — Device-managed coherence (CXL 3.0+) where device tracks host caches and issues back-invalidations; enables fabric-attached coherent memory pools >1 TBInterconnects
HPCCHigh Precision Congestion Control (Li SIGCOMM 2019) — in-band-telemetry-based CC for RDMA; per-hop queue + utilization embedded in packetsInterconnects
IBAInfiniBand Architecture — IBTA's full layered spec; covers physical, link, network, transport, and management layersInterconnects
ICIInter-Chip Interconnect — Google's TPU pod fabric; 3D torus with OCS reconfiguration in v4+ (Jouppi et al. ISCA 2023)Interconnects
IDEIntegrity and Data Encryption — CXL link-layer AES-GCM encryption per FLIT; selectable per virtual channelInterconnects
IFISInfinity Fabric Inter-Socket — AMD inter-socket coherent interconnect (xGMI variant); 32 GT/s at Zen 4Interconnects
IFOPInfinity Fabric On-Package — AMD on-package coherent link between CCD and IOD; 32-36 GT/s at Zen 4/5Interconnects
MACsecIEEE 802.1AE — L2 line-rate AES-128/256-GCM encryption between Ethernet hops; standard on enterprise/DC NICsInterconnects
MRMemory Region — RDMA registered+pinned+IOMMU-mapped buffer; has lkey (local) and rkey (remote) tokens; expensive to register (10s of ms per GB)Interconnects
MTUMaximum Transmission Unit — largest L2 frame supported; default 1500B Ethernet; "jumbo" 9000B common in DC; matters for PFC headroom + RoCEInterconnects
MZMMach-Zehnder Modulator — silicon-photonics modulator that splits light into two arms, applies an electrical phase shift on one, and recombines them; output intensity ∝ cos²(Δφ/2)Interconnects
NCCLNVIDIA Collective Communications Library — GPU-native AllReduce/AllGather/Broadcast library; uses NVLink/IB/RoCE; supports NVLS in-network reductionInterconnects
NeuronLinkAWS Trainium proprietary interconnect; NeuronLink-v3 at ~12 Tbps aggregate per chip on Trainium2Interconnects
NIXLNVIDIA Inference Transfer Library (2024-2025) — disaggregated KV-cache transport for LLM serving; integrates Dynamo/vLLMInterconnects
NPIVN_Port ID Virtualization — Fibre Channel feature letting multiple virtual ports share one HBA; required for VM passthrough on FC SANsInterconnects
NRZNon-Return-to-Zero — binary signaling (1 bit/symbol); used in PCIe 1-5 and in Ethernet lanes up to about 25 Gb/s; superseded by PAM4 at 50 Gb/s per lane and aboveInterconnects
NVL72NVIDIA NVLink 72 — rack-scale architecture with 72 B200 GPUs in single coherent NVLink domain; 9 NVSwitch trays, 130 TB/s aggregate, copper backplaneInterconnects
NVLSNVLink Sharp — in-switch reduction on NVSwitch 3.0+; halves AllReduce bandwidth requirement vs ringInterconnects
NVMe-oFNVMe over Fabrics — NVMe wire protocol over RDMA (RoCE/IB), TCP, or FC; replaces iSCSI/FC for SSD-class storage networkingInterconnects
OCSOptical Circuit Switch — switch routing entirely in optical domain (MEMS mirrors or AWG); slow reconfig (ms), but very high BW/power efficiency once configuredInterconnects
ODPOn-Demand Paging — RDMA NIC feature replacing MR page-pinning with on-the-fly page faults via PCIe ATS+PRI; ~5-10 µs fault penaltyInterconnects
OFIOpenFabrics Interfaces — libfabric API and provider framework (verbs/EFA/psm3/cxi/tcp); alternative to UCX, preferred by AWS/Cray/Intel stacksInterconnects
OpenHBIOCP High Bandwidth Interface — chiplet D2D spec targeting HBM-class memory interconnect; largely overlapped by HBM PHY and UCIeInterconnects
PAM4Pulse Amplitude Modulation 4-level — 2 bits/symbol signaling; doubles bit rate vs NRZ at the same baud rate, at the cost of lower SNR; standard for 50G+ per-lane Ethernet/PCIe 6+Interconnects
PFCPriority-based Flow Control (IEEE 802.1Qbb) — pause only one of 8 traffic classes per port; required for lossless Ethernet (RoCEv2, FCoE)Interconnects
QCNQuantized Congestion Notification (IEEE 802.1Qau) — DCB explicit-feedback CC; largely superseded by ECN-based protocolsInterconnects
QPQueue Pair — RDMA endpoint pair (send queue + receive queue); types: RC, UC, UD, XRC, DCTInterconnects
RNRReceiver Not Ready — RDMA NAK indicating receiver had no posted RECV when SEND arrived; triggers sender backoff + retryInterconnects
RoCERDMA over Converged Ethernet — verbs over Ethernet (v1 L2-only, dead) or UDP/IP (v2, port 4791, dominant)Interconnects
RoCEv2RoCE version 2 — RDMA verbs encapsulated in UDP/IP; routable; requires lossless fabric (PFC) + ECN-based CC (DCQCN); UDP port 4791Interconnects
SerDesSerializer/Deserializer — high-speed parallel-to-serial signaling IP; the fundamental scaling unit (per-lane signaling) of all modern interconnectsInterconnects
SHARPScalable Hierarchical Aggregation and Reduction Protocol — Mellanox in-switch reduction for IB; halves AllReduce bandwidth requirementInterconnects
TDISPTEE Device Interface Security Protocol — PCIe spec (adopted by CXL) for attesting confidential devices; required for confidential CXL/PCIe accelerator workloadsInterconnects
TileLinkOpen RISC-V coherent chip protocol (UC Berkeley); three tiers TL-UL/TL-UH/TL-C; used in SiFive/BOOM/ChipyardInterconnects
UALinkUltra Accelerator Link — 2024 open consortium (AMD/Broadcom/Cisco/Google/Intel/Meta/MS/HPE); coherent NVLink alternative; targets 1024-GPU domains via Ethernet PHY + custom protocolInterconnects
UECUltra Ethernet Consortium — 2023-2025 Linux Foundation project; UEC 1.0 spec (Jun 2025) defines RUD/RUDI transport with packet spraying + modern CC for AI on commodity EthernetInterconnects
UPIUltra Path Interconnect — Intel inter-socket/inter-die coherent fabric (MESIF protocol); 10.4 GT/s (SKL) → 24 GT/s (GNR)Interconnects
WRWork Request — RDMA element posted to a QP's send or receive queue describing an I/O (opcode, sg_list, remote_addr/rkey, etc.)Interconnects
xGMIInter-Socket Global Memory Interconnect — AMD coherent link between EPYC sockets (and between MI300 GPUs); 32 GT/s at gen4-5Interconnects
ZRCoherent optical pluggable family — 400ZR/800ZR for metro distances (80-120 km over amplified links) using DP-16QAM with integrated DSPInterconnects
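
The RRF entry above gives the fusion formula score = Σ 1/(k + rank_r). A minimal Python sketch of that formula follows. The function name, the sample document IDs, and the default k = 60 are assumptions made for illustration; only the formula itself comes from the glossary.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs using score = sum(1 / (k + rank)).

    rankings: iterable of lists, each ordered best-first.
    k: smoothing constant (60 is an assumed, commonly used default).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit of each list has rank 1.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from a keyword retriever and a dense retriever.
bm25_hits = ["d1", "d2", "d3"]
dense_hits = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)
```

Only ranks enter the computation, so the two retrievers' raw scores never need to be put on a common scale.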

Cache Eviction, Admission & Prefetching

TermDefinitionDetailed In
ARCAdaptive Replacement Cache — self-tuning O(1) policy with T1 (recency) + T2 (frequency) lists and B1/B2 ghost lists; adaptive partition point p updated on ghost hits; Megiddo & Modha FAST 2003; used in ZFS, DB2Cache Algorithms
CLOCK-ProApproximation of LIRS using three clock hands (hot/cold/test); scan-resistant without explicit ghost lists; lower memory overhead than ARC; Jiang et al. ATC 2005; used in NetBSD VMCache Algorithms
GDSFGreedy Dual Size Frequency — priority-queue eviction using key = (freq/size) + clock-inflation L; scan-resistant, size-aware; web proxy caching; Cherkasova HPL 1998; used in SquidCache Algorithms
GL-CacheGroup-level Learned Cache — ML model ranks groups of objects for batch eviction rather than individual objects; amortises inference cost; Yang et al. FAST 2023Cache Algorithms
LeCaRLearning Cache Replacement — online RL mixture of LRU + LFU experts via multiplicative-weights update; regret-minimising; Vietri et al. HotStorage 2018Cache Algorithms
LHDLeast Hit Density — evicts object with lowest estimated hits-per-byte-per-time-unit; sampled from random candidates; class-based histogram; Beckmann et al. NSDI 2018Cache Algorithms
LIRSLow Inter-reference Recency Set — recency stack with HIR (high IRR) / LIR (low IRR) classification; promotes repeatedly-hit items; scan-resistant; Jiang & Zhang SIGMETRICS 2002; used in H2 DB, Caffeine (historical)Cache Algorithms
LRBLearning Relaxed Belady — GBDT model periodically retrained on a sliding window of recent requests to predict next reuse time; approximates Belady's OPT; Song et al. NSDI 2020; prototyped in Apache Traffic ServerCache Algorithms
LRU-KLRU variant tracking K most-recent access timestamps per object; evicts object with oldest K-th reference; eliminates one-hit wonders; O'Neil et al. SIGMOD 1993Cache Algorithms
MGLRUMulti-Generational LRU — Linux 6.1+ page reclaim using hardware-assisted generation counters (PG_referenced + page table young bits); replaces clock-sweep for anonymous + file pages; Zhao LKML 2022Cache Algorithms
MRCMiss Ratio Curve — function mapping cache size → miss ratio; computed via reuse-distance analysis (exact) or SHARDS sampling (approximate); essential for cache sizing decisionsCache Algorithms
QD-LPQuick Demotion + Lazy Promotion — small FIFO filter that quickly demotes one-hit wonders (QD) + large main region that promotes lazily, only at eviction time (LP); Yang et al. HotOS 2023Cache Algorithms
S3-FIFOSimple, Scalable caching with three Static FIFO queues — S (10% capacity) + M (90%) + G (ghost); objects re-accessed while in S are promoted to M when they reach its tail; simple, scan-resistant, low metadata overhead; Yang et al. SOSP 2023Cache Algorithms
SHARDSSpatially Hashed Approximate Reuse Distance Sampling — O(1) amortised MRC construction via consistent hashing on object keys; 1% sampling rate with <1% miss ratio error; Waldspurger et al. FAST 2015Cache Algorithms
SIEVESIEVE eviction — single FIFO queue + hand pointer; visited bit cleared on first eviction pass (lazy demotion); simpler than LRU, competitive hit rate on CDN workloads; Zhang et al. NSDI 2024Cache Algorithms
SLRUSegmented LRU — two LRU segments: probationary (new entries) + protected (second-hit promoted); objects demoted from protected → probationary on eviction pressure; used as main cache in W-TinyLFUCache Algorithms
TinyLFUTiny LFU admission filter — 4-bit Count-Min Sketch frequency estimator + doorkeeper Bloom filter; admits new object only if frequency ≥ eviction candidate; reset-based aging; Einziger et al. 2017Cache Algorithms
W-TinyLFUWindow-TinyLFU — 1% window LRU + 99% SLRU main cache + TinyLFU admission gate + hill-climbing window-size tuner; production standard; used in Caffeine, Cassandra, Kafka, Solr, HBase, Neo4jCache Algorithms
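
The SIEVE entry above (single FIFO queue, hand pointer, visited bit) can be sketched in Python as follows. This is an illustrative reading of that description, not a reference implementation: the class names `SieveCache` and `_Node` and the `table` attribute are hypothetical.

```python
class _Node:
    __slots__ = ("key", "value", "visited", "prev", "next")

    def __init__(self, key, value):
        self.key, self.value = key, value
        self.visited = False
        self.prev = None  # neighbour toward the head (newer objects)
        self.next = None  # neighbour toward the tail (older objects)


class SieveCache:
    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.table = {}   # key -> node
        self.head = None  # newest object
        self.tail = None  # oldest object
        self.hand = None  # where the next eviction scan resumes

    def get(self, key):
        node = self.table.get(key)
        if node is None:
            return None
        node.visited = True  # a hit only sets a bit; nothing moves in the queue
        return node.value

    def put(self, key, value):
        node = self.table.get(key)
        if node is not None:
            node.value, node.visited = value, True
            return
        if len(self.table) >= self.capacity:
            self._evict()
        node = _Node(key, value)
        node.next = self.head  # new objects always enter at the head
        if self.head is not None:
            self.head.prev = node
        self.head = node
        if self.tail is None:
            self.tail = node
        self.table[key] = node

    def _evict(self):
        node = self.hand or self.tail
        # Scan from the hand toward the head, clearing visited bits,
        # until an unvisited object is found; wrap around to the tail.
        while node.visited:
            node.visited = False
            node = node.prev or self.tail
        self.hand = node.prev  # None means: restart from the tail next time
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        del self.table[node.key]


cache = SieveCache(3)
for k in "abc":
    cache.put(k, k.upper())
cache.get("a")       # marks "a" as visited
cache.put("d", "D")  # hand skips "a" (clearing its bit) and evicts "b"
```

Unlike LRU, a hit never reorders the queue, so in this sketch the read path touches a single flag per access.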