Vector Database Buzzwords Decoded: What Actually Matters When Choosing One
You’re building an AI feature that answers questions from your company’s documentation. With 50 documents, it works perfectly - send everything to the LLM, get instant answers.
The problem: Once your documents grow into the thousands of pages, you run into LLM context window limitations. Even though modern LLMs can technically handle 100–150 pages, their performance drops significantly with larger contexts.
The solution: “Use vector search and RAG.” You need a vector database.
You start researching and drown in unfamiliar terms:
- “HNSW vs IVF algorithms”
- “Cosine similarity vs Euclidean distance”
- “Dense and sparse vectors”
- “Hybrid storage architecture”
Every vendor claims to be the fastest and most accurate. But what do these terms mean? Which ones matter?
The truth: These buzzwords represent real trade-offs that determine whether your AI feature will be fast, accurate, and affordable at your scale.
This guide decodes the terms you’ll actually encounter when choosing a vector database, and shows you how each decision eliminates certain options before you even look at pricing.
Why Vector Databases: From Problem to Solution
You can’t send all the documents to an LLM because:
- Quality degrades with too much context
- Too expensive ($3-30 per query)
- Too slow (10-60 seconds)
Solution: Send only relevant chunks.
New problem: How do you find relevant chunks when a user asks about “login problems” but the docs say “authentication issues”? Keywords don’t match.
Answer: Search by meaning, not keywords.
This requires:
- Vector embeddings - Convert text to numbers capturing meaning
  - “login problems” → [0.2, 0.8, 0.3, …]
  - “authentication issues” → [0.25, 0.75, 0.35, …]
  - Similar meanings = similar numbers
- Vector database - Store millions of embeddings and find similar ones in milliseconds
Our path to vector databases:
- Context limits → We need relevant chunks
- To find relevant chunks → We search by meaning
- To search by meaning → We need embeddings
- To manage embeddings → We need storage (vector databases)
Buzzwords decoded:
- Vector embeddings = Text as numbers capturing meaning
- Embedding model = Creates embeddings (OpenAI, Cohere)
- Vector database = Complete system for vector embeddings storage and search
- Semantic search = Finding by meaning, not keywords
Core Decision #1: Embedding Strategy (What You’re Actually Storing)
Your embedding model choice determines everything else about your vector database requirements. This isn’t something you configure later - the vector type, the number of dimensions, and the distance metric are all baked into the model you choose to convert your data into vectors.
Understanding Dense and Sparse Vectors
Dense vectors pack meaning into every number. Think of it like GPS coordinates with 1,536 dimensions - every number contributes to pinpointing the meaning.
"login problems" → [0.2, -0.1, 0.8, 0.3, -0.4, 0.1, ...]
All 1,536 numbers work together to capture semantic meaning
The AI model learns patterns during training. Words with similar meanings end up with similar number patterns. This is how “login problems” matches “authentication issues” despite zero word overlap - their vector patterns are similar.
Sparse vectors work like a checklist where most boxes are empty. Each position represents a specific word - if the word appears, it gets a value; if not, it stays zero.
"login problems" → [0, 0, 0.8, 0, 0, 0, 2.1, 0, ...]
Only positions for "login" and "problems" have values
This is essentially keyword search as numbers. “Login” only matches documents containing “login” - not “authentication.”
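To make the difference concrete, here’s a minimal sketch in Python. It assumes the sentence-transformers and scikit-learn packages are installed, and all-MiniLM-L6-v2 is just one example of a lightweight dense model:

```python
from sentence_transformers import SentenceTransformer, util
from sklearn.feature_extraction.text import CountVectorizer

texts = ["login problems", "authentication issues"]

# Dense: a model trained on meaning; every dimension carries part of the signal
model = SentenceTransformer("all-MiniLM-L6-v2")   # produces 384-dimensional vectors
dense = model.encode(texts)
print(dense.shape)                       # (2, 384) - no structural zeros
print(util.cos_sim(dense[0], dense[1]))  # clearly similar despite zero word overlap

# Sparse: one position per vocabulary word; most positions stay zero
vectorizer = CountVectorizer()
sparse = vectorizer.fit_transform(texts)
print(vectorizer.get_feature_names_out())  # ['authentication' 'issues' 'login' 'problems']
print(sparse.toarray())                    # the two rows share no non-zero positions
```

The dense vectors match because the model learned that the two phrases mean similar things; the sparse vectors don’t match at all because the phrases share no words.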
Selecting Your Embedding Model
Your use case determines which type of model you need:
Use dense models when:
- Users ask questions in their own words (“Why can’t I log in?”)
- You need semantic understanding (synonyms, paraphrases)
- Most document search and Q&A applications
- Examples: OpenAI text-embedding-3-large, Sentence-Transformers
Use sparse models when:
- Exact terminology matters (product codes, legal terms)
- Users search with specific keywords they know
- Rarely the sole choice in modern applications
- Examples: SPLADE, BM25
Use both (hybrid approach) when:
- Need semantic understanding AND keyword precision
- Technical documentation (concepts + exact terms matter)
- Best results but requires using two separate models
Important: You cannot configure a dense model to output sparse vectors or vice versa. Dense/sparse is determined by which model you choose, not a setting you adjust.
Hybrid Search
Hybrid search means using TWO separate models:
- Choose a dense model (e.g., Sentence-Transformers) → Generates dense vectors
- Choose a sparse model (e.g., BM25) → Generates sparse vectors
- Store both vector types for each document
- Search both simultaneously and combine results
Why this matters: Not all databases support storing and searching both vector types. Hybrid requires database-level support for fusion.
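One widely used way to combine the two ranked result lists is reciprocal rank fusion (RRF). Here’s a minimal sketch of the idea, with hypothetical document IDs:

```python
def reciprocal_rank_fusion(dense_results, sparse_results, k=60):
    """Merge two ranked lists of document IDs into a single combined ranking."""
    scores = {}
    for results in (dense_results, sparse_results):
        for rank, doc_id in enumerate(results):
            # Documents ranked highly in either list accumulate a larger score
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["doc_a", "doc_b", "doc_c"]    # top results from the dense (semantic) index
sparse_hits = ["doc_d", "doc_a", "doc_e"]   # top results from the sparse (keyword) index
print(reciprocal_rank_fusion(dense_hits, sparse_hits))  # doc_a wins - it appears in both lists
```

Databases with native hybrid support run this kind of fusion for you; without it, you’d be running two searches and merging results in application code.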
Understanding Vector Dimensions: The Size Factor
Your embedding model determines how many numbers are in each vector. You don’t choose this - it’s fixed by the model.
Common dimension sizes:
- 384 dimensions: Lightweight models (~1.5KB per vector)
- 768 dimensions: Balanced models (~3KB per vector)
- 1,536 dimensions: New standard (~6KB per vector)
- 3,072 dimensions: Premium quality models (~12KB per vector)
Why dimensions matter for database selection:
Higher dimensions mean more storage and memory at every level:
- 1 million vectors at 1,536 dims = 6GB minimum storage
- 1 billion vectors at 3,072 dims = 12TB storage
- Memory requirements = 2-3x the raw vector storage (for indexes and operations)
This directly impacts your infrastructure costs and eliminates certain database options:
- High dimensions (3,072+) → Can’t use memory-only solutions at scale
- Massive scale (billions of vectors) → Need distributed database architectures
- Budget constraints → May need disk-based storage instead of in-memory
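Where those per-vector sizes come from: each dimension of a float32 vector takes 4 bytes, so size scales linearly with dimensions and vector count. A quick arithmetic check of the numbers above:

```python
# Assumes float32 embeddings (4 bytes per dimension), the most common case
for dims in (384, 768, 1536, 3072):
    per_vector_kb = dims * 4 / 1024
    per_million_gb = dims * 4 * 1_000_000 / 1e9
    print(f"{dims:>5} dims: ~{per_vector_kb:.1f} KB per vector, ~{per_million_gb:.1f} GB per million vectors")
```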
Buzzwords decoded:
- Dense vectors = All numbers used, captures semantic meaning
- Sparse vectors = Mostly zeros, like keyword matching
- Dimensions/Vector length = How many numbers per vector (model decides)
- Hybrid search = Using dense + sparse together
- Scale = Total number of vectors (thousands, millions, billions)
- Multimodal embeddings = Models that can convert text, images, and audio to vectors in the same space
Core Decision #2: Architecture (Libraries, Databases, or Search Engines)
When researching vector solutions, you’ll encounter Faiss (a library) and Elasticsearch (a search engine) alongside vector databases. These represent fundamentally different approaches - understanding them clarifies which path fits your needs.
Three Architectural Approaches
Vector Libraries: Just Algorithms, You Build the Infrastructure
Examples: Faiss, Annoy, ScaNN
What you get: Similarity search algorithms
What you need: Storage, backups, scaling, monitoring, reliability
When to choose: You need custom algorithms or specific optimizations not available in databases. You’re building a unique system where standard databases won’t work.
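To see what “just algorithms” means in practice, here’s a minimal sketch using Faiss (assuming the faiss-cpu and numpy packages, with random vectors standing in for real embeddings):

```python
import numpy as np
import faiss  # pip install faiss-cpu

dim = 384
vectors = np.random.random((100_000, dim)).astype("float32")  # stand-in for real embeddings

index = faiss.IndexFlatL2(dim)   # exact (brute-force) search over L2 distance
index.add(vectors)

query = np.random.random((1, dim)).astype("float32")
distances, ids = index.search(query, 5)   # top-5 nearest vectors
print(ids)

# Note what's missing: persistence to disk, backups, replication, scaling
# across machines, metadata, access control - all of that is yours to build.
```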
Vector Databases: Complete Systems Purpose-Built for Vectors
Examples: Pinecone, Qdrant, Weaviate, Chroma
What you get: Storage, search, scaling, backups, monitoring - everything
When to choose: You need vector search as part of a larger application. You want production reliability without building infrastructure. This covers 90% of use cases.
Vector Search Engines: Traditional Search with Vector Capabilities
Examples: Elasticsearch, OpenSearch, Solr with vector plugins
What you get: Keyword search AND vector search in one system
What you need: Integration with existing search
When to choose: You already run these systems and need to add semantic search. You need both keyword and vector search tightly integrated.
Why This Matters
Your architecture choice eliminates ~30% of options immediately and determines what the remaining decisions mean:
- Libraries → Focus on algorithm selection and infrastructure design
- Databases → Focus on feature comparison and operational trade-offs
- Search engines → Focus on integration and hybrid search quality
The rest of this blog assumes you’re evaluating vector databases and decodes the specific buzzwords you’ll encounter in that space.
Buzzwords decoded:
- Vector library = Code library providing algorithms (Faiss, Annoy)
- Vector search engine = Traditional search extended with vectors (Elasticsearch + kNN plugin)
- Vector index = The pre-organized data structure that enables fast similarity search
Core Decision #3: Storage Architecture (Speed vs Cost)
How your vectors are physically stored determines your speed, cost, and scale limits. This is often the biggest cost factor in vector database selection.
Where Your Vectors Actually Live
In-Memory Storage - Vectors stored in RAM for instant access.
- Speed: Sub-millisecond queries (fastest possible)
- Risk: Data lost on crash unless replicated
- Best for: Small datasets (<1M vectors) needing extreme speed
- Database examples: Redis with vector search
Disk-Based Storage - Vectors stored on SSDs or hard drives.
- Speed: 5-50 millisecond queries (still very fast)
- Advantage: Data persists through crashes and restarts
- Best for: Large datasets (100M+ vectors) with reasonable speed needs
- Database examples: Most vector databases’ default mode
Hybrid Storage - Hot data in memory, cold data on disk automatically.
- Speed: Fast for popular queries, slower for rare ones
- Complexity: Database handles optimization automatically
- Best for: Large datasets with uneven access patterns
- Database examples: Qdrant, Weaviate with smart caching
The Hidden 3.5x Storage Reality
When you calculate storage needs (vector dimensions × bytes per value × number of vectors), you’re only seeing raw vector size. Actual production requirements are much higher:
Complete storage breakdown:
- Raw vectors: 1x (what you calculated)
- Search indexes: 1x (HNSW graphs, IVF clusters - required for fast search)
- Backups and replicas: 1x (for reliability and disaster recovery)
- Working memory: 0.5x (for query processing and operations)
Total: ~3.5x your raw vector storage
Example:
- You calculate: 1M vectors × 1,536 dims = 6GB
- Production reality: 21GB minimum
- With high availability: 30GB recommended
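A tiny helper that applies this breakdown to your own numbers - treat the 3.5x factor as a rough planning multiplier rather than a guarantee for any particular database:

```python
def production_storage_estimate_gb(num_vectors: int, dimensions: int) -> dict:
    raw = num_vectors * dimensions * 4 / 1e9   # float32 = 4 bytes per dimension
    return {
        "raw_vectors": raw,
        "search_indexes": raw,          # HNSW graphs / IVF clusters
        "backups_and_replicas": raw,
        "working_memory": raw * 0.5,
        "total": raw * 3.5,
    }

print(production_storage_estimate_gb(1_000_000, 1536))   # total ≈ 21.5 GB
```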
Why this matters: Some databases are more index-efficient than others. HNSW indexes are larger but faster; IVF indexes are smaller but need more computation. This affects both storage costs and performance.
Buzzwords decoded:
- Persistence = Data survives system restarts and crashes
- In-memory / RAM storage = Fastest storage, data lost on restart unless replicated
- Disk-based storage = Slower but reliable, data persists through crashes
- Hybrid storage = Automatic optimization between memory and disk
- Replication = Keeping multiple copies of data for reliability
- Sharding / Partitioning = Splitting data across multiple servers for scale
- Million-scale / Billion-scale = Marketing terms indicating data size capabilities
Core Decision #4: Search Algorithms (Speed vs Accuracy Trade-off)
Finding similar vectors fast requires two pieces: an algorithm (organizes vectors for search) and a distance metric (measures similarity). Your embedding model determines the distance metric. The database usually chooses the algorithm.
Why Algorithms Matter
Without algorithms, searching means comparing your query to every stored vector:
- 1M vectors = 500ms-2 seconds per search
- 10M vectors = 5-20 seconds per search
- Users expect under 100ms
Algorithms solve this by pre-organizing vectors once, then taking shortcuts during search. The trade-off: queries drop to 10-50ms, but the search might miss ~5% of relevant results. That ~95% recall is acceptable for most applications.
Algorithm and Distance Metric Relationship
Algorithm = How vectors are organized and searched (HNSW, IVF, DiskANN)
Distance metric = How similarity is calculated (cosine, Euclidean, dot product)
They’re independent but must work together:
- Your embedding model dictates distance metric
- Database chooses algorithm based on its design
- Algorithm must support your required distance metric
The Main Algorithm Types
Most vector databases use one of these algorithm families. You often can’t choose - the database picks for you. But understanding them helps you evaluate performance claims and cost trade-offs.
HNSW (Hierarchical Navigable Small World)
How it works: Builds a multi-layer graph connecting similar vectors, like a social network where you find people through mutual connections.
- Speed: Very fast (10-50ms for millions of vectors)
- Accuracy: Excellent (95-99% recall)
- Memory: High (stores all the connections)
IVF (Inverted File Index)
How it works: Groups vectors into clusters (like organizing books by genre), then only searches relevant clusters.
- Speed: Fast (20-100ms)
- Accuracy: Good (85-95% recall)
- Memory: Moderate (stores cluster info)
LSH (Locality Sensitive Hashing)
How it works: Uses hashing to bucket similar vectors together, like sorting by first letter.
- Speed: Very fast (constant time in ideal cases)
- Accuracy: Moderate (70-85% recall)
- Memory: Low (minimal overhead)
DiskANN (Disk-Based ANN)
How it works: Optimized for SSD storage, using compressed graphs and smart caching to achieve good performance without loading everything into memory.
- Speed: Good (50-200ms, faster than expected for disk-based)
- Accuracy: Very good (90-95% recall)
- Memory: Very low (designed for disk, minimal RAM usage)
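You usually don’t pick the algorithm in a managed database, but you can see the HNSW/IVF trade-off directly in a library like Faiss. A minimal sketch (assuming faiss-cpu and numpy; the parameter values are arbitrary examples):

```python
import numpy as np
import faiss

dim = 128
data = np.random.random((50_000, dim)).astype("float32")
query = np.random.random((1, dim)).astype("float32")

# HNSW: multi-layer graph; high recall, but stores many connections in memory
hnsw = faiss.IndexHNSWFlat(dim, 32)   # 32 = links per node; more links = more memory, better recall
hnsw.add(data)

# IVF: cluster the vectors first, then search only a few clusters per query
quantizer = faiss.IndexFlatL2(dim)
ivf = faiss.IndexIVFFlat(quantizer, dim, 256)   # 256 clusters
ivf.train(data)        # clustering step - HNSW needs no training
ivf.add(data)
ivf.nprobe = 8         # clusters searched per query; higher = slower, more accurate

print(hnsw.search(query, 5))   # (distances, ids)
print(ivf.search(query, 5))
```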
Distance Metrics: Measuring “Similarity”
Your embedding model determines your distance metric:
Cosine Similarity measures the angle between vectors, ignoring their length.
Think of it as: Two arrows pointing in similar directions are “similar” regardless of their length.
- Use for: Text embeddings
- Why: “cat” and “the big fluffy cat” should be similar despite different text lengths
- Required by: OpenAI, Sentence-Transformers, most text models
Euclidean Distance measures straight-line distance between vectors.
Think of it as: Physical distance on a map - both direction AND distance matter.
- Use for: Image embeddings, spatial data
- Why: Color intensity matters (light red vs dark red are different)
- Required by: Some computer vision models
Dot Product measures both angle and magnitude.
Think of it as: Similarity plus strength - both direction and intensity matter.
- Use for: Recommendation systems
- Why: Strong preferences should rank higher than weak preferences
- Required by: Some recommendation embedding models
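All three metrics are just different formulas over the same pair of vectors. A minimal numpy sketch, using a vector and a scaled copy of it to show how each metric treats length:

```python
import numpy as np

a = np.array([0.2, 0.8, 0.3])
b = 2 * a   # same direction, twice the length

cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))  # 1.0  - length is ignored
euclidean = np.linalg.norm(a - b)                                # ~0.88 - length difference counts
dot = np.dot(a, b)                                               # ~1.54 - rewards alignment and magnitude

print(cosine, euclidean, dot)
```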
How This Affects Database Choice
Your embedding model eliminates databases:
- Need cosine → Most databases support this well
- Need Euclidean or dot product → Verify support quality
- Wrong metric support → Broken search results
Your scale and speed needs determine algorithm preference:
- Need <50ms with 10M+ vectors → Requires HNSW
- Memory-constrained at scale → IVF or DiskANN better
- Billions of vectors → DiskANN or distributed HNSW
Buzzwords decoded:
- ANN (Approximate Nearest Neighbor) = Finding “close enough” matches quickly (vs checking everything)
- kNN (k-Nearest Neighbors) = Finding the k closest matches (can be exact or approximate)
- Distance metric (cosine, Euclidean, dot product) = Mathematical formula for measuring similarity
- Brute force / Flat index = Checking every vector (perfect accuracy, unusably slow at scale)
- Query latency = Time from search request to results (targeting <100ms)
- QPS (Queries Per Second) = How many searches the system can handle concurrently
- Recall = What percentage of relevant results the system finds (95% recall = finds 95 out of 100 relevant docs)
- Quantization = Compressing vectors to save memory (trades accuracy for efficiency)
- HNSW, IVF, LSH = Different algorithm families (usually chosen by database, not you)
- DiskANN = Emerging disk-optimized algorithm balancing cost and performance for massive scale
Core Decision #5: Metadata Filtering (Beyond Simple Similarity)
Real applications rarely need just “find similar vectors.” They need “find similar AND meets criteria”:
- “Similar BUT only API documentation”
- “Similar BUT only user can access”
- “Similar BUT only last 30 days”
Without filtering, irrelevant results appear regardless of similarity quality.
What Is Metadata Filtering
Metadata = Structured data attached to vectors (categories, dates, user IDs, prices, permissions)
Filtering = Similarity search + metadata constraints
Example metadata structure:
{
  "vector": [0.2, 0.8, 0.3, ...],
  "metadata": {
    "category": "api_docs",
    "date": "2024-01-15",
    "access": "public"
  }
}

Query example: Find vectors similar to "authentication error" WHERE category = 'api_docs' AND date > '2024-01-01'
What Determines Your Filtering Needs
Your application requirements:
Simple filtering (most cases):
- Category, date, status filters
- Basic equality and ranges
- Database impact: All modern databases handle this
Complex filtering:
- Multi-tenant isolation (user_id, org_id)
- Highly selective filters (match <5% of data)
- Permission-based access
- Database impact: Performance varies significantly
Understanding selectivity:
- Filter selectivity = percentage of data matching your filter
- Broad filters (>50% of data matches) = easy for databases to handle
- Selective filters (<5% of data matches) = performance challenge, requires optimization
How Databases Handle Filtering
Databases decide the internal strategy - not something you configure:
Pre-filtering: Apply filters first, search within matches
Post-filtering: Search first, filter results after
Hybrid: Automatically optimize based on filter selectivity
What you control: Which fields to filter on, filter conditions
What database controls: How filtering executes internally
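A toy illustration of the two strategies with plain numpy and a made-up category field - real databases do this internally with proper metadata indexes:

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.random((10_000, 384), dtype=np.float32)
categories = rng.choice(["api_docs", "tutorials", "blog"], size=10_000)
query = rng.random(384, dtype=np.float32)

similarities = vectors @ query   # brute-force dot-product similarity, for illustration only

# Pre-filtering: keep only vectors with matching metadata, then rank that subset
matching_ids = np.where(categories == "api_docs")[0]
pre_top5 = matching_ids[np.argsort(-similarities[matching_ids])[:5]]

# Post-filtering: rank everything, then drop results that fail the filter
ranked = np.argsort(-similarities)
post_top5 = [i for i in ranked if categories[i] == "api_docs"][:5]

print(pre_top5, post_top5)
# With a very selective filter, post-filtering over a fixed candidate list
# can come back with too few results - which is why the strategy matters.
```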
What to Evaluate
Filter language support:
- Does it support AND/OR logic?
- Range queries (>, <, BETWEEN)?
- Nested fields (user.department)?
- Your specific data types (dates, arrays, geo)?
Performance characteristics:
- Test with YOUR filter selectivity (what % of data matches?)
- Verify speed at YOUR scale (1M vs 100M vs 1B vectors)
- Check behavior with complex combined filters
Example: If filtering for “documents updated in last 7 days” matches only 2% of your 10M vectors, test this exact scenario - not just general search speed. This reveals how the database handles selective filters at your scale.
Integration quality:
- Does filtering slow down similarity search?
- Can metadata fields be indexed for speed?
- How does it handle filters matching <1% of data?
Important: Don’t just benchmark vector search speed. Test filtering performance with your actual filter patterns and data scale.
Buzzwords decoded:
- Metadata/Payload = Structured data per vector
- Metadata filtering = Search with constraints
- Selective filter = Matches <5% of data
- Filter selectivity = Percentage of data matching filter
- Pre/Post/Hybrid filtering = Internal strategies (database-controlled)
- Filter indexing = Optimization for metadata lookups
Quick Decision Framework
Answer These First
1. What embedding model are you using?
- Determines: Dense/sparse, dimensions, distance metric
- Eliminates: 40% of databases
2. How much control do you need?
- Determines: Library vs database vs search engine
- Eliminates: 30% of remaining options
3. What’s your scale?
- Determines: Memory vs disk vs hybrid storage
- Eliminates: 20% of remaining options
4. What’s your speed/accuracy requirement?
- Determines: Algorithm choices
- Eliminates: 10% of remaining options
5. Do you need filtering?
- Determines: Advanced vs basic databases
- Eliminates: Final options
The Bottom Line
These buzzwords aren’t marketing fluff - they represent fundamental trade-offs:
- Speed vs Accuracy (algorithms)
- Cost vs Performance (storage)
- Control vs Convenience (architecture)
- Features vs Simplicity (capabilities)
Most importantly: Your choice of embedding model and your scale requirements eliminate most options before you even start evaluating databases. Start there, and the rest becomes much clearer.
The secret: There’s no single “best” vector database. There’s only the best choice for your specific combination of requirements. Now you know how to decode the buzzwords and make the right decision.