Vector Database Buzzwords Decoded: What Actually Matters When Choosing One
You’re building an AI feature that answers questions from your company’s documentation. With 50 documents, it works perfectly - send everything to the LLM, get instant answers.
The problem: Once your documents grow into the thousands of pages, you run into LLM context window limitations. Even though modern LLMs can technically handle 100–150 pages, their performance drops significantly with larger contexts.
The solution: “Use vector search and RAG.” You need a vector database.
You start researching and drown in unfamiliar terms:
- “HNSW vs IVF algorithms”
- “Cosine similarity vs Euclidean distance”
- “Dense and sparse vectors”
- “Hybrid storage architecture”
Every vendor claims to be the fastest and most accurate. But what do these terms mean? Which ones matter?
The truth: These buzzwords represent real trade-offs that determine whether your AI feature will be fast, accurate, and affordable at your scale.
This guide decodes the terms you’ll actually encounter when choosing a vector database, and shows you how each decision eliminates certain options before you even look at pricing.
Why Vector Databases: From Problem to Solution
You can’t send all the documents to an LLM because:
- Quality degrades with too much context
- Too expensive ($3-30 per query)
- Too slow (10-60 seconds)
Solution: Send only relevant chunks.
New problem: How do you find relevant chunks when a user asks about “login problems” but the docs say “authentication issues”? Keywords don’t match.
Answer: Search by meaning, not keywords.
This requires:
- Vector embeddings - Convert text to numbers capturing meaning
  - “login problems” → [0.2, 0.8, 0.3, …]
  - “authentication issues” → [0.25, 0.75, 0.35, …]
  - Similar meanings = similar numbers
- Vector database - Store millions of embeddings and find similar ones in milliseconds
Our path to vector databases:
- Context limits → We need relevant chunks
- To find relevant chunks → We search by meaning
- To search by meaning → We need embeddings
- To manage embeddings → We need storage (vector databases)
Buzzwords decoded:
- Vector embeddings = Text as numbers capturing meaning
- Embedding model = Creates embeddings (OpenAI, Cohere)
- Vector database = Complete system for vector embeddings storage and search
- Semantic search = Finding by meaning, not keywords
Core Decision #1: Embedding Strategy (What You’re Actually Storing)
Your embedding model choice determines everything else about your vector database requirements. This isn’t something you configure later - the vector type, the number of dimensions, and the distance metric are all baked into the model you choose to convert your data into vectors.
Understanding Dense and Sparse Vectors
Dense vectors pack meaning into every number. Think of it like GPS coordinates with 1,536 dimensions - every number contributes to pinpointing the meaning.
"login problems" → [0.2, -0.1, 0.8, 0.3, -0.4, 0.1, ...]
All 1,536 numbers work together to capture semantic meaning
The AI model learns patterns during training. Words with similar meanings end up with similar number patterns. This is how “login problems” matches “authentication issues” despite zero word overlap - their vector patterns are similar.
Sparse vectors work like a checklist where most boxes are empty. Each position represents a specific word - if the word appears, it gets a value; if not, it stays zero.
"login problems" → [0, 0, 0.8, 0, 0, 0, 2.1, 0, ...]
Only positions for "login" and "problems" have values
This is essentially keyword search as numbers. “Login” only matches documents containing “login” - not “authentication.”
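To make the difference concrete, here’s a minimal sketch in Python. It assumes the sentence-transformers and scikit-learn packages are installed, and all-MiniLM-L6-v2 is just one example of a lightweight dense model:

```python
from sentence_transformers import SentenceTransformer, util
from sklearn.feature_extraction.text import CountVectorizer

texts = ["login problems", "authentication issues"]

# Dense: a model trained on meaning; every dimension carries part of the signal
model = SentenceTransformer("all-MiniLM-L6-v2")   # produces 384-dimensional vectors
dense = model.encode(texts)
print(dense.shape)                       # (2, 384) - no structural zeros
print(util.cos_sim(dense[0], dense[1]))  # clearly similar despite zero word overlap

# Sparse: one position per vocabulary word; most positions stay zero
vectorizer = CountVectorizer()
sparse = vectorizer.fit_transform(texts)
print(vectorizer.get_feature_names_out())  # ['authentication' 'issues' 'login' 'problems']
print(sparse.toarray())                    # the two rows share no non-zero positions
```

The dense vectors match because the model learned that the two phrases mean similar things; the sparse vectors don’t match at all because the phrases share no words.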
Selecting Your Embedding Model
Your use case determines which type of model you need:
Use dense models when:
- Users ask questions in their own words (“Why can’t I log in?”)
- You need semantic understanding (synonyms, paraphrases)
- Most document search and Q&A applications
- Examples: OpenAI text-embedding-3-large, Sentence-Transformers
Use sparse models when:
- Exact terminology matters (product codes, legal terms)
- Users search with specific keywords they know
- Rarely the sole choice in modern applications
- Examples: SPLADE, BM25
Use both (hybrid approach) when:
- Need semantic understanding AND keyword precision
- Technical documentation (concepts + exact terms matter)
- Best results but requires using two separate models
Important: You cannot configure a dense model to output sparse vectors or vice versa. Dense/sparse is determined by which model you choose, not a setting you adjust.
Hybrid Search
Hybrid search means using TWO separate models:
- Choose a dense model (e.g., Sentence-Transformers) → Generates dense vectors
- Choose a sparse model (e.g., BM25) → Generates sparse vectors
- Store both vector types for each document
- Search both simultaneously and combine results
Why this matters: Not all databases support storing and searching both vector types. Hybrid requires database-level support for fusion.
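One widely used way to combine the two ranked result lists is reciprocal rank fusion (RRF). Here’s a minimal sketch of the idea, with hypothetical document IDs:

```python
def reciprocal_rank_fusion(dense_results, sparse_results, k=60):
    """Merge two ranked lists of document IDs into a single combined ranking."""
    scores = {}
    for results in (dense_results, sparse_results):
        for rank, doc_id in enumerate(results):
            # Documents ranked highly in either list accumulate a larger score
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["doc_a", "doc_b", "doc_c"]    # top results from the dense (semantic) index
sparse_hits = ["doc_d", "doc_a", "doc_e"]   # top results from the sparse (keyword) index
print(reciprocal_rank_fusion(dense_hits, sparse_hits))  # doc_a wins - it appears in both lists
```

Databases with native hybrid support run this kind of fusion for you; without it, you’d be running two searches and merging results in application code.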
Understanding Vector Dimensions: The Size Factor
Your embedding model determines how many numbers are in each vector. You don’t choose this - it’s fixed by the model.
Common dimension sizes:
- 384 dimensions: Lightweight models (~1.5KB per vector)
- 768 dimensions: Balanced models (~3KB per vector)
- 1,536 dimensions: New standard (~6KB per vector)
- 3,072 dimensions: Premium quality models (~12KB per vector)
Why dimensions matter for database selection:
Higher dimensions mean more storage and memory at every level:
- 1 million vectors at 1,536 dims = 6GB minimum storage
- 1 billion vectors at 3,072 dims = 12TB storage
- Memory requirements = 2-3x the raw vector storage (for indexes and operations)
This directly impacts your infrastructure costs and eliminates certain database options:
- High dimensions (3,072+) → Can’t use memory-only solutions at scale
- Massive scale (billions of vectors) → Need distributed database architectures
- Budget constraints → May need disk-based storage instead of in-memory
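Where those per-vector sizes come from: each dimension of a float32 vector takes 4 bytes, so size scales linearly with dimensions and vector count. A quick arithmetic check of the numbers above:

```python
# Assumes float32 embeddings (4 bytes per dimension), the most common case
for dims in (384, 768, 1536, 3072):
    per_vector_kb = dims * 4 / 1024
    per_million_gb = dims * 4 * 1_000_000 / 1e9
    print(f"{dims:>5} dims: ~{per_vector_kb:.1f} KB per vector, ~{per_million_gb:.1f} GB per million vectors")
```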
Buzzwords decoded:
- Dense vectors = All numbers used, captures semantic meaning
- Sparse vectors = Mostly zeros, like keyword matching
- Dimensions/Vector length = How many numbers per vector (model decides)
- Hybrid search = Using dense + sparse together
- Scale = Total number of vectors (thousands, millions, billions)
- Multimodal embeddings = Models that can convert text, images, and audio to vectors in the same space
Core Decision #2: Architecture (Libraries, Databases, or Search Engines)
When researching vector solutions, you’ll encounter Faiss (a library) and Elasticsearch (a search engine) alongside vector databases. These represent fundamentally different approaches - understanding them clarifies which path fits your needs.
Three Architectural Approaches
Vector Libraries: Just Algorithms, You Build the Infrastructure
Examples: Faiss, Annoy, ScaNN
What you get: Similarity search algorithms
What you need: Storage, backups, scaling, monitoring, reliability
When to choose: You need custom algorithms or specific optimizations not available in databases. You’re building a unique system where standard databases won’t work.
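To see what “just algorithms” means in practice, here’s a minimal sketch using Faiss (assuming the faiss-cpu and numpy packages, with random vectors standing in for real embeddings):

```python
import numpy as np
import faiss  # pip install faiss-cpu

dim = 384
vectors = np.random.random((100_000, dim)).astype("float32")  # stand-in for real embeddings

index = faiss.IndexFlatL2(dim)   # exact (brute-force) search over L2 distance
index.add(vectors)

query = np.random.random((1, dim)).astype("float32")
distances, ids = index.search(query, 5)   # top-5 nearest vectors
print(ids)

# Note what's missing: persistence to disk, backups, replication, scaling
# across machines, metadata, access control - all of that is yours to build.
```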
Vector Databases: Complete Systems Purpose-Built for Vectors
Examples: Pinecone, Qdrant, Weaviate, Chroma
What you get: Storage, search, scaling, backups, monitoring - everything
When to choose: You need vector search as part of a larger application. You want production reliability without building infrastructure. This covers 90% of use cases.
Vector Search Engines: Traditional Search with Vector Capabilities
Examples: Elasticsearch, OpenSearch, Solr with vector plugins
What you get: Keyword search AND vector search in one system
What you need: Integration with existing search
When to choose: You already run these systems and need to add semantic search. You need both keyword and vector search tightly integrated.
Why This Matters
Your architecture choice eliminates ~30% of options immediately and determines what the remaining decisions mean:
- Libraries → Focus on algorithm selection and infrastructure design
- Databases → Focus on feature comparison and operational trade-offs
- Search engines → Focus on integration and hybrid search quality
The rest of this blog assumes you’re evaluating vector databases and decodes the specific buzzwords you’ll encounter in that space.
Buzzwords decoded:
- Vector library = Code library providing algorithms (Faiss, Annoy)
- Vector search engine = Traditional search extended with vectors (Elasticsearch + kNN plugin)
- Vector index = The pre-organized data structure that enables fast similarity search
Core Decision #3: Storage Architecture (Speed vs Cost)
How your vectors are physically stored determines your speed, cost, and scale limits. This is often the biggest cost factor in vector database selection.
Where Your Vectors Actually Live
In-Memory Storage - Vectors stored in RAM for instant access.
- Speed: Sub-millisecond queries (fastest possible)
- Risk: Data lost on crash unless replicated
- Best for: Small datasets (<1M vectors) needing extreme speed
- Database examples: Redis with vector search
Disk-Based Storage - Vectors stored on SSDs or hard drives.
- Speed: 5-50 millisecond queries (still very fast)
- Advantage: Data persists through crashes and restarts
- Best for: Large datasets (100M+ vectors) with reasonable speed needs
- Database examples: Most vector databases’ default mode
Hybrid Storage - Hot data in memory, cold data on disk automatically.
- Speed: Fast for popular queries, slower for rare ones
- Complexity: Database handles optimization automatically
- Best for: Large datasets with uneven access patterns
- Database examples: Qdrant, Weaviate with smart caching
The Hidden 3.5x Storage Reality
When you calculate storage needs (vector dimensions × bytes per value × number of vectors), you’re only seeing raw vector size. Actual production requirements are much higher:
Complete storage breakdown:
- Raw vectors: 1x (what you calculated)
- Search indexes: 1x (HNSW graphs, IVF clusters - required for fast search)
- Backups and replicas: 1x (for reliability and disaster recovery)
- Working memory: 0.5x (for query processing and operations)
Total: ~3.5x your raw vector storage
Example:
- You calculate: 1M vectors × 1,536 dims = 6GB
- Production reality: 21GB minimum
- With high availability: 30GB recommended
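A tiny helper that applies this breakdown to your own numbers - treat the 3.5x factor as a rough planning multiplier rather than a guarantee for any particular database:

```python
def production_storage_estimate_gb(num_vectors: int, dimensions: int) -> dict:
    raw = num_vectors * dimensions * 4 / 1e9   # float32 = 4 bytes per dimension
    return {
        "raw_vectors": raw,
        "search_indexes": raw,          # HNSW graphs / IVF clusters
        "backups_and_replicas": raw,
        "working_memory": raw * 0.5,
        "total": raw * 3.5,
    }

print(production_storage_estimate_gb(1_000_000, 1536))   # total ≈ 21.5 GB
```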
Why this matters: Some databases are more index-efficient than others. HNSW indexes are larger but faster; IVF indexes are smaller but need more computation. This affects both storage costs and performance.
Buzzwords decoded:
- Persistence = Data survives system restarts and crashes
- In-memory / RAM storage = Fastest storage, data lost on restart unless replicated
- Disk-based storage = Slower but reliable, data persists through crashes
- Hybrid storage = Automatic optimization between memory and disk
- Replication = Keeping multiple copies of data for reliability
- Sharding / Partitioning = Splitting data across multiple servers for scale
- Million-scale / Billion-scale = Marketing terms indicating data size capabilities
Core Decision #4: Search Algorithms (Speed vs Accuracy Trade-off)
Finding similar vectors fast requires two pieces: an algorithm (organizes vectors for search) and a distance metric (measures similarity). Your embedding model determines the distance metric. The database usually chooses the algorithm.
Why Algorithms Matter
Without algorithms, searching means comparing your query to every stored vector:
- 1M vectors = 500ms-2 seconds per search
- 10M vectors = 5-20 seconds per search
- Users expect under 100ms
Algorithms solve this by pre-organizing vectors once, then taking shortcuts during search. The trade-off: queries drop to 10-50ms, but the search might miss ~5% of relevant results. That ~95% recall is acceptable for most applications.
Algorithm and Distance Metric Relationship
Algorithm = How vectors are organized and searched (HNSW, IVF, DiskANN)
Distance metric = How similarity is calculated (cosine, Euclidean, dot product)
They’re independent but must work together:
- Your embedding model dictates distance metric
- Database chooses algorithm based on its design
- Algorithm must support your required distance metric
The Main Algorithm Types
Most vector databases use one of these algorithm families. You often can’t choose - the database picks for you. But understanding them helps you evaluate performance claims and cost trade-offs.
HNSW (Hierarchical Navigable Small World)
How it works: Builds a multi-layer graph connecting similar vectors, like a social network where you find people through mutual connections.
- Speed: Very fast (10-50ms for millions of vectors)
- Accuracy: Excellent (95-99% recall)
- Memory: High (stores all the connections)
IVF (Inverted File Index)
How it works: Groups vectors into clusters (like organizing books by genre), then only searches relevant clusters.
- Speed: Fast (20-100ms)
- Accuracy: Good (85-95% recall)
- Memory: Moderate (stores cluster info)
LSH (Locality Sensitive Hashing)
How it works: Uses hashing to bucket similar vectors together, like sorting by first letter.
- Speed: Very fast (constant time in ideal cases)
- Accuracy: Moderate (70-85% recall)
- Memory: Low (minimal overhead)
DiskANN (Disk-Based ANN)
How it works: Optimized for SSD storage, using compressed graphs and smart caching to achieve good performance without loading everything into memory.
- Speed: Good (50-200ms, faster than expected for disk-based)
- Accuracy: Very good (90-95% recall)
- Memory: Very low (designed for disk, minimal RAM usage)
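You usually don’t pick the algorithm in a managed database, but you can see the HNSW/IVF trade-off directly in a library like Faiss. A minimal sketch (assuming faiss-cpu and numpy; the parameter values are arbitrary examples):

```python
import numpy as np
import faiss

dim = 128
data = np.random.random((50_000, dim)).astype("float32")
query = np.random.random((1, dim)).astype("float32")

# HNSW: multi-layer graph; high recall, but stores many connections in memory
hnsw = faiss.IndexHNSWFlat(dim, 32)   # 32 = links per node; more links = more memory, better recall
hnsw.add(data)

# IVF: cluster the vectors first, then search only a few clusters per query
quantizer = faiss.IndexFlatL2(dim)
ivf = faiss.IndexIVFFlat(quantizer, dim, 256)   # 256 clusters
ivf.train(data)        # clustering step - HNSW needs no training
ivf.add(data)
ivf.nprobe = 8         # clusters searched per query; higher = slower, more accurate

print(hnsw.search(query, 5))   # (distances, ids)
print(ivf.search(query, 5))
```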
Distance Metrics: Measuring “Similarity”
Your embedding model determines your distance metric:
Cosine Similarity measures the angle between vectors, ignoring their length.
Think of it as: Two arrows pointing in similar directions are “similar” regardless of their length.
- Use for: Text embeddings
- Why: “cat” and “the big fluffy cat” should be similar despite different text lengths
- Required by: OpenAI, Sentence-Transformers, most text models
Euclidean Distance measures straight-line distance between vectors.
Think of it as: Physical distance on a map - both direction AND distance matter.
- Use for: Image embeddings, spatial data
- Why: Color intensity matters (light red vs dark red are different)
- Required by: Some computer vision models
Dot Product measures both angle and magnitude.
Think of it as: Similarity plus strength - both direction and intensity matter.
- Use for: Recommendation systems
- Why: Strong preferences should rank higher than weak preferences
- Required by: Some recommendation embedding models
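All three metrics are just different formulas over the same pair of vectors. A minimal numpy sketch, using a vector and a scaled copy of it to show how each metric treats length:

```python
import numpy as np

a = np.array([0.2, 0.8, 0.3])
b = 2 * a   # same direction, twice the length

cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))  # 1.0  - length is ignored
euclidean = np.linalg.norm(a - b)                                # ~0.88 - length difference counts
dot = np.dot(a, b)                                               # ~1.54 - rewards alignment and magnitude

print(cosine, euclidean, dot)
```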
How This Affects Database Choice
Your embedding model eliminates databases:
- Need cosine → Most databases support this well
- Need Euclidean or dot product → Verify support quality
- Wrong metric support → Broken search results
Your scale and speed needs determine algorithm preference:
- Need <50ms with 10M+ vectors → Requires HNSW
- Memory-constrained at scale → IVF or DiskANN better
- Billions of vectors → DiskANN or distributed HNSW
Buzzwords decoded:
- ANN (Approximate Nearest Neighbor) = Finding “close enough” matches quickly (vs checking everything)
- kNN (k-Nearest Neighbors) = Finding the k closest matches (can be exact or approximate)
- Distance metric (cosine, Euclidean, dot product) = Mathematical formula for measuring similarity
- Brute force / Flat index = Checking every vector (perfect accuracy, unusably slow at scale)
- Query latency = Time from search request to results (targeting <100ms)
- QPS (Queries Per Second) = How many searches the system can handle concurrently
- Recall = What percentage of relevant results the system finds (95% recall = finds 95 out of 100 relevant docs)
- Quantization = Compressing vectors to save memory (trades accuracy for efficiency)
- HNSW, IVF, LSH = Different algorithm families (usually chosen by database, not you)
- DiskANN = Emerging disk-optimized algorithm balancing cost and performance for massive scale
Core Decision #5: Metadata Filtering (Beyond Simple Similarity)
Real applications rarely need just “find similar vectors.” They need “find similar AND meets criteria”:
- “Similar BUT only API documentation”
- “Similar BUT only user can access”
- “Similar BUT only last 30 days”
Without filtering, irrelevant results appear regardless of similarity quality.
What Is Metadata Filtering
Metadata = Structured data attached to vectors (categories, dates, user IDs, prices, permissions)
Filtering = Similarity search + metadata constraints
Example metadata structure:
{
  "vector": [0.2, 0.8, 0.3, ...],
  "metadata": {
    "category": "api_docs",
    "date": "2024-01-15",
    "access": "public"
  }
}

Query example: Find vectors similar to "authentication error" WHERE category = 'api_docs' AND date > '2024-01-01'
What Determines Your Filtering Needs
Your application requirements:
Simple filtering (most cases):
- Category, date, status filters
- Basic equality and ranges
- Database impact: All modern databases handle this
Complex filtering:
- Multi-tenant isolation (user_id, org_id)
- Highly selective filters (match <5% of data)
- Permission-based access
- Database impact: Performance varies significantly
Understanding selectivity:
- Filter selectivity = percentage of data matching your filter
- Broad filters (>50% of data matches) = easy for databases to handle
- Selective filters (<5% of data matches) = performance challenge, requires optimization
How Databases Handle Filtering
Databases decide the internal strategy - not something you configure:
Pre-filtering: Apply filters first, search within matches
Post-filtering: Search first, filter results after
Hybrid: Automatically optimize based on filter selectivity
What you control: Which fields to filter on, filter conditions
What database controls: How filtering executes internally
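A toy illustration of the two strategies with plain numpy and a made-up category field - real databases do this internally with proper metadata indexes:

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.random((10_000, 384), dtype=np.float32)
categories = rng.choice(["api_docs", "tutorials", "blog"], size=10_000)
query = rng.random(384, dtype=np.float32)

similarities = vectors @ query   # brute-force dot-product similarity, for illustration only

# Pre-filtering: keep only vectors with matching metadata, then rank that subset
matching_ids = np.where(categories == "api_docs")[0]
pre_top5 = matching_ids[np.argsort(-similarities[matching_ids])[:5]]

# Post-filtering: rank everything, then drop results that fail the filter
ranked = np.argsort(-similarities)
post_top5 = [i for i in ranked if categories[i] == "api_docs"][:5]

print(pre_top5, post_top5)
# With a very selective filter, post-filtering over a fixed candidate list
# can come back with too few results - which is why the strategy matters.
```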
What to Evaluate
Filter language support:
- Does it support AND/OR logic?
- Range queries (>, <, BETWEEN)?
- Nested fields (user.department)?
- Your specific data types (dates, arrays, geo)?
Performance characteristics:
- Test with YOUR filter selectivity (what % of data matches?)
- Verify speed at YOUR scale (1M vs 100M vs 1B vectors)
- Check behavior with complex combined filters
Example: If filtering for “documents updated in last 7 days” matches only 2% of your 10M vectors, test this exact scenario - not just general search speed. This reveals how the database handles selective filters at your scale.
Integration quality:
- Does filtering slow down similarity search?
- Can metadata fields be indexed for speed?
- How does it handle filters matching <1% of data?
Important: Don’t just benchmark vector search speed. Test filtering performance with your actual filter patterns and data scale.
Buzzwords decoded:
- Metadata/Payload = Structured data per vector
- Metadata filtering = Search with constraints
- Selective filter = Matches <5% of data
- Filter selectivity = Percentage of data matching filter
- Pre/Post/Hybrid filtering = Internal strategies (database-controlled)
- Filter indexing = Optimization for metadata lookups
Quick Decision Framework
Answer These First
1. What embedding model are you using?
- Determines: Dense/sparse, dimensions, distance metric
- Eliminates: 40% of databases
2. How much control do you need?
- Determines: Library vs database vs search engine
- Eliminates: 30% of remaining options
3. What’s your scale?
- Determines: Memory vs disk vs hybrid storage
- Eliminates: 20% of remaining options
4. What’s your speed/accuracy requirement?
- Determines: Algorithm choices
- Eliminates: 10% of remaining options
5. Do you need filtering?
- Determines: Advanced vs basic databases
- Eliminates: Final options
The Bottom Line
These buzzwords aren’t marketing fluff - they represent fundamental trade-offs:
- Speed vs Accuracy (algorithms)
- Cost vs Performance (storage)
- Control vs Convenience (architecture)
- Features vs Simplicity (capabilities)
Most importantly: Your choice of embedding model and your scale requirements eliminate most options before you even start evaluating databases. Start there, and the rest becomes much clearer.
The secret: There’s no single “best” vector database. There’s only the best choice for your specific combination of requirements. Now you know how to decode the buzzwords and make the right decision.