Vector Databases, Explained Without the Hype

Two years ago, almost nobody outside a research lab had heard of a vector database. Today they are a default line item in every AI architecture diagram. Some of that adoption is earned and some is cargo-culting. It helps to understand what these systems genuinely do before you add one to your stack.

What a vector actually is

A modern embedding model turns a piece of text — or an image, or a snippet of audio — into a list of numbers, typically a few hundred to a few thousand of them. The useful property is that semantically similar inputs land close together in this high-dimensional space. “Cancel my subscription” and “how do I stop being billed” produce vectors that sit near each other, even though they share almost no words. Search becomes geometry: find the stored vectors nearest to the query vector.
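To make "search becomes geometry" concrete, here is a minimal sketch in plain Python. The three-number vectors are hand-written stand-ins, not output from any real embedding model, and the function names are made up for illustration. Cosine similarity is one common way to measure closeness; it is an assumption here, not the only choice.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors written by hand. A real model would produce hundreds or
# thousands of dimensions; these values are assumptions for illustration.
stored = {
    "how do I stop being billed": [0.9, 0.1, 0.0],
    "what is your refund policy": [0.5, 0.5, 0.1],
    "best hiking trails near me": [0.0, 0.1, 0.9],
}

# Pretend this is the embedding of "cancel my subscription".
query_vec = [0.8, 0.2, 0.1]

def nearest(query, vectors):
    """Return the stored text whose vector is most similar to the query."""
    return max(vectors, key=lambda text: cosine_similarity(query, vectors[text]))

best_match = nearest(query_vec, stored)
print(best_match)
```

The query and its best match share no words. Only the positions of the vectors decide the result.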

The real problem is scale

Finding the nearest vector among a thousand is trivial: compare them all. Finding it among a hundred million, fast enough to answer a web request, is not. Exact nearest-neighbour search at that scale is too slow, so vector databases use approximate algorithms, most commonly a graph structure called HNSW (Hierarchical Navigable Small World), that trade a sliver of accuracy for enormous speed. That tradeoff, recall versus latency, is the single most important dial these systems expose, and most teams never touch it.
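The recall-versus-latency dial can be felt in a few lines of code. The sketch below is not HNSW. It swaps in a much cruder technique, splitting the vectors into partitions and scanning only the partitions nearest the query, because that fits on one screen. The data is random, the partition count is arbitrary, and "vectors scanned" stands in for latency. All names are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def exact_top_k(query, vectors, k):
    """Brute force: score every vector and keep the ids of the k closest."""
    ranked = sorted(range(len(vectors)), key=lambda i: sq_dist(query, vectors[i]))
    return ranked[:k]

def build_partitions(vectors, centroids):
    """Assign each vector id to the partition of its closest centroid."""
    parts = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        closest = min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
        parts[closest].append(i)
    return parts

def approx_top_k(query, vectors, centroids, parts, k, n_probe):
    """Scan only the n_probe partitions whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [i for c in order[:n_probe] for i in parts[c]]
    ranked = sorted(candidates, key=lambda i: sq_dist(query, vectors[i]))
    return ranked[:k], len(candidates)

def recall(found_ids, true_ids):
    """Fraction of the true nearest neighbours that the approximate search found."""
    return len(set(found_ids) & set(true_ids)) / len(true_ids)

rng = random.Random(0)  # fixed seed so the run is repeatable
vectors = [[rng.random() for _ in range(8)] for _ in range(2000)]
centroids = rng.sample(vectors, 16)  # crude choice: 16 random vectors as centroids
parts = build_partitions(vectors, centroids)
query = [rng.random() for _ in range(8)]
truth = exact_top_k(query, vectors, 10)

results = {}
for n_probe in (1, 4, 16):
    found, scanned = approx_top_k(query, vectors, centroids, parts, 10, n_probe)
    results[n_probe] = (recall(found, truth), scanned)
    print(f"n_probe={n_probe:2d} scanned={scanned:4d} recall={results[n_probe][0]:.2f}")
```

Probing more partitions scans more vectors and can only hold or raise recall. Probing all 16 is the exact search again. Real systems expose the same dial under different names, and it is worth finding before you trust the defaults.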

You may not need a separate database

Here is the part the marketing skips: many mature databases now do vector search competently. PostgreSQL with the pgvector extension, Elasticsearch, Redis, and several others can store and query embeddings alongside your existing data. If you have a few hundred thousand vectors and already run Postgres, adding pgvector is almost always the right first move. You keep one system, one backup strategy, and one set of operational knowledge.
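For a sense of how small that first move is, here is a rough sketch of the SQL involved, held as Python strings so it sits alongside the other examples. The table name, column names, and the three-dimensional vector size are hypothetical, and the `%s` placeholder assumes a psycopg-style driver. Nothing here connects to a database, so treat it as a starting point to check against the pgvector documentation.

```python
# Sketch only: SQL for PostgreSQL with the pgvector extension.
# "documents", "body", "embedding" and the size 3 are made-up names and values.

def to_vector_literal(values):
    """Format a list of numbers in pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"

# One-time setup: enable the extension and add a vector column
# next to the ordinary columns you already have.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS documents (
    id bigserial PRIMARY KEY,
    body text NOT NULL,
    embedding vector(3)
);
"""

# <=> is pgvector's cosine-distance operator; smaller means more similar.
# The %s placeholder style is an assumption about the database driver.
SEARCH_SQL = """
SELECT id, body
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT 5;
"""

query_param = to_vector_literal([0.8, 0.2, 0.1])
print(query_param)
```

With a psycopg-style driver, a call along the lines of `cursor.execute(SEARCH_SQL, (query_param,))` would run the search. That call is not exercised here.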

A dedicated vector database — Pinecone, Weaviate, Qdrant, Milvus and the like — earns its place when you cross into tens of millions of vectors, need to filter and search in a single fast query, or want managed sharding and replication built for this one job. Reach for it when you have outgrown the extension, not before.

Filtering is where designs break

In practice you rarely want the nearest vector outright. You want the nearest vector that belongs to this customer, or that was published this year. Combining similarity search with metadata filtering is deceptively hard. Filter after the search and much of the index's work is thrown away, often leaving fewer results than you asked for. Filter before it and an approximate index built over the whole collection can lose recall or fall back to a slow scan. How a given system handles this “pre-filter versus post-filter” question should drive your choice far more than any benchmark on its landing page.
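A toy example shows the late-filter failure. The records, customer names, and function names below are invented, and both searches are brute force. The sketch therefore shows only how filtering after the search can come up short. It cannot show how filtering first interacts with an approximate index.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Made-up records: three "acme" rows, three "globex" rows.
records = [
    {"id": 1, "customer": "acme",   "vec": [0.0, 0.0]},
    {"id": 2, "customer": "globex", "vec": [0.1, 0.0]},
    {"id": 3, "customer": "globex", "vec": [0.0, 0.2]},
    {"id": 4, "customer": "globex", "vec": [0.3, 0.0]},
    {"id": 5, "customer": "acme",   "vec": [0.9, 0.9]},
    {"id": 6, "customer": "acme",   "vec": [1.0, 1.0]},
]

def top_k(query, rows, k):
    """The k rows closest to the query, nearest first."""
    return sorted(rows, key=lambda r: sq_dist(query, r["vec"]))[:k]

def post_filter(query, rows, customer, k):
    """Search everything first, then drop rows that fail the filter."""
    return [r["id"] for r in top_k(query, rows, k) if r["customer"] == customer]

def pre_filter(query, rows, customer, k):
    """Apply the filter first, then search only the rows that pass."""
    allowed = [r for r in rows if r["customer"] == customer]
    return [r["id"] for r in top_k(query, allowed, k)]

query = [0.0, 0.0]
post_ids = post_filter(query, records, "acme", 3)
pre_ids = pre_filter(query, records, "acme", 3)
print("post-filter:", post_ids)
print("pre-filter: ", pre_ids)
```

Asked for three of acme's documents, the post-filter version returns one, because the other two slots in the top three went to another customer's rows. The pre-filter version returns all three, but only because it scanned every matching row, which is exactly what an index is meant to avoid.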

Embeddings drift, and so must your index

The model that produced your vectors will be replaced. When it is, your old and new embeddings live in incompatible spaces and cannot be compared. Re-embedding a large corpus is a real migration with real cost, so plan for it: version your embeddings, keep the source text, and budget the recompute. A vector database is not a write-once artifact; it is a living index that ages with the model behind it.
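One way to plan for that migration is to store the model version and the source text next to every vector. The sketch below is one possible shape for such a record, not a prescribed schema. The field names, the version labels, and the `fake_embed_v2` function are all hypothetical, and the fake function merely stands in for a call to a real embedding model.

```python
from dataclasses import dataclass

@dataclass
class EmbeddingRecord:
    doc_id: str
    source_text: str    # kept so the vector can be recomputed later
    model_version: str  # which model produced `vector`
    vector: list

def comparable(a, b):
    """Vectors are only meaningful to compare if the same model produced them."""
    return a.model_version == b.model_version

def stale_records(records, current_version):
    """Records whose vectors came from a model other than the current one."""
    return [r for r in records if r.model_version != current_version]

def reembed(record, embed_fn, new_version):
    """Recompute a record's vector from its stored source text."""
    return EmbeddingRecord(
        record.doc_id, record.source_text, new_version, embed_fn(record.source_text)
    )

def fake_embed_v2(text):
    """Stand-in for a real model call; returns two arbitrary numbers."""
    return [float(len(text)), float(text.count(" "))]

index = [
    EmbeddingRecord("a", "cancel my subscription", "model-v1", [0.8, 0.2]),
    EmbeddingRecord("b", "how do I stop being billed", "model-v2", [26.0, 5.0]),
]

todo = stale_records(index, "model-v2")
migrated = [reembed(r, fake_embed_v2, "model-v2") if r in todo else r for r in index]
print([r.doc_id for r in todo])
```

Because the source text travels with the vector, the recompute needs nothing beyond the index itself, and the version field makes it cheap to count how much work a model change will cost.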

Used deliberately, vector search is one of the most powerful tools in the modern data stack. Used reflexively, it is one more system to operate for capability you already had.
