Store and search high-dimensional embeddings for AI and similarity search.
If you are new here: a vector is a long list of numbers (e.g. 768 floats) that represents the meaning of text, images, or audio — produced by an embedding model. A vector database finds the stored vectors closest to a query vector — “nearest neighbor search” in high dimensions.
| Step | What happens |
|---|---|
| 1. Ingest | Chunk documents → call embedding API → store vector + metadata |
| 2. Query | Embed user question → find k nearest vectors |
| 3. Use | Pass chunks to an LLM (RAG) or rank search results |
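A minimal sketch of those three steps in Python, using a toy `embed()` stand-in (a real system would call an embedding API here; the 768-dimension size is just illustrative):

```python
import hashlib
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in for a real embedding API call: a seeded pseudo-random 768-dim vector.
    Replace with your actual embedding model."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).standard_normal(768).astype(np.float32)
    return v / np.linalg.norm(v)

# 1. Ingest: chunk documents, embed each chunk, store vector + metadata side by side.
chunks = ["Read replicas let you scale reads.", "Sharding splits data across nodes."]
vectors = np.stack([embed(c) for c in chunks])              # shape (n_chunks, 768)
metadata = [{"doc_id": "db-guide", "chunk": i} for i in range(len(chunks))]

# 2. Query: embed the question; nearest-neighbor search ranks the stored vectors.
query_vec = embed("how do we scale reads?")
# 3. Use: pass the matching chunks (plus metadata) into an LLM prompt or a search ranker.
```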
Traditional search matches keywords. Users ask “how do we scale reads?” — documents that never contain that exact phrase might still be relevant. Embeddings turn text into vectors so we can search by meaning using distance in high-dimensional space.
In plain terms: a vector database stores points in meaning-space and answers “what is closest?” fast — the backbone of semantic search and many RAG apps.
Tiny example: The phrases “scale reads” and “handle more traffic” might be close in vector space even though they share no words.
The query becomes a vector; we want the k stored vectors closest by cosine or Euclidean distance — “nearest neighbors” in hundreds or thousands of dimensions.
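Brute force first, to show the math: with NumPy, ranking stored vectors by cosine similarity or Euclidean distance is a few lines (the random data here stands in for real chunk embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
stored = rng.standard_normal((10_000, 768)).astype(np.float32)   # pretend chunk embeddings
query = rng.standard_normal(768).astype(np.float32)
k = 5

# Cosine similarity: normalize, then a dot product ranks by angle between vectors.
stored_n = stored / np.linalg.norm(stored, axis=1, keepdims=True)
query_n = query / np.linalg.norm(query)
cos_sim = stored_n @ query_n                                     # shape (10_000,)
top_k_cosine = np.argsort(-cos_sim)[:k]                          # k nearest by cosine

# Euclidean (L2) distance: smaller is closer.
l2 = np.linalg.norm(stored - query, axis=1)
top_k_l2 = np.argsort(l2)[:k]
```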
Exact brute force over billions of vectors is too slow. Approximate nearest neighbor (ANN) indexes trade a little accuracy for massive speed and a smaller memory footprint — common structures include HNSW (a graph where similar vectors are linked as neighbors) and IVF (which clusters the space into cells so search only checks nearby cells). You tune recall (the fraction of true nearest neighbors you find) against latency.
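As a sketch of the HNSW flavor, here is what building and querying such an index might look like with the `hnswlib` library; the parameter values (`M`, `ef_construction`, `ef`) are illustrative, not tuned recommendations:

```python
import hnswlib
import numpy as np

dim, n = 768, 100_000
data = np.random.default_rng(0).standard_normal((n, dim)).astype(np.float32)

# Build an HNSW graph index over the vectors.
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=n, ef_construction=200, M=16)   # M controls graph connectivity
index.add_items(data, np.arange(n))

# ef trades recall for latency: higher ef explores more of the graph per query.
index.set_ef(50)
labels, distances = index.knn_query(data[:1], k=10)           # approximate 10 nearest neighbors
```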
Build semantic search over support docs, internal wikis, or catalogs where users ask in natural language — the vector index finds chunks that read similarly even if the wording differs.
Retrieval-augmented generation (RAG): fetch context from the vector DB, insert it into the LLM prompt, and generate an answer — grounding the model in your corpus reduces hallucinations.
Sketch:
User question → embed → top-k chunks from vector DB → prompt + chunks → LLM → answer
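In code that sketch is mostly string assembly around one retrieval call. `search_top_k` and `llm_complete` below are hypothetical placeholders for your vector DB client and LLM client, not a specific SDK:

```python
def search_top_k(query_vector, k: int = 5) -> list[dict]:
    """Hypothetical vector-DB call: returns the k nearest chunks with their text + metadata."""
    ...

def llm_complete(prompt: str) -> str:
    """Hypothetical LLM call: returns the model's answer for the given prompt."""
    ...

def answer(question: str, embed) -> str:
    # Retrieve the most relevant chunks, then ask the LLM to answer from them only.
    chunks = search_top_k(embed(question), k=5)
    context = "\n\n".join(c["text"] for c in chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm_complete(prompt)
```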
Combine filters (tenant id, date range, SKU) with vector ranking — narrow with structured queries, rank with embeddings.
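A toy version of that two-step pattern (real engines push the metadata filter into the index itself; the field names below are made up):

```python
import numpy as np

# Each record: an embedding plus structured metadata.
records = [
    {"vec": np.random.default_rng(i).standard_normal(768), "tenant_id": i % 3, "sku": f"SKU-{i}"}
    for i in range(1_000)
]
query_vec = np.random.default_rng(42).standard_normal(768)

# 1. Narrow with a structured filter (tenant, date range, SKU, ...).
candidates = [r for r in records if r["tenant_id"] == 1]

# 2. Rank the survivors by vector similarity.
def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

top = sorted(candidates, key=lambda r: cosine(r["vec"], query_vec), reverse=True)[:5]
```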
Pinecone, Weaviate, Qdrant, Chroma, or pgvector in Postgres — choose based on managed vs self-hosted, SQL affinity, and latency SLOs.
| You still own | The DB does not fix |
|---|---|
| Chunking, permissions, evals | Bad source documents |
| Embedding refresh when content changes | Hallucinations if context is wrong |
Vector search is one layer in an AI stack: you still need fresh embeddings, chunking strategy, permissions, and evaluation — the database does not fix bad content.
Next: Full-Text Search is the keyword-search counterpart — in practice, production search systems often combine both: BM25 for exact matches, vector similarity for semantic ranking.
A model maps text or images into a dense vector — nearby vectors mean semantically similar content in embedding space.