040 · EMBEDDINGS · SIMILARITY · AI

Vector Database

Store and search high-dimensional embeddings for AI and similarity search.

If you are new here: a vector is a long list of numbers (e.g. 768 floats) that represents the meaning of text, images, or audio, and is produced by an embedding model. A vector database finds the stored vectors closest to a query vector, a task known as “nearest neighbor search” in high dimensions.

Step	What happens
1. Ingest	Chunk documents → call embedding API → store vector + metadata
2. Query	Embed user question → find k nearest vectors
3. Use	Pass chunks to an LLM (RAG) or rank search results
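
The three steps above can be sketched end to end in a few lines. This is a minimal illustration, not a production design: toy_embed is a hypothetical stand-in for a real embedding API (it hashes words into buckets, so it only captures word overlap, not meaning), chunk uses naive fixed-size splitting, and store is a plain Python list standing in for the vector database.

```python
import hashlib
import math

DIM = 256  # assumption for this sketch; real models often use hundreds or thousands of dimensions

def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding API: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def chunk(document: str, max_words: int = 8) -> list[str]:
    """Naive chunking: split into runs of at most max_words words."""
    words = document.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

store: list[dict] = []  # in-memory stand-in for the vector database

def ingest(doc_id: str, document: str) -> None:
    """Step 1: chunk, embed, store vector plus metadata."""
    for n, text in enumerate(chunk(document)):
        store.append({
            "vector": toy_embed(text),
            "text": text,
            "metadata": {"doc_id": doc_id, "chunk": n},
        })

def query(question: str, k: int = 2) -> list[dict]:
    """Step 2: embed the question and return the k most similar stored chunks."""
    q = toy_embed(question)
    # Vectors are unit length, so the dot product equals cosine similarity.
    ranked = sorted(
        store,
        key=lambda r: sum(a * b for a, b in zip(q, r["vector"])),
        reverse=True,
    )
    return ranked[:k]

ingest(
    "guide",
    "Read replicas let you scale reads across many servers. "
    "Backups protect against data loss and should be tested.",
)
hits = query("how do we scale reads", k=1)
# Step 3 would pass hits[0]["text"] to an LLM or show it as a search result.
```

Swapping toy_embed for a real model changes the quality of the matches but not the shape of the pipeline.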

The Problem

Traditional search matches keywords. Users ask “how do we scale reads?”, and documents that never contain that exact phrase might still be relevant. Embeddings turn text into vectors so we can search by meaning, using distance in high-dimensional space.

In plain terms: a vector database stores points in meaning-space and answers “what is closest?” fast — the backbone of semantic search and many RAG apps.

Tiny example: The phrases “scale reads” and “handle more traffic” might be close in vector space even though they share no words.

Similarity search

The query becomes a vector; we want the k stored vectors closest by cosine or Euclidean distance — “nearest neighbors” in hundreds or thousands of dimensions.
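
To make the two distance measures concrete, here is a small sketch of exact (brute-force) top-k search in plain Python. The vectors are made-up 3-dimensional values chosen for illustration; real embeddings have far more dimensions. The example is built so that the two measures disagree: cosine similarity ignores vector length, while Euclidean distance does not.

```python
import heapq
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between a and b: 0.0 means identical."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k_cosine(query, vectors, k):
    """Exact search: score every stored vector, keep the k most similar."""
    return heapq.nlargest(k, vectors, key=lambda name: cosine_similarity(query, vectors[name]))

def top_k_euclidean(query, vectors, k):
    """Exact search: measure every stored vector, keep the k closest."""
    return heapq.nsmallest(k, vectors, key=lambda name: euclidean_distance(query, vectors[name]))

# Made-up vectors for illustration only.
vectors = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.8, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
    "d": [5.0, 0.5, 0.0],  # same direction as the query, but five times longer
}
query = [1.0, 0.1, 0.0]

by_cosine = top_k_cosine(query, vectors, k=2)        # "d" ranks first: same direction
by_euclidean = top_k_euclidean(query, vectors, k=2)  # "a" ranks first: "d" is far away
```

Many embedding pipelines normalize vectors to unit length, in which case the two measures produce the same ranking.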

Approximate nearest neighbor (ANN)

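Exact brute-force search compares the query against every stored vector, which is too slow at billions of vectors. ANN indexes trade a little accuracy for much lower latency. Common structures include HNSW (a graph where similar vectors are linked as neighbors) and IVF (which clusters the space into cells so search only checks nearby cells). Memory cost depends on the index: HNSW stores graph links on top of the vectors, while IVF is often paired with compression such as product quantization to shrink the footprint. You tune recall (the fraction of true nearest neighbors you find) against latency.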
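
The IVF idea and the recall/latency trade-off can be shown with a toy index in plain Python. This is a simplified sketch, not how any particular library implements it: cells are built with a few rounds of k-means on random data, and nprobe (a name borrowed from common IVF implementations) controls how many cells are searched. Probing more cells raises recall and costs more distance computations.

```python
import math
import random

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(vectors, centroids):
    """Put each vector index into the cell of its nearest centroid."""
    cells = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: dist(v, centroids[c]))
        cells[nearest].append(i)
    return cells

def build_ivf(vectors, n_cells, iterations=5, seed=0):
    """Cluster the vectors into n_cells cells with a few rounds of k-means."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_cells)]
    dim = len(vectors[0])
    for _ in range(iterations):
        cells = assign(vectors, centroids)
        for c, members in enumerate(cells):
            if members:  # keep the old centroid if a cell ended up empty
                centroids[c] = [sum(vectors[i][d] for i in members) / len(members)
                                for d in range(dim)]
    return centroids, assign(vectors, centroids)

def search_ivf(query, vectors, centroids, cells, k, nprobe):
    """Approximate search: only scan the nprobe cells closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: dist(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in cells[c]]
    return sorted(candidates, key=lambda i: dist(query, vectors[i]))[:k]

def search_exact(query, vectors, k):
    """Brute force: scan every vector."""
    return sorted(range(len(vectors)), key=lambda i: dist(query, vectors[i]))[:k]

def recall(approx, exact):
    """Fraction of the true nearest neighbors that the approximate search found."""
    return len(set(approx) & set(exact)) / len(exact)

rng = random.Random(42)
vectors = [[rng.random() for _ in range(8)] for _ in range(500)]  # random toy data
query = [rng.random() for _ in range(8)]

centroids, cells = build_ivf(vectors, n_cells=10)
exact = search_exact(query, vectors, k=5)
recalls = {}
for nprobe in (1, 3, 10):
    approx = search_ivf(query, vectors, centroids, cells, k=5, nprobe=nprobe)
    recalls[nprobe] = recall(approx, exact)
    print(f"nprobe={nprobe} recall={recalls[nprobe]:.2f}")
```

With nprobe equal to the number of cells the search scans everything and matches brute force; smaller values scan less and may miss neighbors that sit just across a cell boundary.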

Semantic retrieval

Build support docs, internal wikis, or catalogs where users ask in natural language — the vector index finds chunks that read similarly even if the wording differs.

RAG pattern

Retrieval-augmented generation: fetch context from the vector DB, paste it into the LLM prompt, and generate an answer. Grounding the prompt in retrieved text reduces hallucinations, provided the retrieved chunks are relevant and the corpus itself is accurate.

Sketch:

User question → embed → top-k chunks from vector DB → prompt + chunks → LLM → answer
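
The flow above can be sketched as a small function. Everything here is illustrative: fake_retrieve and fake_generate are hypothetical stubs standing in for a vector DB query and an LLM call, and the prompt wording is one reasonable choice, not a required format.

```python
from typing import Callable

def build_prompt(question: str, chunks: list[str]) -> str:
    """Paste the retrieved chunks into the prompt, numbered so answers can cite them."""
    context = "\n\n".join(f"[{i + 1}] {text}" for i, text in enumerate(chunks))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

def answer(
    question: str,
    retrieve: Callable[[str, int], list[str]],
    generate: Callable[[str], str],
    k: int = 3,
) -> str:
    """User question -> top-k chunks -> prompt + chunks -> LLM -> answer."""
    chunks = retrieve(question, k)
    return generate(build_prompt(question, chunks))

# Hypothetical stubs so the sketch runs without external services.
def fake_retrieve(question: str, k: int) -> list[str]:
    corpus = [
        "Read replicas spread read traffic across servers.",
        "Caches absorb repeated queries.",
        "Backups protect against data loss.",
    ]
    return corpus[:k]  # a real system would embed the question and search the index

def fake_generate(prompt: str) -> str:
    return f"(model output for a prompt of {len(prompt)} characters)"

prompt = build_prompt("How do we scale reads?", fake_retrieve("How do we scale reads?", 2))
result = answer("How do we scale reads?", fake_retrieve, fake_generate, k=2)
```

Passing retrieve and generate as parameters keeps the prompt logic testable without a live database or model.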

Hybrid search

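Combine structured filters (tenant id, date range, SKU) with vector ranking: narrow the candidates with structured queries, then rank what remains with embeddings. The term is also used for combining keyword scores such as BM25 with vector similarity.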
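
A minimal sketch of filter-then-rank, using made-up records and 2-dimensional vectors. Field names such as tenant and updated are assumptions for this example. The filter runs before ranking, so a vector that matches perfectly but belongs to another tenant never reaches the results.

```python
import math
from datetime import date

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Made-up records: metadata plus a toy 2-dimensional embedding.
records = [
    {"id": 1, "tenant": "acme",   "updated": date(2024, 5, 1),  "vector": [1.0, 0.0]},
    {"id": 2, "tenant": "acme",   "updated": date(2023, 1, 15), "vector": [0.9, 0.1]},  # too old
    {"id": 3, "tenant": "globex", "updated": date(2024, 6, 1),  "vector": [1.0, 0.2]},  # wrong tenant
    {"id": 4, "tenant": "acme",   "updated": date(2024, 3, 10), "vector": [0.0, 1.0]},
    {"id": 5, "tenant": "acme",   "updated": date(2024, 7, 20), "vector": [0.7, 0.7]},
]

def filtered_search(query_vec, records, k, tenant, since):
    """Narrow with structured filters first, then rank the survivors by similarity."""
    allowed = [r for r in records if r["tenant"] == tenant and r["updated"] >= since]
    allowed.sort(key=lambda r: cosine(query_vec, r["vector"]), reverse=True)
    return allowed[:k]

results = filtered_search([1.0, 0.2], records, k=2, tenant="acme", since=date(2024, 1, 1))
result_ids = [r["id"] for r in results]
```

Filtering first guarantees that k results come back whenever k matching records exist; filtering after an ANN search is simpler to bolt on but can return fewer than k hits when most neighbors fail the filter.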

Ecosystem

Pinecone, Weaviate, Qdrant, Chroma, and pgvector in Postgres are common options. Choose based on managed vs self-hosted operation, how much SQL affinity you need, and your latency SLOs.

Trade-offs

You still own	The DB does not fix
Chunking, permissions, evals	Bad source documents
Embedding refresh when content changes	Hallucinations if context is wrong
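
One way to handle embedding refresh is to store a content hash next to each vector and re-embed only the chunks whose hash changed. This is a sketch of that idea under assumed names (chunk ids as dictionary keys, SHA-256 as the hash), not a feature of any particular database.

```python
import hashlib

def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_refresh(current_chunks: dict[str, str], stored_hashes: dict[str, str]):
    """Compare current content with the hashes saved at the last ingest.

    Returns (to_embed, to_delete): chunk ids that are new or changed,
    and chunk ids whose source content no longer exists.
    """
    to_embed = [
        chunk_id for chunk_id, text in current_chunks.items()
        if stored_hashes.get(chunk_id) != content_hash(text)
    ]
    to_delete = [chunk_id for chunk_id in stored_hashes if chunk_id not in current_chunks]
    return to_embed, to_delete

# Hashes saved at the last ingest (hypothetical chunk ids).
stored_hashes = {
    "faq#0": content_hash("Read replicas scale reads."),
    "faq#1": content_hash("Backups run nightly."),
    "faq#2": content_hash("Old pricing page."),
}
# Content as it looks today: faq#1 was edited, faq#2 was removed, faq#3 is new.
current_chunks = {
    "faq#0": "Read replicas scale reads.",
    "faq#1": "Backups run hourly.",
    "faq#3": "New onboarding guide.",
}
to_embed, to_delete = plan_refresh(current_chunks, stored_hashes)
```

Unchanged chunks are skipped, which keeps embedding API costs proportional to what actually changed.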

Why this matters for you

Vector search is one layer in an AI stack: you still need fresh embeddings, chunking strategy, permissions, and evaluation — the database does not fix bad content.

Next: Full-Text Search is the keyword-search counterpart — in practice, production search systems often combine both: BM25 for exact matches, vector similarity for semantic ranking.
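
The text does not say how the two result lists are merged. One widely used option is reciprocal rank fusion (RRF), sketched here as an assumption rather than the method this page has in mind. It needs only the rank of each document in each list, so BM25 scores and vector similarities never have to be put on the same scale. The document ids and the constant k=60 are illustrative.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical results from a keyword index and a vector index.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
```

Documents that appear in both lists (doc_a and doc_c here) rise above documents that only one retriever found.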


A model maps text or images into a dense vector — nearby vectors mean semantically similar content in embedding space.