How to Pick an Embedding Model (Without Overthinking It)
It’s easy to get deep into vector database comparisons (HNSW vs. IVF, pgvector vs. Pinecone, Qdrant vs. Chroma) and completely skip over the thing that actually matters most: the embedding model.
The way I think about it, the embedding model is the brain of your retrieval system. The vector database is just its filing cabinet. If the model creates poor mathematical representations of your data, no amount of indexing strategy or database performance is going to save you. You’ll get fast, confident, wrong results.
So let’s talk about how to pick a model.
Dimensionality: More Isn’t Always Better
Embeddings are high-dimensional vectors. Common sizes are 384, 768, 1536, or 3072 dimensions. Higher dimensions capture more nuance, but they also mean more storage, more memory, and slower search.
For a lean, local-first setup, something like all-MiniLM-L6-v2 at 384 dimensions gives you a surprisingly good balance of speed and accuracy. You don’t need 3072 dimensions to search your notes. Save the big vectors for when you actually have a reason.
Sequence Length: The Silent Data Killer
Sequence length determines how much text the model can look at to create a single vector. If you’re embedding long technical docs or sprawling Markdown files and your model caps out at 512 tokens, it’s just truncating everything past that point. Your carefully written documentation gets chopped, and the embedding only represents the first few paragraphs.
Modern long-context embedding models handle 8k to 32k tokens, which lets you embed entire chapters or large code blocks as single semantic units. If your content is longer than a few paragraphs, check this number before anything else.
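Both numbers are easy to check before you commit. Here’s a minimal sketch, assuming the sentence-transformers library (just the most common way to run these models locally), that loads all-MiniLM-L6-v2 and prints its output dimensionality and its truncation limit:

```python
from sentence_transformers import SentenceTransformer

# Load a small local model that produces 384-dimensional vectors.
model = SentenceTransformer("all-MiniLM-L6-v2")

print(model.get_sentence_embedding_dimension())  # 384
print(model.max_seq_length)                      # 256 for this model

# Anything past max_seq_length tokens is silently dropped, so long documents
# need to be chunked before embedding or the vector only "sees" the opening.
long_doc = "lorem ipsum " * 5000  # stand-in for a sprawling Markdown file
embedding = model.encode(long_doc)
print(embedding.shape)  # (384,) -- one vector, representing only the first chunk of tokens
```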
Domain Matters More Than You Think
General-purpose models like OpenAI’s text-embedding-3-small work well across most tasks. They’ve been trained on massive, diverse datasets and they’re solid defaults.
If you’re searching a codebase or technical documentation, models fine-tuned on programming languages (like voyage-code-2) will outperform the general ones. The same applies to medical or legal text, where domain-specific jargon means the difference between a relevant result and a completely wrong one.
Check MTEB Before You Commit
The Massive Text Embedding Benchmark (MTEB) is the industry standard for comparing models. It breaks performance into sub-categories like Retrieval, Summarization, and Clustering. If you’re building RAG, look at the Retrieval scores specifically. A model that ranks well for clustering might be mediocre at retrieval, and vice versa.
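You don’t have to take the leaderboard’s word for it, either. Here’s a rough sketch of running a candidate model on a single retrieval task with the mteb Python package (the exact API shifts between versions, so treat the names as illustrative):

```python
import mteb
from sentence_transformers import SentenceTransformer

# Candidate model you're considering for RAG.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Pick a retrieval task (SciFact is one of the smaller MTEB retrieval benchmarks)
# rather than relying on the aggregate score.
tasks = mteb.get_tasks(tasks=["SciFact"])
evaluation = mteb.MTEB(tasks=tasks)

# Scores are written as JSON files under the output folder.
results = evaluation.run(model, output_folder="mteb_results")
```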
Local vs. API: Pick Your Tradeoff
This decision is as important as the model itself.
- Local models (via HuggingFace or Ollama) keep everything offline. Zero per-request costs, full privacy. Something like bge-small-en-v1.5 running locally is perfect for personal knowledge management or anything where your data shouldn’t leave your machine.
- Hosted APIs (OpenAI, Voyage, Cohere) give you the highest performance and longest context windows without managing GPU infrastructure. Better for enterprise scale where you’re willing to trade privacy and recurring costs for accuracy.
Local models make sense for personal projects and hosted APIs make sense when the scale demands it. There’s no universal right answer, but there is a wrong one: picking a deployment model without thinking about where your data lives.
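To make the tradeoff concrete, here’s what the two paths look like side by side. This is a sketch, assuming the sentence-transformers library for the local route and the official openai client for the hosted one; the function names are mine, not from any framework:

```python
from typing import List

def embed_local(texts: List[str]) -> List[List[float]]:
    # Fully offline: nothing leaves the machine, no per-request cost.
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("BAAI/bge-small-en-v1.5")
    return model.encode(texts, normalize_embeddings=True).tolist()

def embed_hosted(texts: List[str]) -> List[List[float]]:
    # Hosted API: stronger model and longer context, but your text and your
    # money both leave the building.
    from openai import OpenAI
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in response.data]
```

Either function returns plain lists of floats, so whichever vector database you end up with sits downstream of this choice, not the other way around.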
The vector database conversation is important, but it’s second in line to getting the embedding model right first. Everything downstream depends on it.
I’d appreciate a follow. You can subscribe with your email below. The emails go out once a week, or you can find me on Mastodon at @[email protected].