How I'd Choose a Vector Database
The tutorial picks one for you. Here's the decision logic for choosing a vector database in production — so you can eliminate the wrong options before they become expensive ones.

This post is part of the Production RAG series, which covers the decisions, failure modes, and operational concerns that surface when RAG moves out of prototype and into real systems — the gap between "I got it working" and "I can own this in production."
The tutorial picks one for you. Pinecone in one guide, Qdrant in the next, pgvector in the one after that. None of them tell you what happens when you go to production, hit 10M vectors, and realize the choice you made in week one is now the thing slowing you down at week forty.
The vector database decision has real production consequences — cost, latency, operational overhead, and vendor lock-in. Most teams don't feel those consequences until they're already expensive to reverse.
This post walks through the decision logic, not a comparison table. The goal is to identify your non-negotiable constraints upfront and eliminate options, rather than rank features you might never use.
Why the Wrong Choice Reveals Itself Late
Vector database mistakes follow a consistent pattern. You make the choice early, under low stakes, based on what the tutorial or demo used. It works fine in the prototype. You ship it. Then, months later, something breaks or costs spike — and you trace it back to that early choice.
The wrong choice reveals itself in a few predictable ways:
Under load: index builds compete with production queries; latency spikes at concurrent requests
During billing review: per-query or per-vector costs that looked negligible at 10K vectors become structural at 100M
When requirements evolve: you add multi-tenancy, need metadata filtering, or try to add hybrid search — and find out your index doesn't support it efficiently
The implication: you need to do the constraint thinking upfront, before the choice is embedded in your stack.
Decision One: pgvector or a Dedicated Vector Database?
For most teams starting a RAG system, the first fork in the road is pgvector — the Postgres extension that adds vector search to a database you probably already have.
pgvector is a reasonable choice when all of the following are true simultaneously. (If you're using pgvectorscale — Timescale's extension — the performance ceiling is higher and some of the scale thresholds below are less binding. The capability limitations around filtering and hybrid search still apply.)
Your vector count stays under roughly 1 million (comfortable range); 5-10M is the practical ceiling before index builds and memory pressure become visible
You don't need selective metadata filtering — you're searching across the full collection
Your embeddings are tightly coupled to relational data and queried together
You don't need hybrid search (dense similarity + keyword/BM25)
Postgres is already central to your application
Your team is small and values infrastructure simplicity over search performance
Qdrant's analysis reviewed 110+ community threads and found that a typical B2B SaaS product adding search hits the disqualifiers almost immediately. (Worth noting: Qdrant sells a pgvector alternative, so treat their failure-case analysis with appropriate skepticism.) The disqualifiers that come up most often:
Multi-tenant filtering: The moment you need to scope searches to a user, organization, or tenant, you need selective metadata filtering. pgvector's ANN search runs first, then filters — which means when filters are highly selective, you get recall degradation. Dedicated vector databases build filtering into the index itself, using adaptive pre-filter strategies that reduce recall degradation under selective filtering — though extreme selectivity (filtering out 99%+ of vectors) still requires careful index tuning regardless of the database you choose.
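The recall problem with post-filtering can be shown with a toy simulation. This is pure Python with no real database; the tenant counts and the random "ANN candidates" are stand-ins for an index traversal, but the arithmetic of a ~1%-selective filter is the point:

```python
import random

random.seed(0)

# 10,000 vectors, each tagged with one of 100 tenants (~1% selectivity per tenant).
vectors = [{"id": i, "tenant": random.randrange(100)} for i in range(10_000)]

def post_filter_search(tenant, k=10, ann_limit=100):
    """pgvector-style: ANN returns the top `ann_limit` candidates first,
    THEN the tenant filter runs. With a 1%-selective filter, only ~1 of
    100 candidates survives, far fewer than the k results requested."""
    ann_candidates = random.sample(vectors, ann_limit)  # stand-in for ANN top-N
    return [v for v in ann_candidates if v["tenant"] == tenant][:k]

def pre_filter_search(tenant, k=10):
    """Dedicated-DB-style: the filter is part of the index traversal,
    so the search only ever considers the tenant's own vectors."""
    candidates = [v for v in vectors if v["tenant"] == tenant]
    return candidates[:k]

print(len(post_filter_search(tenant=7)))  # typically 0-3 results instead of 10
print(len(pre_filter_search(tenant=7)))   # 10
```

Raising `ann_limit` papers over the gap at the cost of extra work per query, which is exactly the tuning treadmill that selective filtering puts you on.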
Hybrid search: If you need both dense vector similarity and keyword matching (common for document search, product search, anything where exact terms matter), pgvector requires you to build this yourself. Dedicated databases like Qdrant and Weaviate have native hybrid search that combines dense and sparse retrieval with built-in fusion.
Multi-vector architectures: More complex embedding strategies — like ColBERT-style late interaction — can turn 100K documents into tens of millions of vectors. This compresses pgvector's comfortable range dramatically.
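The fusion step that dedicated databases provide natively can be approximated with reciprocal rank fusion, which is one common way to combine dense and keyword rankings. A minimal sketch, with made-up document IDs standing in for real result lists:

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Combine multiple ranked result lists into one fused ranking.
    k=60 is the constant from the original RRF paper; it damps the
    influence of top ranks so no single retriever dominates."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists: dense similarity vs. BM25 keyword ranking.
dense = ["doc_a", "doc_b", "doc_c", "doc_d"]
bm25  = ["doc_c", "doc_a", "doc_e"]

print(reciprocal_rank_fusion([dense, bm25]))
# ['doc_a', 'doc_c', 'doc_b', 'doc_e', 'doc_d']
```

If you stay on pgvector, this fusion, plus the BM25 side of retrieval, is code you own and maintain yourself.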
The sync concern is real: keeping a dedicated vector store in sync with Postgres adds operational complexity. Dual-writes, transactional outbox patterns, and CDC pipelines are established approaches with known failure modes — it's well-understood engineering, though not zero-effort.
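The transactional-outbox shape is worth seeing concretely. A minimal sketch, using sqlite3 as a stand-in for Postgres and a plain dict as a stand-in for the vector store client (all names here are illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         doc_id INTEGER, op TEXT, processed INTEGER DEFAULT 0);
""")

def write_document(doc_id, body):
    # The document write and the outbox entry commit in ONE transaction,
    # so a crash can never leave the vector-store update silently lost.
    with db:
        db.execute("INSERT INTO documents VALUES (?, ?)", (doc_id, body))
        db.execute("INSERT INTO outbox (doc_id, op) VALUES (?, 'upsert')", (doc_id,))

fake_vector_store = {}  # stand-in for a real vector DB client

def drain_outbox():
    # A background worker replays unprocessed entries; retries are safe
    # because upserts are idempotent.
    rows = db.execute("SELECT id, doc_id FROM outbox WHERE processed = 0").fetchall()
    for outbox_id, doc_id in rows:
        body = db.execute("SELECT body FROM documents WHERE id = ?",
                          (doc_id,)).fetchone()[0]
        fake_vector_store[doc_id] = body  # real code: embed + upsert here
        db.execute("UPDATE outbox SET processed = 1 WHERE id = ?", (outbox_id,))
    db.commit()

write_document(1, "hello")
drain_outbox()
print(fake_vector_store)  # {1: 'hello'}
```

The failure mode you trade away is dual-write divergence; the one you keep is replication lag between the commit and the drain, which your retrieval layer has to tolerate.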
Do you need metadata filtering for specific tenants or categories?
→ Yes: Use a dedicated vector database
Do you need hybrid search (dense + keyword)?
→ Yes: Use a dedicated vector database
Do you expect multi-vector architectures (ColBERT, multi-modal)?
→ Yes: Use a dedicated vector database
Are you under 5M vectors with simple Postgres integration?
→ Yes: pgvector is reasonable — monitor index builds and latency at scale
→ No: Use a dedicated vector database
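The decision flow above compresses to a few lines of code. The thresholds are this post's rules of thumb, not hard limits:

```python
def choose_vector_store(n_vectors, needs_filtering, needs_hybrid,
                        multi_vector, postgres_central):
    """Encode the decision flow. Capability gates are checked before
    scale because they are usually what triggers first."""
    if needs_filtering or needs_hybrid or multi_vector:
        return "dedicated"
    if n_vectors <= 5_000_000 and postgres_central:
        return "pgvector"  # monitor index builds and latency as you grow
    return "dedicated"

print(choose_vector_store(500_000, False, False, False, True))  # pgvector
print(choose_vector_store(500_000, True,  False, False, True))  # dedicated
```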

ℹ️ Note: The scale threshold is often cited as the pgvector limit, but it's usually the capability threshold that hits first. Most teams that outgrow pgvector do so because of filtering requirements, not raw vector count.
Decision Two: Managed or Self-Hosted?
Once you've decided you need a dedicated vector database, the next question is where it runs.
The managed-vs-self-hosted decision is really about where you spend your operational budget.
Managed services (Pinecone, Zilliz/managed Milvus, Qdrant Cloud, Weaviate Cloud) give you zero ops overhead. You don't manage nodes, backups, upgrades, or failover. You pay for what you use and scale without intervention.
Self-hosted (Milvus, Qdrant, Weaviate open-source) moves the cost from vendor invoices to engineering time. You own cluster management, node failures, backup schedules, upgrades, and sharding configuration. Scaling from one to three nodes requires careful tuning of sharding and replica settings.
It's cheaper in direct infrastructure costs but carries real operational overhead, which makes it more expensive at small scale. The crossover lands roughly at 50-100M vectors, or around $500/month in managed spend.
Above that threshold, self-hosted tends to be more economical:
High-frequency workloads exceeding 60-80M queries/month can reach a point where self-hosted undercuts managed pricing by 3-10x, depending on instance type and access patterns
Milvus with DiskANN can store roughly 10x more vectors on cheap SSD storage compared to RAM-based indexes, significantly reducing the effective cost per million vectors at large scale
Below that threshold, managed is often the right call — especially for small teams. The DevOps time to operate a production vector cluster is real engineering work.
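A back-of-envelope model makes the crossover concrete. Every number below is a placeholder, chosen only so the break-even lands in the 60-80M queries/month range discussed above; substitute your actual vendor quote and instance pricing:

```python
# Hypothetical prices -- replace with your own numbers.
managed_cost_per_million_queries = 20.0   # placeholder per-query billing rate
self_hosted_monthly = 1200.0              # placeholder: nodes + ops engineering time

def monthly_cost(queries_per_month):
    managed = queries_per_month / 1_000_000 * managed_cost_per_million_queries
    return {"managed": managed, "self_hosted": self_hosted_monthly}

for qpm in (10_000_000, 80_000_000, 300_000_000):
    c = monthly_cost(qpm)
    winner = "self-hosted" if c["self_hosted"] < c["managed"] else "managed"
    print(f"{qpm:>12,} queries/mo  managed=${c['managed']:>7,.0f}  -> {winner} wins")
```

The model is deliberately crude: it ignores storage, egress, and index-rebuild charges on the managed side, and underestimates the ops cost of the self-hosted side for a team doing it the first time. It's still usually enough to tell you which side of the crossover you're on.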
| | Managed | Self-Hosted |
|---|---|---|
| Ops overhead | Near zero | Significant |
| Cost at small scale | Higher unit price, lower total | Infrastructure + engineering time |
| Cost at large scale | Can become prohibitive (per-query billing) | Cheaper with right hardware |
| Vendor lock-in | High (proprietary APIs) | Low (open-source, portable) |
| Good for | Small teams, early scale | DevOps-capable teams, 50M+ vectors |

Hidden costs of managed to watch for: data egress fees (typically $0.08-0.09/GB on AWS), index rebuild compute charges, and per-query or per-read-unit pricing that scales linearly with throughput. Pinecone's serverless billing, for example, can go from reasonable to structurally expensive at sustained high-throughput workloads.
The deciding question: Does your team have the DevOps capacity to operate a production cluster? If yes, self-hosting at scale is usually the right economic call. If not, the managed premium is worth paying.
ℹ️ A practical trigger for re-evaluating: If your managed bill is approaching $500/month and growing, run the numbers on self-hosted before you hit the next pricing tier. That's the right time to evaluate — not after you've already scaled to 100M vectors.
What Benchmarks Actually Tell You
When evaluating vector databases, you'll find benchmarks from ANN-Benchmarks, from Qdrant, from Milvus, and from various blog posts. Most of these don't reflect how the system will behave under your production conditions — especially at concurrent load.
Standard ANN benchmarks measure single-query latency in isolation — recall versus queries-per-second on a single node with no concurrent load, no update pressure, and test hardware that may not match yours. What they miss:
Concurrent query saturation: In production, you have multiple queries running simultaneously. At 20+ concurrent queries, nominal QPS often stays flat while latency surges due to CPU saturation. One documented test on a 10M vector dataset showed 85ms average latency but 420ms P99 — a 5x gap that would never show up in a single-query benchmark.
Update overhead: If your data changes frequently, index rebuilds compete with production queries in some systems. Benchmarks typically test static datasets.
Your actual data distribution: Benchmark datasets (GloVe, SIFT, GIST) may have different dimensionality and distribution characteristics than your embeddings. Performance varies.
Production requirements are typically stated as constraints: "3,000 QPS at p95 latency ≤ 20ms." Standard benchmarks test neither the concurrency nor the percentile latency that matters.
Use benchmarks to rule out obvious misfits, but don't commit based on them. Test with your actual embedding model, your vector count, and load that approximates your expected concurrent query volume.
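A concurrent load test doesn't need much machinery. A minimal harness sketch, with a `time.sleep` stub standing in for your real vector-store query call:

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def percentile(samples, p):
    """Nearest-rank percentile; fine for a load-test report."""
    s = sorted(samples)
    return s[min(len(s) - 1, int(p / 100 * len(s)))]

def run_load_test(query_fn, concurrency=20, total_queries=200):
    """Fire `total_queries` through `concurrency` workers and report the
    tail percentiles that matter, not just the single-query average."""
    latencies = []
    def timed():
        t0 = time.perf_counter()
        query_fn()
        latencies.append((time.perf_counter() - t0) * 1000)  # ms
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for _ in range(total_queries):
            pool.submit(timed)
    return {"mean_ms": statistics.mean(latencies),
            "p95_ms": percentile(latencies, 95),
            "p99_ms": percentile(latencies, 99)}

# Stub query -- replace the lambda with a real call to your client SDK.
report = run_load_test(lambda: time.sleep(0.002))
print(report)
```

Run it at the concurrency you actually expect, against your actual index, and compare p95/p99 to your latency budget rather than the mean. A 5x mean-to-P99 gap like the one above only shows up under this kind of test.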
The Constraints That Actually Determine the Right Choice
Compress the decision into five constraints. Identify which ones are non-negotiable for your system, then eliminate options that can't meet them.

Scale
Under 1M vectors: pgvector is a reasonable starting point, assuming capability gates don't trigger first
1-10M vectors: pgvector is possible, but watch for filtering requirements and index build pressure
10M+ vectors: if you haven't already migrated due to capability requirements, this is the point to evaluate dedicated options
Filtering requirements
Full-collection search only → pgvector is viable
Selective metadata filtering (tenant, category, date) → dedicated database with pre-filter ANN
Operational budget
Small team, no dedicated infra capacity → managed service
Team with DevOps capacity + 50M+ vectors → self-hosted typically wins on cost
Query throughput
Under 60-80M queries/month → managed pricing is usually acceptable
Sustained high-throughput → run the numbers on self-hosted; the crossover is often significant
Migration tolerance
Low tolerance for being wrong (high switching costs, tight timelines) → be conservative; skip pgvector even if technically viable
High tolerance → pgvector is a reasonable bet with known exit conditions
Don't try to optimize for all five simultaneously. Identify which two or three are genuinely non-negotiable for your use case, use those to eliminate options, then pick based on operational fit from what remains.
When constraints point in different directions — say, low vector count but real filtering requirements — let capability constraints win. Scale catches up with you; retrofitting proper pre-filter ANN into pgvector after you've built around it is harder than migrating early.
What Happens When You Choose Wrong
Vector database migrations are harder than they look. There's no standardized data format across systems, and most migration tooling is built for relational databases, not vector stores. One documented Pinecone → Zilliz migration was estimated at 6-8 hours; actual time was 14-16 hours, consistent with a pattern of SDK quirks, configuration subtleties, and integration testing that vendor documentation doesn't surface.
You typically don't need to re-embed — your embeddings are portable across systems as long as you use the same embedding model. The cost is in re-integration: updating client code, testing metadata filtering behavior, validating recall against your baseline, and re-tuning HNSW parameters for the new system.
One useful mitigation: abstract the vector store behind an interface early. If your application code talks to a VectorStore interface rather than directly to Pinecone's SDK, migration is narrower in scope — though it's not a silver bullet. Filter expression syntax, schema differences, and recall validation will still differ between systems and will leak through any abstraction that handles complex queries.
Making the Choice You Can Defend Later
The goal isn't to pick the database with the most features or the best benchmark. It's to pick the one that matches your non-negotiable constraints and doesn't create operational debt you're not prepared to carry.
Most teams should start with pgvector if they're under 1M vectors with simple retrieval requirements. It's the right call if the conditions hold — just know the exit conditions before you commit. Most teams that outgrow pgvector do so because of filtering or multi-tenancy requirements, not raw scale.
For dedicated databases, the managed-vs-self-hosted question is really about team structure. If you have DevOps capacity and are approaching 50M vectors, the economics of self-hosting are usually compelling. If you're a lean team focused on product, the managed premium buys real time.
The decisions you make now are cheap to change when your dataset is small. They become expensive later — which is the argument for doing the constraint thinking before the choice is embedded in your stack.
Further Reading
Start with pgvector: Why You'll Outgrow It Faster Than You Think — Qdrant's analysis of 110+ community threads on pgvector failure conditions
ANN-Benchmarks — The canonical benchmark; useful for understanding the recall/QPS tradeoff between algorithms
VDBBench — A more production-realistic benchmark framework with concurrent load and streaming ingestion