Inside the Vallor Context Layer: Architecture, Performance, and Tradeoffs

A deep technical walkthrough of how Vallor builds and serves its contract intelligence layer, with benchmarks, code samples, and the boring details that actually matter.

Every contract intelligence product claims to be fast. Most are not. This post walks through how the Vallor Context Layer is actually built, the benchmarks we hold ourselves to, and the tradeoffs we made along the way.

The numbers up front

Here are the p50 and p99 latencies for the four hot paths in our system, measured over the last 30 days of production traffic across 11 enterprise tenants:

Production latency by operation, 30-day window, n = 4.1M requests.

Architecture at a glance

The system has three layers, each with a different scaling model:

Ingestion plane: Parses incoming contracts (DOCX, PDF, email attachments), extracts structure, and writes normalized entities to Postgres. Batched, eventually consistent, scales horizontally.
Context layer: The hot path. Holds the embedding index, the entity graph, and the per-tenant clause library. Lives in memory with a Postgres fallback. Read-heavy, latency-sensitive.
Agent plane: Stateless workers that call into the context layer to do review, redlining, and obligation extraction. Auto-scales on queue depth.
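
To make "auto-scales on queue depth" concrete, here is a minimal sketch of one possible policy. The ScalingPolicy shape, its field names, and the numbers in the usage note are assumptions for illustration, not a description of our actual autoscaler.

```typescript
// Hypothetical scaling policy: every name and number here is an assumption.
interface ScalingPolicy {
  targetJobsPerWorker: number; // queue items one worker is expected to absorb
  minWorkers: number; // floor, so the agent plane is never scaled to zero
  maxWorkers: number; // ceiling, so a queue spike cannot scale without bound
}

// Size the worker pool from the current queue depth, clamped to the bounds.
function desiredWorkers(queueDepth: number, policy: ScalingPolicy): number {
  const needed = Math.ceil(queueDepth / policy.targetJobsPerWorker);
  return Math.min(policy.maxWorkers, Math.max(policy.minWorkers, needed));
}
```

With a target of 10 jobs per worker and bounds of 2 to 50, a queue depth of 95 asks for 10 workers, and an empty queue still keeps 2 warm. Because the workers are stateless, the count can move freely in either direction.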

The hot path, in code

Every clause lookup goes through a single function. We've optimized this one function harder than anything else in the codebase:

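// Single entry point for clause lookups; every call is scoped to a tenant.
async function lookupClause(
  tenantId: string,
  query: ClauseQuery,
): Promise<ClauseHit[]> {
  // The cache key includes the tenant, so one tenant never reads another's entry.
  const cacheKey = hashQuery(tenantId, query);
  const cached = await contextCache.get(cacheKey);
  if (cached) return cached;

  // Run the embedding search and the graph traversal concurrently, not in sequence.
  const [embedHits, graphHits] = await Promise.all([
    embeddingIndex.search(tenantId, query.text, { k: 20 }),
    entityGraph.traverse(tenantId, query.entityIds, { depth: 2 }),
  ]);

  // Rerank the combined hits, then cache the result for 60 seconds.
  const ranked = rerank(embedHits, graphHits, query.context);
  await contextCache.set(cacheKey, ranked, { ttl: 60_000 });
  return ranked;
}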

Three things matter here:

The embedding search and graph traversal run in parallel. Sequencing them was the single biggest perf regression we ever shipped.
The cache key is (tenantId, query), never just query. This sounds obvious. We still got it wrong once.
The TTL is short (60s) because contract libraries get updated and stale clauses are worse than slow lookups.
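
The second and third points are easy to get subtly wrong, so here is a minimal sketch of a tenant-scoped key and a TTL cache. The ClauseQuery shape, cacheKeyFor, and TtlCache are hypothetical stand-ins (the post does not show hashQuery or contextCache), and the key is a plain canonical string rather than a hash.

```typescript
// Hypothetical stand-in: the real ClauseQuery type is not shown in the post.
interface ClauseQuery {
  text: string;
  entityIds: string[];
  context?: string;
}

// Build a key that always includes the tenant. JSON-encoding the tuple avoids
// delimiter collisions (tenant "a:b" + text "c" vs tenant "a" + text "b:c").
function cacheKeyFor(tenantId: string, query: ClauseQuery): string {
  const entityIds = [...query.entityIds].sort(); // make the key order-insensitive
  return JSON.stringify([tenantId, query.text, entityIds, query.context ?? null]);
}

// Minimal TTL cache with an injectable clock so expiry can be tested.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, ttlMs: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }
}
```

The same query from two tenants produces two different keys, so a cached hit can never cross a tenant boundary. An entry written with a 60,000 ms TTL is served until the clock reaches that mark and is treated as a miss afterwards.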

Tradeoffs we made

Every system is a series of tradeoffs. Here are ours, with our work shown:

Why Postgres and not a dedicated vector DB?

We started on Pinecone. It worked. But the operational cost of running two stateful systems (Postgres for entities, Pinecone for embeddings) became real around 50M vectors per tenant. Moving to pgvector with an HNSW index cut p99 by 40% because we removed a network hop and could JOIN embeddings with entity metadata in a single query.
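
A minimal sketch of what "JOIN embeddings with entity metadata in a single query" can look like with pgvector. The table names, column names, and the nearestClausesQuery helper are assumptions for illustration, not our actual schema. The `<=>` operator is pgvector's cosine distance, and the returned `{ text, values }` object is the query config shape that node-postgres accepts.

```typescript
// Hypothetical schema: table and column names below are assumptions.
//   clauses(id, tenant_id, entity_id, body, embedding vector)
//   entities(id, tenant_id, kind)
// In pgvector, an HNSW index for cosine distance is created like this:
//   CREATE INDEX ON clauses USING hnsw (embedding vector_cosine_ops);

interface SqlQuery {
  text: string;
  values: unknown[];
}

// Build one parameterized query that ranks clauses by vector distance and
// joins each hit to its entity metadata, scoped to a single tenant.
function nearestClausesQuery(tenantId: string, embedding: number[], k: number): SqlQuery {
  if (!embedding.every((x) => Number.isFinite(x))) {
    throw new Error("embedding must contain only finite numbers");
  }
  // pgvector parses the text form "[x1,x2,...]" when it is cast to vector.
  const vectorLiteral = `[${embedding.join(",")}]`;
  return {
    text: `
      SELECT c.id, c.body, e.kind, c.embedding <=> $2::vector AS distance
      FROM clauses c
      JOIN entities e ON e.id = c.entity_id AND e.tenant_id = c.tenant_id
      WHERE c.tenant_id = $1
      ORDER BY c.embedding <=> $2::vector
      LIMIT $3`,
    values: [tenantId, vectorLiteral, k],
  };
}
```

With embeddings in a separate service, the same lookup takes two round trips: fetch the nearest IDs, then query Postgres for their metadata. Here it is one statement against one system.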

Would we make the same call at 500M vectors per tenant? Probably not. But that's a problem for next year.

Why is the context layer in-memory?

Because Postgres p99 reads under load were 80ms and we needed sub-10ms for the agent plane to feel responsive. We hold a hot subset (last 90 days of activity per tenant) in process memory, hydrate on cold start, and write through to Postgres on every mutation. The recovery story is a Postgres replay, which we test weekly.
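
The hydrate, read, and write-through pattern can be sketched as follows. The BackingStore interface and HotStore class are hypothetical: the post does not show the real storage API, and this version ignores eviction and concurrent writers.

```typescript
// Hypothetical interface standing in for the Postgres-backed store.
interface BackingStore<V> {
  loadRecent(tenantId: string): Promise<Map<string, V>>; // e.g. the last 90 days
  read(tenantId: string, key: string): Promise<V | undefined>;
  write(tenantId: string, key: string, value: V): Promise<void>;
}

class HotStore<V> {
  private hot = new Map<string, V>();
  private tenantId: string;
  private backing: BackingStore<V>;

  constructor(tenantId: string, backing: BackingStore<V>) {
    this.tenantId = tenantId;
    this.backing = backing;
  }

  // Cold start and recovery: rebuild the in-memory set from the durable store.
  async hydrate(): Promise<void> {
    this.hot = new Map(await this.backing.loadRecent(this.tenantId));
  }

  // Serve from memory; fall back to the durable store on a miss.
  async get(key: string): Promise<V | undefined> {
    const cached = this.hot.get(key);
    if (cached !== undefined) return cached;
    return this.backing.read(this.tenantId, key);
  }

  // Write-through: persist first, then update memory, so memory never holds
  // a value the durable store does not have.
  async put(key: string, value: V): Promise<void> {
    await this.backing.write(this.tenantId, key, value);
    this.hot.set(key, value);
  }
}
```

Ordering the write before the in-memory update is what makes recovery a plain replay: if the process dies at any point, everything the hot set ever contained is already in the durable store.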

Why not a multi-tenant single-process model?

Two reasons. Blast radius (a bad tenant query should not page another tenant's GC) and audit posture ("is tenant A's data ever in the same process as tenant B's data" is a question we get on every security review, and the answer we want to give is "no"). We pay for it in idle memory. Worth it.

What we got wrong

"The first version of this system tried to be too clever. The current version is boring. Boring is good."

— Founding engineer, on the v2 rewrite

Three things we'd do differently if we started over today:

Pick the boring database first. We spent two months on a custom storage layer. We threw it away. Postgres was always the answer.
Instrument before optimizing. Half of our early "optimizations" made the median case faster and the tail worse. We didn't notice until we shipped p99 dashboards.
Treat the context layer as a product, not a service. Once we gave it a name and a roadmap, every other team had a clearer place to push features. Before that, it was "the backend" and nobody owned it.
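
The second lesson is easy to show with numbers. Below is a minimal sketch of a nearest-rank percentile over two sets of latencies. The sample values are synthetic and invented for illustration; they are not production measurements.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing so integer p and n give an exact rank.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Synthetic latencies in milliseconds, 100 samples each.
const before: number[] = [...Array(98).fill(10), 40, 45];
const after: number[] = [...Array(98).fill(6), 120, 150];

console.log("p50", percentile(before, 50), "->", percentile(after, 50));
console.log("p99", percentile(before, 99), "->", percentile(after, 99));
```

In this made-up data the median drops from 10ms to 6ms while p99 rises from 40ms to 120ms. A dashboard that only tracks the median would call that change a win.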

What's next

We're shipping three things in Q3 that build on this foundation: streaming review responses (so the general counsel (GC) sees the first redline in 200ms instead of waiting for the whole pass), cross-tenant benchmarking behind a privacy-preserving aggregate, and a programmatic context API for customers who want to build on top of the layer directly.

If any of this is interesting and you want to see it in action, book a walkthrough. We'll show you the dashboards too.

FAQ

Why did Vallor move from a dedicated vector database to pgvector?

Vallor started on Pinecone, but running two stateful systems, Postgres for entities and Pinecone for embeddings, got costly around 50M vectors per tenant. Moving to pgvector with an HNSW index cut p99 by 40%, because it removed a network hop and let embeddings JOIN with entity metadata in a single query.

How is one tenant's data kept separate from another's?

Every clause lookup is scoped by tenant. The embedding search, the graph traversal, and the cache key all take the tenant ID, and Vallor runs a per-tenant process rather than a shared one. That limits blast radius, and it means the answer to "does tenant A's data ever share a process with tenant B's?" is a clear no.

Why is the context layer held in memory instead of read straight from Postgres?

Postgres p99 reads under load were 80ms, and the agent plane needed sub-10ms to feel responsive. Vallor keeps the last 90 days of activity per tenant in process memory, writes through to Postgres on every change, and recovers by replaying from Postgres, which is tested weekly.

How does Vallor keep lookups from returning stale clauses?

The lookup cache uses a short 60-second TTL, because contract libraries get updated and a stale clause is worse than a slightly slower lookup. Embedding search and graph traversal also run in parallel to keep latency low.
