Inside the Vallor Context Layer: Architecture, Performance, and Tradeoffs

A deep technical walkthrough of how Vallor builds and serves the contract intelligence layer. With benchmarks, code samples, and the boring details that actually matter.

Vallor Engineering · The team · May 21, 2026 · 7 min read

Every contract intelligence product claims to be fast. Most are not. This post walks through how the Vallor Context Layer is actually built, the benchmarks we hold ourselves to, and the tradeoffs we made along the way.

The numbers up front

Here are the p50 and p99 latencies for the four hot paths in our system, measured over the last 30 days of production traffic across 11 enterprise tenants:

[Figure: bar chart of p50 and p99 latency per operation, y-axis 0ms to 800ms]

Operation             p50      p99
Clause lookup         120ms    160ms
Counterparty fetch    280ms    400ms
Redline generation    400ms    560ms
Full review pass      600ms    760ms
Production latency by operation, 30-day window, n = 4.1M requests.

Architecture at a glance

The system has three layers, each with a different scaling model:

Ingestion plane
Parses incoming contracts (DOCX, PDF, email attachments), extracts structure, and writes normalized entities to Postgres. Batched, eventually consistent, scales horizontally.
Context layer
The hot path. Holds the embedding index, the entity graph, and the per-tenant clause library. Lives in memory with a Postgres fallback. Read-heavy, latency-sensitive.
Agent plane
Stateless workers that call into the context layer to do review, redlining, and obligation extraction. Auto-scales on queue depth.
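
To make "auto-scales on queue depth" concrete, here is a minimal sketch of how a target worker count could be derived from the current backlog. The function name `desiredWorkers`, the `ScalingPolicy` shape, and every threshold are assumptions made for illustration; the post does not describe the actual scaling policy.

```typescript
// Hypothetical scaling policy: names and numbers are illustrative only.
interface ScalingPolicy {
  targetPerWorker: number; // queued jobs one worker is expected to absorb
  minWorkers: number;      // floor, so the pool never scales to zero
  maxWorkers: number;      // ceiling, to cap cost
}

// Size the pool so each worker handles roughly targetPerWorker queued jobs,
// clamped to the [minWorkers, maxWorkers] range.
function desiredWorkers(queueDepth: number, policy: ScalingPolicy): number {
  if (policy.targetPerWorker <= 0) {
    throw new Error("targetPerWorker must be positive");
  }
  const needed = Math.ceil(Math.max(0, queueDepth) / policy.targetPerWorker);
  return Math.min(policy.maxWorkers, Math.max(policy.minWorkers, needed));
}
```

Because the workers are stateless, a policy like this only needs the queue depth as input; nothing has to be migrated when the pool grows or shrinks.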

The hot path, in code

Every clause lookup goes through a single function. We've optimized this one function harder than anything else in the codebase:

async function lookupClause(
  tenantId: string,
  query: ClauseQuery,
): Promise<ClauseHit[]> {
  // The cache key is scoped by tenant, so one tenant never sees another's hits.
  const cacheKey = hashQuery(tenantId, query);
  const cached = await contextCache.get(cacheKey);
  if (cached) return cached;

  // Run the embedding search and the graph traversal concurrently.
  const [embedHits, graphHits] = await Promise.all([
    embeddingIndex.search(tenantId, query.text, { k: 20 }),
    entityGraph.traverse(tenantId, query.entityIds, { depth: 2 }),
  ]);

  // Merge both result sets into one ranking, then cache it for 60 seconds.
  const ranked = rerank(embedHits, graphHits, query.context);
  await contextCache.set(cacheKey, ranked, { ttl: 60_000 });
  return ranked;
}

Three things matter here:

  1. The embedding search and graph traversal run in parallel. Sequencing them was the single biggest perf regression we ever shipped.
  2. The cache key is (tenantId, query), never just query. This sounds obvious. We still got it wrong once.
  3. The TTL is short (60s) because contract libraries get updated and stale clauses are worse than slow lookups.
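
Points 2 and 3 above can be sketched in a few lines. This is not the real `hashQuery` or `contextCache`, neither of which is shown in this post. The `ClauseQuery` fields are inferred from the lookup function, and `cacheKey` and `TtlCache` are hypothetical stand-ins.

```typescript
// Hypothetical query shape, inferred from the fields lookupClause reads.
interface ClauseQuery {
  text: string;
  entityIds: string[];
  context?: string;
}

// The tenant ID is part of the key, so identical queries from two tenants
// can never collide. Encoding everything as one JSON array also avoids
// collisions caused by delimiter characters inside the values.
function cacheKey(tenantId: string, query: ClauseQuery): string {
  return JSON.stringify([
    tenantId,
    query.text,
    query.entityIds,
    query.context ?? null,
  ]);
}

// Minimal in-process TTL cache. The clock is injectable so expiry is testable.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  set(key: string, value: V, ttlMs: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      // Expired: drop it so a stale clause is never served.
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}
```

With a 60-second TTL, the worst case is a clause that is up to a minute out of date; anything older is treated as a miss and recomputed.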

Tradeoffs we made

Every system is a series of tradeoffs. Here are ours, with our work shown:

Why Postgres and not a dedicated vector DB?

We started on Pinecone. It worked. But the operational cost of running two stateful systems (Postgres for entities, Pinecone for embeddings) became real around 50M vectors per tenant. Moving to pgvector with an HNSW index cut p99 by 40% because we removed a network hop and could JOIN embeddings with entity metadata in a single query.
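
For readers who have not used pgvector, the single-query JOIN looks roughly like the sketch below. The table and column names (`clause_embeddings`, `entities`, `counterparty`) are hypothetical and not our real schema; the `hnsw` index type, the `vector_cosine_ops` operator class, and the `<=>` cosine-distance operator are standard pgvector syntax.

```typescript
// Hypothetical schema, for illustration only:
//   clause_embeddings(tenant_id, clause_id, embedding vector)
//   entities(tenant_id, clause_id, counterparty)

// One-time DDL: an HNSW index over cosine distance.
const createIndexSql = `
  CREATE INDEX IF NOT EXISTS clause_embeddings_hnsw
  ON clause_embeddings USING hnsw (embedding vector_cosine_ops)`;

// Nearest-neighbour search joined with entity metadata in one statement.
// $1 = tenant ID, $2 = query embedding as a pgvector text literal, $3 = k.
const searchSql = `
  SELECT e.clause_id, e.counterparty, c.embedding <=> $2::vector AS distance
  FROM clause_embeddings c
  JOIN entities e
    ON e.tenant_id = c.tenant_id AND e.clause_id = c.clause_id
  WHERE c.tenant_id = $1
  ORDER BY c.embedding <=> $2::vector
  LIMIT $3`;

// pgvector accepts vectors as text literals of the form '[0.1,0.2,0.3]'.
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`;
}
```

One caveat worth knowing: with an approximate index, a `WHERE` filter is applied after the index scan, so a query can return fewer than `k` rows. Partitioning or partial indexes per tenant are common ways around that.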

Would we make the same call at 500M vectors per tenant? Probably not. But that's a problem for next year.

Why is the context layer in-memory?

Because Postgres p99 reads under load were 80ms and we needed sub-10ms for the agent plane to feel responsive. We hold a hot subset (last 90 days of activity per tenant) in process memory, hydrate on cold start, and write through to Postgres on every mutation. The recovery story is a Postgres replay, which we test weekly.
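
The hydrate-then-write-through pattern described above can be sketched as follows. `DurableStore` and `WriteThroughCache` are hypothetical names, with the store interface standing in for Postgres. The sketch deliberately leaves out the 90-day hot-subset selection and the fallback read path.

```typescript
// Hypothetical durable-store interface standing in for Postgres.
interface DurableStore<V> {
  loadAll(): Promise<Map<string, V>>;
  save(key: string, value: V): Promise<void>;
}

class WriteThroughCache<V> {
  private hot = new Map<string, V>();
  private store: DurableStore<V>;

  constructor(store: DurableStore<V>) {
    this.store = store;
  }

  // Cold start: rebuild the in-memory state from the durable store.
  async hydrate(): Promise<void> {
    this.hot = new Map(await this.store.loadAll());
  }

  // Reads are served from process memory only in this sketch.
  get(key: string): V | undefined {
    return this.hot.get(key);
  }

  // Write-through: persist first, then update memory, so memory never
  // holds a value that a crash could lose.
  async set(key: string, value: V): Promise<void> {
    await this.store.save(key, value);
    this.hot.set(key, value);
  }
}
```

Persisting before updating memory is what makes recovery a plain replay: anything a reader could have seen in memory is already in the durable store.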

Why not a multi-tenant single-process model?

Two reasons. Blast radius (a bad tenant query should not page another tenant's GC) and audit posture ("is tenant A's data ever in the same process as tenant B's data" is a question we get on every security review, and the answer we want to give is "no"). We pay for it in idle memory. Worth it.

What we got wrong

"The first version of this system tried to be too clever. The current version is boring. Boring is good."

— Founding engineer, on the v2 rewrite

Three things we'd do differently if we started over today:

  • Pick the boring database first. We spent two months on a custom storage layer. We threw it away. Postgres was always the answer.
  • Instrument before optimizing. Half of our early "optimizations" made the median case faster and the tail worse. We didn't notice until we shipped p99 dashboards.
  • Treat the context layer as a product, not a service. Once we gave it a name and a roadmap, every other team had a clearer place to push features. Before that, it was "the backend" and nobody owned it.
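
The second point is easy to show with numbers. The sketch below uses the nearest-rank percentile method on two made-up latency samples; the `percentile` helper and the data are illustrative, not taken from our dashboards.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

// Made-up latencies in ms: an "optimization" that helps most requests
// but makes the slowest one much slower.
const before = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190];
const after = [80, 85, 90, 95, 100, 105, 110, 115, 120, 400];

console.log(percentile(before, 50), percentile(after, 50)); // 140 100
console.log(percentile(before, 99), percentile(after, 99)); // 190 400
```

A median-only dashboard would call this change a win (140ms down to 100ms), while the p99 more than doubles.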

What's next

We're shipping three things in Q3 that build on this foundation: streaming review responses (so the general counsel sees the first redline in 200ms instead of waiting for the whole pass), cross-tenant benchmarking behind a privacy-preserving aggregate, and a programmatic context API for customers who want to build on top of the layer directly.

If any of this is interesting and you want to see it in action, book a walkthrough. We'll show you the dashboards too.

FAQ

What is the fastest way to evaluate Vallor?

Start with one live workflow, one contract repository, and one measurable outcome. Vallor can connect to existing systems and produce first answers in minutes, which lets teams test value before a long rollout.

Does Vallor replace an existing CLM?

Not always. Vallor can sit on top of an existing CLM, ERP, storage drive, or email system. Some teams use it as the intelligence layer while keeping their current system of record.

How does Vallor keep answers audit-ready?

Every answer is grounded in source agreements and linked back to the clause, obligation, counterparty, or workflow record behind it. The goal is plain-English speed with enterprise evidence.

Who usually owns this work?

Procurement often owns the business case. Legal owns risk and redlines. Finance and sales operations join when obligations, rebates, renewals, or revenue contracts are in scope.

What data does Vallor need to start?

A contract folder, CLM export, ERP connection, or shared drive is enough for the first pass. Additional systems improve context, but they are not required to begin.