API

What we learnt building the Parts Resolve API

1 May 2026 · 4 min read

We did not start out planning to ship a public API. We built one because the first three trade buyers who integrated with us all asked, independently, for the same three things. This is what shipping that API taught us about parts data, and what we would do differently if we started again tomorrow.

Lesson 1: Part numbers are not strings

Every API we shipped early on treated a part number as a string parameter. Customers immediately broke that assumption. They sent us numbers with spaces, with dashes in different places, with leading zeros stripped, with brand prefixes or manufacturer suffixes attached, in lowercase, in mixed case, and with a trailing comma where their CSV export had leaked one in.

Treating a part number as a string puts the burden of normalisation on the caller, and the caller is, sensibly, never going to do it. The API now runs every incoming number through the same normaliser the internal search uses, and we return both what the caller sent and the canonical form we resolved it to. That single change cut our "no match" rate by about a third.

POST /api/v1/resolve
{
  "query": "03l 906 051A,"
}

200 OK
{
  "input": "03l 906 051A,",
  "normalised": "03L906051A",
  "canonical_group_id": "grp_8f3...",
  "matches": [ ... ]
}
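
A minimal sketch of the cheapest part of that normalisation, assuming it is enough to uppercase the input and drop everything that is not a letter or a digit. The function names are hypothetical. The harder cases listed above (stripped leading zeros, brand prefixes, manufacturer suffixes) need catalogue knowledge and are deliberately not handled here.

```python
import re

# Anything that is not an ASCII letter or digit: spaces, dashes, dots,
# stray commas from a CSV export, and so on.
_NOISE = re.compile(r"[^0-9A-Za-z]")


def normalise_part_number(raw: str) -> str:
    """Return an uppercase, separator-free form of a part number.

    Illustrative only: real normalisation also has to deal with brand
    prefixes and stripped leading zeros, which this sketch ignores.
    """
    return _NOISE.sub("", raw).upper()


def echo_with_normalised(raw: str) -> dict:
    """Return both what the caller sent and the cleaned-up form."""
    return {"input": raw, "normalised": normalise_part_number(raw)}


result = echo_with_normalised("03l 906 051A,")
print(result["normalised"])  # prints 03L906051A
```

Returning the raw input alongside the normalised form lets a caller see exactly what was changed, which matters when a match looks surprising.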

Lesson 2: Evidence has to travel with the match

A cross-reference without a source is a rumour. The first version of our resolve endpoint returned a list of equivalent numbers and nothing else. The first integration partner asked, on day two, "where did you get this?" That question never goes away.

Every edge in our graph carries one or more evidence assets — a manufacturer URL, a catalogue snippet, a supplier confirmation, a workshop manual page. The API surfaces them inline on every match. Buyers who care can render them straight in their UI; buyers who do not can ignore them. Either way, the evidence is not hidden behind a separate call.

"matches": [
  {
    "part_number": "0281002996",
    "brand": "Bosch",
    "confidence": 0.98,
    "evidence": [
      {
        "kind": "manufacturer_url",
        "url": "https://...",
        "extracted": "Fits VAG 03L906051A (CR engines)"
      }
    ]
  }
]
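
On the consuming side, a buyer might model that shape roughly as follows. This is a sketch under assumptions: the field names come from the response above, while the class names, `parse_match` and `evidence_lines` are hypothetical, and missing `url` or `extracted` fields are assumed to default to an empty string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    kind: str
    url: str
    extracted: str


@dataclass(frozen=True)
class Match:
    part_number: str
    brand: str
    confidence: float
    evidence: tuple  # tuple of Evidence; empty when the match has none


def parse_match(raw: dict) -> Match:
    """Build a Match from one element of the "matches" array."""
    evidence = tuple(
        Evidence(e["kind"], e.get("url", ""), e.get("extracted", ""))
        for e in raw.get("evidence", [])
    )
    return Match(raw["part_number"], raw["brand"], float(raw["confidence"]), evidence)


def evidence_lines(match: Match) -> list:
    """One display line per evidence asset, ready to render next to the match."""
    return [f"{e.kind}: {e.extracted}" for e in match.evidence]


sample = {
    "part_number": "0281002996",
    "brand": "Bosch",
    "confidence": 0.98,
    "evidence": [
        {
            "kind": "manufacturer_url",
            "url": "https://...",
            "extracted": "Fits VAG 03L906051A (CR engines)",
        }
    ],
}
match = parse_match(sample)
```

A buyer who does not care about evidence can ignore the `evidence` field entirely; nothing else in the match depends on it.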

Lesson 3: Batch and stream are different products

We initially shipped a single resolve endpoint and assumed callers would loop. Two patterns emerged that we hadn’t designed for.

The first was DMS overnight reconciliation: a national distributor wanted to push 80,000 part numbers through us once a night, get back the canonical groups, and update their internal mapping table. A per-call API at human latency was the wrong shape entirely. We built a batch endpoint that accepts up to 5,000 numbers in a single request, runs the normaliser and resolver in bulk, and returns a single response. Wall-clock time for an 80k reconciliation dropped from "an hour of HTTP" to "ninety seconds of one call repeated sixteen times".
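
The "sixteen times" follows from the limits: 80,000 numbers at 5,000 per request is 16 requests. A caller-side sketch of that chunking, assuming a hypothetical `send_batch` callable that wraps whatever HTTP client the caller uses (the batch endpoint's path and payload shape are not shown in this post, so they are left out here):

```python
from typing import Callable, Iterator, List, Sequence

BATCH_LIMIT = 5000  # per-request cap quoted above


def chunked(numbers: Sequence[str], size: int = BATCH_LIMIT) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` part numbers."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(numbers), size):
        yield numbers[start:start + size]


def reconcile(numbers: Sequence[str],
              send_batch: Callable[[Sequence[str]], list]) -> List:
    """Send every chunk through `send_batch` and collect the results in order."""
    results: List = []
    for batch in chunked(numbers):
        results.extend(send_batch(batch))
    return results


# Stand-in data and a stand-in sender that just echoes its input.
nightly = [f"PN{i:06d}" for i in range(80_000)]
batches = list(chunked(nightly))
print(len(batches))  # prints 16
```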

The second was the trade-counter use case: a parts advisor on the phone to a workshop, typing characters into a search box, expecting results back inside 200ms. Per-call latency mattered far more than throughput. We tuned a dedicated low-latency path for single-number resolution, with aggressive caching on canonical groups and supplier stock snapshots. P95 sits at 140ms.
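
The post does not describe how that caching is built, so the following is only one plausible shape: an in-process cache with a time-to-live, keyed on the normalised part number. `TTLCache`, `get_or_load` and the 30-second TTL are all assumptions made for illustration.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        """Return the cached value for `key`, calling `loader` on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value


# Usage: the loader stands in for the slow lookup of a canonical group.
lookups = []


def load_group(normalised: str) -> dict:
    lookups.append(normalised)
    return {"normalised": normalised}


cache = TTLCache(ttl_seconds=30)
cache.get_or_load("03L906051A", load_group)
cache.get_or_load("03L906051A", load_group)  # served from cache, no second lookup
```

A short TTL suits data like stock snapshots that go stale quickly; canonical groups change far less often and could tolerate a longer one.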

Lesson 4: Versioning needs a deprecation contract

We version under /v1/ and have a public deprecation policy: any breaking change ships under a new major version, both versions run side by side for a minimum of nine months, and every month we email each API key holder who still has usage on the old version until they cut over. That policy was not in place at the start, and not having it cost us trust the first time we needed to evolve a response shape.

Lesson 5: A confidence score is not optional

Every match carries a confidence score in [0, 1]. Buyers use it to decide whether to auto-route to a purchase order or to drop into a human review queue. Without it, every match looked equally trustworthy, and one bad cross-reference could erode trust in the whole API. We will write more about how the confidence model itself works in a separate post.
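
On the buyer's side, that routing decision can be as small as a threshold check. A sketch, assuming a hypothetical cut-off of 0.95; the right value is a business decision for each buyer and is not something the API prescribes.

```python
AUTO_ROUTE_THRESHOLD = 0.95  # hypothetical cut-off chosen by the buyer


def route(match: dict, threshold: float = AUTO_ROUTE_THRESHOLD) -> str:
    """Decide where a match goes based on its confidence score."""
    confidence = match["confidence"]
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return "purchase_order" if confidence >= threshold else "human_review"


print(route({"part_number": "0281002996", "confidence": 0.98}))  # prints purchase_order
print(route({"part_number": "0281002996", "confidence": 0.60}))  # prints human_review
```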