Lessons From Building SaaS Integrations
CSV imports, API syncing, and webhook handling are the unglamorous core of SaaS — here's what I've learned building them
Every SaaS product eventually becomes an integration product. You build the core feature, users love it, and then the first real question arrives: "Can I get my data in from X?" or "Does this sync with Y?" or "Can you fire an event when Z happens?"
The answer is almost always yes. The interesting part is everything that follows.
I've spent a lot of time building integrations — CSV imports for onboarding, API syncs to pull in third-party data, webhook handlers to react to external events. This is not glamorous work. Nobody writes blog posts about their elegant CSV parser. But it's the kind of work that determines whether a product actually fits into someone's workflow or just sits next to it.
Here's what I've learned.
CSV Import Is Never Just CSV Import
CSV is the universal interchange format because it requires zero coordination. No API keys, no OAuth flows, no documentation. The user exports from one system, imports into yours, done.
Except it's never done.
The first problem is encoding. Files arrive as UTF-8, Windows-1252, ISO-8859-1, and occasionally something that defies classification. A single mishandled character in a name field corrupts a row. You need to detect encoding or at minimum fail gracefully with a message that helps the user fix it.
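A minimal sketch of the "detect or fail gracefully" idea, assuming only two candidate encodings are worth trying. The function name, the candidate list, and the error wording are illustrative, not taken from any particular product. ISO-8859-1 is deliberately left out of the list: it accepts every byte sequence, so including it would mean the function never fails and simply mislabels unknown files.

```python
# Assumed order: strictest first. "utf-8-sig" also strips a leading BOM.
CANDIDATE_ENCODINGS = ("utf-8-sig", "cp1252")


def decode_upload(raw: bytes) -> tuple[str, str]:
    """Return (text, encoding_used), or raise ValueError with a user-facing hint."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError(
        "Could not read this file as UTF-8 or Windows-1252. "
        "Try re-exporting it as 'CSV UTF-8' from your spreadsheet tool."
    )
```

Trying strict decoders in order is a cheap heuristic rather than real detection; a statistical detector could be swapped in if the guesses turn out to be wrong too often.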
The second problem is structure. Users don't read your import template. They upload whatever they have, with whatever column names their previous tool used, in whatever order made sense to them at the time. For Composer Catalog, where we handle music metadata and rights data, this means files where "Composer" might be labeled "Writer," "Artist," "Creator," "Author," or just left blank because it was obvious from context. Mapping columns manually is table stakes. Building a UI that makes that mapping feel easy is real product work.
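The mapping UI usually starts from a guess. Here is a rough sketch of how that first guess might be produced, assuming a hand-maintained alias table. The table contents and the `suggest_mapping` name are hypothetical; anything the function cannot place is left as `None` for the user to resolve.

```python
from typing import Optional

# Hypothetical alias table: canonical field -> header names seen in other tools.
HEADER_ALIASES = {
    "composer": {"composer", "writer", "artist", "creator", "author"},
    "title": {"title", "track", "track title", "song"},
    "duration": {"duration", "length", "time"},
}


def suggest_mapping(headers: list[str]) -> dict[str, Optional[str]]:
    """Map each uploaded header to a canonical field, or None if unknown."""
    mapping: dict[str, Optional[str]] = {}
    claimed: set[str] = set()
    for header in headers:
        normalized = header.strip().lower()
        match = None
        for field, aliases in HEADER_ALIASES.items():
            # Each canonical field is suggested at most once; a second
            # candidate column is left unmapped for the user to decide.
            if normalized in aliases and field not in claimed:
                match = field
                break
        if match is not None:
            claimed.add(match)
        mapping[header] = match
    return mapping
```

Leaving ambiguous columns unmapped is a design choice: a file with both "Writer" and "Artist" columns is exactly the case where a silent guess would be wrong.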
The third problem is validation. You can check types and required fields. You should. But the harder validation is business logic — things like duplicate detection, relational integrity, values that are individually valid but collectively wrong. A track duration of 0:00 is technically parseable. It's also almost certainly an error.
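To make the distinction concrete, here is a small sketch of business-logic checks that run after type checks have passed. It assumes rows are dictionaries of strings with `title`, `composer`, and `duration` keys and that durations are written as `M:SS`; all of those are assumptions for illustration.

```python
def parse_duration(value: str) -> int:
    """Parse 'M:SS' into seconds. Raises ValueError on malformed input."""
    minutes, seconds = value.strip().split(":")
    return int(minutes) * 60 + int(seconds)


def validate_rows(rows: list[dict]) -> list[tuple[int, str]]:
    """Return (row_number, message) for each problem; an empty list means clean."""
    problems = []
    seen: dict[tuple[str, str], int] = {}
    for number, row in enumerate(rows, start=1):
        # Individually parseable, but almost certainly wrong.
        try:
            if parse_duration(row.get("duration", "")) == 0:
                problems.append((number, "duration is 0:00, probably missing"))
        except ValueError:
            problems.append((number, f"unreadable duration {row.get('duration')!r}"))

        # Collectively wrong: the same title and composer appearing twice.
        key = (
            row.get("title", "").strip().lower(),
            row.get("composer", "").strip().lower(),
        )
        if key in seen:
            problems.append((number, f"duplicate of row {seen[key]}"))
        else:
            seen[key] = number
    return problems
```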
The pattern I've landed on: parse first, validate second, commit third. Never try to import and validate in a single pass. Build a preview step that surfaces errors before anything touches the database. Let users fix problems before they become your problem.
And always, always support partial success. If a user uploads 500 rows and 12 have errors, importing the 488 good rows is almost always better than rejecting the whole file. Give them a clear error report for the failures and move on.
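The three phases and the partial-success behavior can be sketched in a few lines. This assumes a single required `title` column and uses a plain list as a stand-in for the database; both are placeholders. The line numbers in the error report assume one record per physical line, which does not hold for files with multi-line quoted fields.

```python
import csv
import io


def parse(text: str) -> list[dict]:
    """Phase 1: turn the decoded file into rows. Nothing is judged yet."""
    return list(csv.DictReader(io.StringIO(text)))


def validate(rows: list[dict]) -> tuple[list[dict], list[tuple[int, str]]]:
    """Phase 2: split rows into (good_rows, errors) for the preview step."""
    good, errors = [], []
    for line_number, row in enumerate(rows, start=2):  # line 1 is the header
        if not (row.get("title") or "").strip():
            errors.append((line_number, "missing title"))
        else:
            good.append(row)
    return good, errors


def commit(good_rows: list[dict], store: list) -> int:
    """Phase 3: write only the rows that passed. `store` stands in for the database."""
    store.extend(good_rows)
    return len(good_rows)
```

A preview screen would sit between `validate` and `commit`: show the error list, let the user fix or accept it, and only then write the good rows.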
API Syncing Is a Relationship, Not a Transaction
Calling a third-party API once is easy. Keeping your data in sync with a third-party API over months and years is a different problem entirely.
The mechanics are straightforward on paper: fetch data, store it, schedule the next fetch, handle errors. In practice, every third-party API has its own quirks. Pagination strategies differ — some use cursor-based pagination, some use offset/limit, some use link headers, some just return everything and hope you can handle it. Rate limits are inconsistently documented and occasionally enforced in ways that don't match the documentation. Auth tokens expire, sometimes predictably, sometimes not.
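As one illustration, here is a sketch of a cursor-based fetch loop that backs off when rate limited. The response shape (`items`, `next_cursor`), the `RateLimited` exception, and the injected `fetch_page` callable are all assumptions; a real client would translate the provider's HTTP responses into this shape.

```python
import time
from typing import Callable, Iterator, Optional


class RateLimited(Exception):
    """Hypothetical error raised by fetch_page when the API says to slow down."""

    def __init__(self, retry_after: float):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def fetch_all(
    fetch_page: Callable[[Optional[str]], dict],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield every item across pages, following cursors until none is returned."""
    cursor = None
    while True:
        for attempt in range(max_retries + 1):
            try:
                page = fetch_page(cursor)
                break
            except RateLimited as exc:
                if attempt == max_retries:
                    raise  # give up and let the scheduler retry the whole job
                sleep(exc.retry_after)
        yield from page["items"]
        cursor = page.get("next_cursor")
        if not cursor:
            return
```

Passing `sleep` in as a parameter keeps the loop testable without real waiting; offset/limit or link-header pagination would need a different loop body but the same retry wrapper.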
For integrations I've built against music rights databases and metadata services, the bigger challenge is semantic drift. The API returns data in a format you map to your schema. Six months later, the API adds a field, removes one, or changes the meaning of an existing one without changing the structure. Your sync still runs. Your data is now quietly wrong.
The mitigations aren't complicated, but they require discipline. Log raw API responses, not just what you extracted from them. Set up alerts when sync jobs return significantly more or fewer records than expected — that's often the first signal something changed upstream. Version your field mappings explicitly so you can see what changed when something breaks.
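The record-count alert can be very simple. A sketch, assuming you keep the counts from recent successful runs and that a 30% deviation from their median is worth a look; the threshold and the function name are placeholders to tune per integration.

```python
from statistics import median


def count_looks_wrong(
    current: int, recent_counts: list[int], tolerance: float = 0.3
) -> bool:
    """True if this run's record count deviates sharply from recent runs."""
    if not recent_counts:
        return False  # no history yet, nothing to compare against
    baseline = median(recent_counts)
    if baseline == 0:
        return current > 0
    return abs(current - baseline) / baseline > tolerance
```

Using the median rather than the mean keeps one earlier bad run from shifting the baseline.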
Idempotency matters here too. If a sync job runs twice because of a retry, you want the result to be identical to running it once. Build your upsert logic accordingly.
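One way to get that property is to key every record on the upstream system's identifier and let the database resolve conflicts. A sketch using SQLite's `ON CONFLICT ... DO UPDATE` clause (available in SQLite 3.24 and later); the `tracks` table and its columns are made up for the example.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical tracks table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tracks ("
        "external_id TEXT PRIMARY KEY, title TEXT, composer TEXT)"
    )
    return conn


def upsert_tracks(conn: sqlite3.Connection, records: list[dict]) -> None:
    """Insert new records and overwrite existing ones, keyed on external_id."""
    conn.executemany(
        """
        INSERT INTO tracks (external_id, title, composer)
        VALUES (:external_id, :title, :composer)
        ON CONFLICT(external_id) DO UPDATE SET
            title = excluded.title,
            composer = excluded.composer
        """,
        records,
    )
    conn.commit()
```

Running `upsert_tracks` twice with the same batch leaves the table in the same state as running it once, which is the property a retried sync job needs.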
Webhook Handling Is an Optimism Problem
Webhooks are appealing because they're push-based. Instead of polling for changes, the other system tells you when something happens. Faster, more efficient, cleaner.
They're also harder to make reliable than they look.
The fundamental issue is that webhook delivery is fire-and-forget from the sender's perspective. They POST to your endpoint, they log a 200, they move on. Whether you actually processed the event is your problem. Networks fail. Deployments cause brief downtime. Your handler throws an exception. The sender may or may not retry.
The pattern I use: accept the webhook, write it to a durable queue, return 200, and do the actual processing in a worker. Don't do real work in the handler. If processing fails, you have a queue you can retry from. If the sender retries the delivery anyway, your processing needs to be idempotent: handle duplicate events without creating duplicate state.
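A framework-free sketch of that split. The in-memory `deque` and `set` stand in for a durable queue and a persisted table of processed event ids, and the assumption that every event carries a unique `id` field is mine; check what your provider actually sends.

```python
import json
from collections import deque
from typing import Callable


def receive(body: bytes, queue: deque) -> int:
    """Handler: park the raw payload and acknowledge. No business logic here."""
    queue.append(body)
    return 200


def drain(queue: deque, processed_ids: set, apply_event: Callable[[dict], None]) -> None:
    """Worker: apply queued events, skipping ids that were already applied."""
    while queue:
        event = json.loads(queue[0])  # peek, so a failure leaves it queued
        if event["id"] not in processed_ids:
            apply_event(event)  # if this raises, the event stays at the head
            processed_ids.add(event["id"])
        queue.popleft()
```

Peeking before popping means a crash in `apply_event` leaves the event in place for the next attempt, and the id check makes a redelivered event a no-op.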
Signature verification is non-negotiable. Most serious webhook providers sign each request, usually with an HMAC computed from a shared secret and sent in a header. Verify it before you do anything else. An unverified webhook endpoint is an unauthenticated write endpoint to your database.
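For providers that use a shared-secret HMAC, the check is short. This sketch assumes the signature is a hex-encoded HMAC-SHA256 of the raw request body; the exact algorithm, encoding, and header name vary by provider, so treat those details as placeholders.

```python
import hashlib
import hmac


def signature_is_valid(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Compare the provider's signature to one computed over the raw body."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of a forged signature were correct.
    return hmac.compare_digest(
        expected.encode("utf-8"), signature_header.strip().encode("utf-8")
    )
```

The signature has to be computed over the exact bytes received, before any JSON parsing or re-serialization, or legitimate requests will fail the check.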
The other thing nobody mentions: you will receive events out of order. An "updated" event may arrive before the "created" event that should have preceded it, especially at any meaningful volume. Design your handlers to be resilient to this. Check whether the referenced resource exists before acting. If it doesn't, decide whether to wait for it, create it defensively, or discard the event.
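One possible shape for such a handler, taking the "create it defensively" option. It assumes each event carries a `resource_id`, a full snapshot in `data`, and a monotonically increasing `version`; many providers offer only a timestamp or nothing at all, so these fields are hypothetical.

```python
def apply_event(store: dict, event: dict) -> str:
    """Apply an event to `store` regardless of arrival order. Returns the action taken."""
    resource_id = event["resource_id"]
    current = store.get(resource_id)

    if current is None:
        # The resource is unknown, whether this is "created" or an early
        # "updated": create it from the snapshot the event carries.
        store[resource_id] = {"version": event["version"], "data": event["data"]}
        return "created"

    if event["version"] <= current["version"]:
        # Stale or duplicate event: newer state is already stored.
        return "discarded"

    store[resource_id] = {"version": event["version"], "data": event["data"]}
    return "updated"
```

If events carry only deltas rather than full snapshots, defensive creation is not possible and waiting for the missing event (or fetching the resource from the API) is the safer choice.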
The Common Thread
What CSV imports, API syncing, and webhook handling share is this: they all involve data you don't fully control arriving in formats you didn't entirely specify. The instinct is to write validation that rejects anything unexpected. The better instinct is to write systems that degrade gracefully — that surface problems clearly, recover when they can, and tell you when they can't.
Integration work is where products meet reality. It's worth doing carefully.