
Five System-Level Decisions I'd Make the Same Way Again

Mukesh Chandra · 4 min read

Over time, I've learned that good system design isn't about getting everything right up front. It's about making a small number of decisions that continue to make sense as systems, teams, and expectations grow.

These aren't low-level design choices about patterns or topologies. They're the kinds of system-level decisions that quietly determine whether architecture remains workable as organisations and teams evolve.

Here are five such decisions I'd make again, not because they were perfect, but because they reduced ambiguity and held up under real pressure.


1. Keeping the API Gateway Thin

In many systems, the API gateway starts out with a clear purpose: routing requests, handling authentication, and enforcing basic policies.

As traffic grows, it slowly becomes something else.

Validation logic creeps in. Business rules follow. Special cases get added "just for now."

Each change feels reasonable on its own. Over time, the gateway turns into a central brain that everyone depends on and no one feels comfortable touching.

The problem isn't technical. It's structural.

When too much logic lives in one place, every change becomes risky, coordination-heavy, and slow. Debugging also gets harder, because failures no longer belong clearly to any one service.

Keeping the gateway thin doesn't mean avoiding control. It means keeping responsibility close to ownership. Business logic stays with the services that own it. Failures become easier to reason about. Teams regain the ability to move independently.
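
To make "thin" concrete, here is a minimal sketch in Go using only the standard library. The upstream service names and addresses are invented for illustration; the point is what the gateway does and, more importantly, what it doesn't:

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// requireAuth is the only policy the gateway enforces: a request without
// credentials is rejected. Deciding what the credentials actually permit
// is left to the service that owns the resource.
func requireAuth(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") == "" {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	// Hypothetical upstream addresses; parse errors ignored for brevity.
	orders, _ := url.Parse("http://orders.internal:8080")
	users, _ := url.Parse("http://users.internal:8080")

	// The gateway knows where traffic goes, not what it means.
	mux := http.NewServeMux()
	mux.Handle("/orders/", httputil.NewSingleHostReverseProxy(orders))
	mux.Handle("/users/", httputil.NewSingleHostReverseProxy(users))

	// Route and authenticate. No validation, no business rules,
	// no special cases "just for now".
	log.Fatal(http.ListenAndServe(":8000", requireAuth(mux)))
}
```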

Thin gateways don't look impressive in diagrams, but they tend to age far better than "smart" ones.


2. Assigning Explicit Data Ownership Early

Data ownership often starts as an assumption.

Multiple services read the same data. Some write to it "occasionally." Everyone agrees to be careful.

At small scale, this works. At larger scale, it creates hesitation.

When something breaks, no one is quite sure who should act. When a schema needs to change, everyone worries about who might be affected. Changes slow down, not because the system is fragile, but because confidence is.

Over time, that hesitation becomes more damaging than any single technical flaw.

Making data ownership explicit early feels restrictive. It forces hard conversations. It limits flexibility.

But it also creates clarity.

When one service owns a piece of data, decisions become simpler. Accountability is obvious. Teams can change what they own without fear. Incidents resolve faster because responsibility isn't distributed across guesswork.
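
As an illustration, imagine a hypothetical inventory service that owns stock data outright. Consumers depend on a narrow contract rather than on its tables, so the owning team can reshape its schema without a cross-team negotiation. A sketch in Go, with all names invented:

```go
package inventory

import (
	"context"
	"errors"
)

// Stock is the only view of inventory data other services ever see.
// The tables behind it belong to the inventory service alone.
type Stock struct {
	SKU      string
	Quantity int
}

// ErrUnknownSKU is part of the contract, so callers handle failure
// explicitly instead of guessing at another team's database state.
var ErrUnknownSKU = errors.New("inventory: unknown SKU")

// Client is what consumers depend on. The owning team can change
// anything behind it, as long as this contract keeps holding.
type Client interface {
	GetStock(ctx context.Context, sku string) (Stock, error)
	Reserve(ctx context.Context, sku string, qty int) error
}
```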

This decision trades short-term convenience for long-term confidence, and that's a trade I'd make again.


3. Choosing Fewer, Boring Building Blocks

As systems evolve, there's constant pressure to adopt new tools, frameworks, and patterns. Each promises better performance, scalability, or developer experience.

The hidden cost isn't in adoption; it's in operation.

Every new building block adds:

  • learning overhead
  • new failure modes
  • operational complexity
  • harder on-call scenarios

Choosing fewer, well-understood tools simplifies more than just the codebase. It simplifies conversations, hiring, incident response, and long-term maintenance.

"Boring" doesn't mean outdated. It means predictable.

Predictable systems fail in ways you've already seen. They're easier to debug at inconvenient times and easier for new engineers to understand. Over years, that reliability compounds quietly.

This decision optimises for longevity rather than novelty.


4. Designing for Reversibility Where Possible

Not all system-level decisions carry the same weight.

Some are easy to change later. Others lock you in for years.

The mistake I've seen repeatedly is treating all decisions with equal seriousness: overthinking small choices while rushing irreversible ones.

A simple question helps:

How painful is this to undo if we're wrong?

Reversible decisions can be made quickly. Irreversible ones deserve patience and evidence.

This approach prevents premature complexity without slowing progress. It keeps options open longer and avoids committing to assumptions before the system earns them.
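
One practical way to keep a decision reversible is to hide it behind a narrow seam. The sketch below, in Go with invented names, starts with a deliberately simple in-memory blob store; if that choice turns out to be wrong, swapping the backend is a local change rather than a migration:

```go
package storage

import (
	"context"
	"errors"
)

// BlobStore is the seam that keeps the storage decision reversible.
// Callers depend on this interface, never on a concrete backend.
type BlobStore interface {
	Put(ctx context.Context, key string, data []byte) error
	Get(ctx context.Context, key string) ([]byte, error)
}

// memStore is the deliberately boring first implementation. If it stops
// being enough, an S3- or database-backed version replaces it without
// touching any caller.
type memStore struct {
	data map[string][]byte
}

func NewMemStore() BlobStore {
	return &memStore{data: make(map[string][]byte)}
}

func (m *memStore) Put(_ context.Context, key string, data []byte) error {
	m.data[key] = data
	return nil
}

func (m *memStore) Get(_ context.Context, key string) ([]byte, error) {
	b, ok := m.data[key]
	if !ok {
		return nil, errors.New("storage: key not found")
	}
	return b, nil
}
```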

It's not indecision; it's disciplined timing.


5. Prioritising Observability Before Optimisation

When systems feel slow or unstable, the instinct is to optimise immediately: cache more, scale infrastructure, tune performance.

Without visibility, this is mostly guesswork.

Observability changes the nature of decision-making. Logs, metrics, and traces make behaviour visible. They show where time is spent, which dependencies fail, and how the system behaves under real load.
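
Even something as small as a hand-rolled timing middleware moves the conversation from opinion to evidence. A minimal sketch in Go; a real system would ship these numbers to a metrics backend rather than a log, and the /slow handler here is just a stand-in for real work:

```go
package main

import (
	"log"
	"net/http"
	"time"
)

// withTiming makes request latency visible before anyone argues about
// what to cache or scale. One log line per request is enough to show
// where time is actually going.
func withTiming(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		log.Printf("method=%s path=%s duration=%s",
			r.Method, r.URL.Path, time.Since(start))
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(100 * time.Millisecond) // stand-in for real work
		w.Write([]byte("done"))
	})
	log.Fatal(http.ListenAndServe(":8000", withTiming(mux)))
}
```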

Once teams can see what's happening, optimisations become smaller, safer, and more effective. Conversations shift from opinion to evidence. Trust increases, not just in the system, but between people.

Optimisation without observability increases risk. Optimisation with observability reduces it.


Closing Thought

None of these decisions are particularly clever.

What they share is a bias toward clarity, ownership, and choices that reduce uncertainty rather than hide it.

Systems will grow and change whether we plan for it or not. The goal isn't to eliminate complexity; it's to make sure complexity stays understandable as it accumulates.

These are the kinds of decisions that help systems and teams age well.
