How I read a system
I have five days to map a production system and find what matters. The order I look in is not random. It is sorted by blast radius: what fails worst, examined first, while I still have the most context.
- 01
What is reachable from the outside
The first question is not how the code is written. It is what the internet can touch. Open security groups, public buckets, an admin panel on a guessable path, a database listening on a public address. The blast radius here is the whole system, so it goes first.
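A minimal sketch of that first pass, assuming the firewall or security-group rules have already been exported into a flat list. The rule shape, the group names, and the `EXPECTED_PUBLIC` set are all hypothetical; a real export from any provider will look different.

```python
import ipaddress

# Hypothetical rule shape; a real export from a cloud provider will differ.
RULES = [
    {"group": "web", "port": 443, "cidr": "0.0.0.0/0"},
    {"group": "db", "port": 5432, "cidr": "0.0.0.0/0"},
    {"group": "db", "port": 5432, "cidr": "10.0.0.0/8"},
    {"group": "admin", "port": 22, "cidr": "::/0"},
]

# Ports assumed to be intentionally public; adjust per system.
EXPECTED_PUBLIC = {80, 443}


def world_open(rules, expected_public=EXPECTED_PUBLIC):
    """Return (group, port) pairs open to every address on an unexpected port."""
    findings = []
    for rule in rules:
        network = ipaddress.ip_network(rule["cidr"])
        # A prefix length of 0 means the rule matches the whole address space.
        if network.prefixlen == 0 and rule["port"] not in expected_public:
            findings.append((rule["group"], rule["port"]))
    return findings


print(world_open(RULES))
```

With the sample data this flags the database port and the SSH port. The point is the question being asked, not the tooling: anything open to the whole address space that is not on the short expected list gets looked at by hand.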
- 02
Who can do what
Identity grows by accident. A key minted for one script three years ago still has full access. A shared credential sits in a committed .env file. Roles were widened once for a deploy and never narrowed. I map who and what can act, because a quiet over-permission is a breach waiting for a reason.
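One way to sketch that mapping, assuming an inventory of credentials with their granted actions and last-use dates already exists. The record shape, the names, and the 90-day staleness threshold are illustrative assumptions, not a standard.

```python
from datetime import date, timedelta

# Hypothetical inventory; names, actions, and dates are made up for illustration.
CREDENTIALS = [
    {"name": "ci-deploy", "actions": ["deploy:push"], "last_used": date(2024, 5, 30)},
    {"name": "old-script", "actions": ["*"], "last_used": date(2021, 6, 1)},
    {"name": "reporting", "actions": ["db:read"], "last_used": date(2023, 1, 10)},
]


def review(credentials, today, stale_after_days=90):
    """Map credential name to the reasons it deserves a closer look."""
    cutoff = today - timedelta(days=stale_after_days)
    findings = {}
    for cred in credentials:
        reasons = []
        if "*" in cred["actions"]:
            reasons.append("wildcard access")
        if cred["last_used"] < cutoff:
            reasons.append("stale")
        if reasons:
            findings[cred["name"]] = reasons
    return findings


print(review(CREDENTIALS, today=date(2024, 6, 1)))
```

Here the old script's key is flagged twice, for wildcard access and for being unused, which is the combination worth narrowing or revoking first.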
- 03
Whether the data survives
Everyone has backups. Almost nobody has a restore they have actually run. A backup you have never restored is a hypothesis, not a safety net. I check the restore path before I trust anything else about the data layer.
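A toy version of a restore drill, using in-memory SQLite as a stand-in for the real database. The `orders` table and the row-count comparison are assumptions for illustration; a real drill would restore from the actual backup files into a scratch environment and compare more than counts.

```python
import sqlite3


def table_counts(conn):
    """Return {table name: row count} for every table in the database."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


# Stand-in for the production database (hypothetical schema).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
source.executemany("INSERT INTO orders (total) VALUES (?)",
                   [(9.5,), (20.0,), (3.25,)])
source.commit()

# Stand-in for restoring a backup into a scratch database.
restored = sqlite3.connect(":memory:")
source.backup(restored)

# The drill passes only if the restored copy matches the source.
restore_ok = table_counts(source) == table_counts(restored)
print(restore_ok, table_counts(restored))
```

Matching row counts is the weakest useful check. It still separates a restore that has been run from one that has only been assumed.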
- 04
How code reaches production
The deploy pipeline is where reliability is decided. Can you roll back in one step, or is rollback a manual scramble? Is there a path that skips review? Does the same artifact that was tested get shipped, or does it get rebuilt on the way out? The path tells me how the team behaves under pressure.
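The artifact question can be checked mechanically by comparing digests. A minimal sketch, assuming the tested and shipped artifacts are both available as bytes; the sample payloads are placeholders.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 hex digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def same_artifact(tested: bytes, shipped: bytes) -> bool:
    """True only if the shipped artifact is byte-identical to the tested one."""
    return digest(tested) == digest(shipped)


# Placeholder payloads standing in for real build outputs.
tested = b"app build 1.4.2"
rebuilt = b"app build 1.4.2 (rebuilt)"

print(same_artifact(tested, tested))   # True
print(same_artifact(tested, rebuilt))  # False
```

If the pipeline rebuilds on the way out, the digests will usually differ, and whatever was tested is no longer what is running.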
- 05
What is not tested
Last, the code itself, read through one lens: what here works only because nothing has stressed it yet. Untested code that happens to work is a demo that hasn't failed yet. I am not counting coverage. I am finding the places where the system has never been asked a hard question.
The output of five days is not a list of bugs. It is a map of where the system fails first, and what it would cost to be wrong about that.