Do you understand your Code?

A race condition can hit you anytime. You think it won't because your system isn't that big or doesn't process a lot of requests. But it will happen, and when it does, you will feel a bit silly about not handling it.

Race conditions don't happen during local testing. They don't happen when you manually click buttons. They happen on a Friday evening when two requests arrive within the same millisecond. Suddenly you have duplicate invoices, duplicate payments, or duplicate reviews. It makes life difficult, because now you have a production system in an inconsistent state.

When you build systems that create records, make sure only one record gets created. The best way to achieve that is usually a unique constraint on your db table. If your business rule says only one record should ever exist, let the database enforce it. A unique constraint is often simpler and more reliable than hoping your application code wins every race.
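Here is a minimal sketch of that idea, using Python's built-in sqlite3 module so it runs anywhere. The `invoices` table, the `order_id` column and the `create_invoice` helper are made up for illustration; your schema and database will differ.

```python
import sqlite3


def create_invoice(conn, order_id, amount_cents):
    """Insert an invoice. Returns True if created, False if one already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO invoices (order_id, amount_cents) VALUES (?, ?)",
                (order_id, amount_cents),
            )
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected a second invoice for this order.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices ("
    " id INTEGER PRIMARY KEY,"
    " order_id TEXT NOT NULL UNIQUE,"  # the business rule lives here
    " amount_cents INTEGER NOT NULL)"
)

first = create_invoice(conn, "order-42", 1999)
second = create_invoice(conn, "order-42", 1999)  # same order again
invoice_count = conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
```

The application code never asks "does this invoice exist?" before inserting. It just tries the insert and treats the constraint violation as the answer, so there is no gap between the check and the write for another request to slip into.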

Every feature has a few things that should never be violated. A user should only be charged once. An order should only be shipped once. A username should be unique. Figure out those invariants first, then let your database and your code enforce them. Once you know what must always be true, designing the rest of the system becomes much easier.

Of course, not every problem can be solved with a constraint. Sometimes you need multiple operations to behave as one, and that's where concurrency control comes into the picture.

You can lock the rows you are updating, but at some point you will run into deadlocks, typically when two transactions take the same locks in a different order. So what to do? Well, there are things like optimistic locking and pessimistic locking. We give common sense solutions fancy names, but the ideas are simple. Sometimes you lock rows before modifying them (pessimistic locking). Sometimes you assume conflicts are rare and detect them only when saving (optimistic locking). Neither is magic; they solve different problems. You have to decide based on the business rule.
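Optimistic locking is commonly done with a version column: you only write if the row still has the version you read. The sketch below assumes a made-up `items` table and an `update_stock` helper, and uses sqlite3 only so it is runnable. Pessimistic locking is not shown here because SQLite has no row-level locks; in databases like PostgreSQL or MySQL it is usually done with SELECT ... FOR UPDATE.

```python
import sqlite3


def update_stock(conn, item_id, new_stock, expected_version):
    """Write only if the row still has the version we read. True on success."""
    with conn:
        cur = conn.execute(
            "UPDATE items SET stock = ?, version = version + 1"
            " WHERE id = ? AND version = ?",
            (new_stock, item_id, expected_version),
        )
    # Zero rows updated means someone else changed the row after we read it.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    " id INTEGER PRIMARY KEY,"
    " stock INTEGER NOT NULL,"
    " version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO items (id, stock, version) VALUES (1, 10, 0)")
conn.commit()

# Two writers read the same snapshot of the row...
stock, version = conn.execute(
    "SELECT stock, version FROM items WHERE id = 1"
).fetchone()

# ...and both try to save. Only the first one wins.
writer_a_ok = update_stock(conn, 1, stock - 1, version)
writer_b_ok = update_stock(conn, 1, stock - 1, version)  # stale version

final_stock, final_version = conn.execute(
    "SELECT stock, version FROM items WHERE id = 1"
).fetchone()
```

The losing writer has to re-read the row and decide what to do: retry, merge, or tell the user. That decision is the business rule part.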

Even if you've handled concurrent writes correctly, your system still has another enemy: partial failures.

Retrying. You can always retry, right? Retries are so fun, you don't have to worry about anything. Wrong! Retries can introduce more ridiculous conditions than you ever imagined could happen. So what do you do? Do you build a self-healing system? Imagine your image gets published successfully, but the third-party API times out before confirming it. Your retry logic kicks in, and congratulations, you've just published it twice. Now you have to deal with a confused client.
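One common way to make retries safe is an idempotency key: generate one key per logical operation and send the same key on every retry, so the receiving side can recognise a repeat. The sketch below simulates the scenario above. `FakePublisher` is a hypothetical stand-in for a third-party API, and it assumes that API honours idempotency keys, which not every API does.

```python
import uuid


class FakePublisher:
    """Hypothetical third-party API that deduplicates by idempotency key."""

    def __init__(self):
        self.published = {}  # idempotency key -> image id
        self.calls = 0

    def publish(self, image_id, idempotency_key):
        self.calls += 1
        # First call with a key stores the image; repeats change nothing.
        self.published.setdefault(idempotency_key, image_id)
        if self.calls == 1:
            # The work is done, but the confirmation never reaches the caller.
            raise TimeoutError("no confirmation received")
        return "published"


def publish_with_retry(publisher, image_id, attempts=3):
    # One key per logical operation, reused on every retry.
    key = str(uuid.uuid4())
    for _ in range(attempts):
        try:
            return publisher.publish(image_id, key)
        except TimeoutError:
            continue  # safe to retry: the key makes the call idempotent
    raise RuntimeError("gave up after retries")


publisher = FakePublisher()
result = publish_with_retry(publisher, "image-7")
```

The API was called twice, but the image exists once. If the third party does not support idempotency keys, you need another way to find out what happened before retrying, such as querying the API for the current state.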

The frustrating part about these failures is that they often happen when you're asleep. By the time you wake up, the only evidence left is whatever your system decided to record.

Always log things. People underestimate logging. Obviously you shouldn't over-log, but log things. Your code might handle errors gracefully, but you need to know what went wrong and why, so that when you wake up the next day you can patch things up. Every important decision your code makes should leave behind a breadcrumb. Logs tell you why something failed. Monitoring tells you that it failed in the first place. Both are equally important. So add some monitoring too. You should know what is happening in the system: a bird's-eye view.
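A breadcrumb can be as small as one log line per decision, with enough context (which order, which reason) to reconstruct what happened. This is a rough sketch with Python's standard logging module. The `charge` function and the `payments` logger name are invented for the example, and the in-memory stream is only there so the output can be inspected; in a real service you would log to stderr or a file.

```python
import io
import logging

# Capture logs in memory for this demo; a real service would not do this.
log_stream = io.StringIO()
handler = logging.StreamHandler(log_stream)
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
)

logger = logging.getLogger("payments")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False


def charge(order_id, already_charged):
    """Hypothetical charge step that records every decision it makes."""
    if order_id in already_charged:
        logger.warning(
            "charge skipped order_id=%s reason=already_charged", order_id
        )
        return False
    already_charged.add(order_id)
    logger.info("charge created order_id=%s", order_id)
    return True


charged = set()  # illustration only; a real system would rely on the database
charge("order-42", charged)
charge("order-42", charged)
log_lines = log_stream.getvalue().splitlines()
```

Both the happy path and the skipped path leave a line behind, so the next morning you can tell a duplicate request from a charge that never happened.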

Now that AI writes a lot of code in the blink of an eye, it becomes more difficult to understand that code. You need to build testing strategies that your AI follows every time it builds a feature. For that you have to understand the feature perfectly and think about what edge cases might hit your system. AI can do that, but it never has the full context, and sometimes it just chooses to ignore the context. So better to be the context yourself. Drive the AI.

Previously, in the caveman times when humans wrote code themselves, they knew what they were writing and they understood it. When an error appeared, they knew why it appeared and what might be broken. In the times of AI writing code, the human might not know why the error appeared. The good thing is that AI is good at pattern matching, and nine times out of ten it will tell you where the error is and what to do to mitigate it. But to fix the root cause you will have to think harder. A quick patch is easy, but there is always a reason why the error occurred in the first place. For a long-term fix, you have to patch that too.

Code understandability is important. You have to understand the code, and if you do, you become the foremost authority on it. You know how things work, and when something breaks you know what the reason might be. You just know.

But now that AI writes the bulk of the code, one cannot be expected to keep everything in mind. You have to make sure the code handles edge cases gracefully, otherwise you will find yourself in a dark hole of unknowns, and when your lead asks you why this happened you will never have an answer. You have to change your testing approach. Add checks for edge cases: send the same request twice at the same time and see how the system behaves. If it doesn't behave the way you expected, fix it. What if the application loses its db connection midway? What happens then? Are your writes idempotent, so that running them again is safe? You have to have AI write these tests, and you have to review them too. AI can be a bit cunning and skip the hard things at times.

The best test of your code is in production, where real things happen. If your code can stand tall on prod, then you have won. If writing code has become cheaper, then verifying that code has become much more important.

Code understandability builds over time. If you wrote the code with your own hands, you remember it, but if it is a repo that you inherited, it takes time to understand things. With AI you can ask questions and all that, but true understanding comes from debugging, tracing and going through the code.

One lesson I've learned from all of this is that complexity compounds.

With AI writing a bunch of code, you need to make sure things are simple. Minimise the number of parts the code has. If there are more cogs in the machine than there should be, remove some and make the machine simple. Simple is better. Simple is smooth. Simplicity will never betray you or keep you awake at night. Do not overcomplicate solutions to get a promotion. Do not write cunning code. Be simple. Simple code is easy to debug and maintain.

AI has made software engineering faster, but it hasn't made distributed systems any less complicated. Networks still fail. Databases still race. Users still click twice. Production still finds the assumptions you forgot to test. The engineers who thrive won't be the ones who generate the most code. They'll be the ones who understand the systems they build. AI can write code, but it can't own production. You do.

Do you understand your Code?
