Programming as Theory Building

I recently came across a podcast discussing Peter Naur’s 1985 paper “Programming as Theory Building”. Yes, that Naur: the ‘N’ in BNF (Backus-Naur Form). I found that most of its contents are still relevant today.

Summary

Here’s a summary of its contents:

The paper discusses the metaphysics of programming: what it is, what it isn’t, the various states a computer program goes through, and the various agents (programmers, testers, modifiers) acting on it.

It primarily states that the task of programming is not about “text manipulation”, or even about getting the program to behave as it should. The task of programming is about building a theory of the problem the program is trying to solve.

Naur borrows the notion of ‘theory’ from the philosopher Gilbert Ryle. Quoting from the paper:

Very briefly, a person who has or possesses a theory knows how to do certain things and in addition can support the actual doing with explanations, justifications, and answers to queries, about the activity of concern. So having a ‘theory’ is more than having knowledge of facts.

Over a weekend, I may understand how CRDTs work and code a sample implementation. But would my level of comfort with the topic be worse than, equal to, or better than that of someone who has spent 2 days working on CRDTs at Figma? Mine would be worse. Maybe this is a bad example; let’s move on to the next one.

I know how to ride a bike; I have a theory of it. I can produce a text that explains how to do it, but that text alone may not be sufficient to ‘enable’ someone else to ride a bike.

Naur argues that a programmer who has the theory of a program must be able to do the following:

  1. The programmer having the theory of the program can explain how the solution relates to the affairs of the world that it helps to handle.
  2. The programmer having the theory of the program can explain why each part of the program is what it is, and also importantly why the program rejects doing something in a particular way.
  3. The programmer having the theory of the program is able to respond constructively to any demand for a modification of the program so as to support the affairs of the world in a new manner.

In the theory-building view of programming, if a team of three people builds a piece of software, its theory lives with those three people. Critical software needs people who hold its theory, so that the theory can be modified along with the program. If the three people are unavailable, the only alternative is to have new people rebuild the theory through:

  1. rewriting the software
  2. pair-programming sessions with original author(s) / similar interactions over a long period of time.

Implications in a workplace

  1. It explains why veterans of a codebase fix bugs very fast while newcomers take longer, even when the newcomers are experienced programmers.
  2. It shows how important ‘Onboarding’ really is. I’ve personally worked in teams where people set up 5-10 calls to onboard a new member onto different parts of the codebase, and I’ve seen teams where this opportunity isn’t given and the new member is dropped into deep water after a superficial onboarding session. The difference is stark when you compare the ‘vision’ of the PRs that come from the ‘less’ onboarded people, even when they are 2x smarter than the existing team members. The “theory” of a program is important to transfer.
  3. It also emphasizes the need to share your work with your teammates through MR review requests, water-cooler conversations, review meetings, etc., in order to transfer the theory that you’ve modified or created.
  4. “Programming is NOT code or documentation” isn’t an excuse to stop writing meaningful comments, docs, readable tests, etc. All of these tools will only help you preserve the theory of a program better.
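
As a rough illustration of point 4, here is a minimal sketch in Python of a comment and a test that try to carry some theory along with the code. Everything in it is hypothetical and made up for this post: the function `retry_delays`, its parameters, the payment API, and the incident mentioned in the docstring. The point is only that the docstring records why the code is the way it is and which alternative was rejected (Naur’s second criterion), while the test states the intended behaviour in words a newcomer can read.

```python
def retry_delays(attempts: int, base_seconds: float = 0.5,
                 cap_seconds: float = 8.0) -> list[float]:
    """Return the wait (in seconds) before each retry of a failed request.

    Why it is this way (hypothetical rationale, invented for illustration):
    the imagined payment API we call recovers slowly after an outage, so
    each retry waits twice as long as the previous one, up to a cap.

    Rejected alternative (also hypothetical): a fixed 1-second delay.
    In this made-up history it made every client retry in lockstep and
    kept the recovering service overloaded.
    """
    # Double the delay on every attempt, but never exceed the cap.
    return [min(cap_seconds, base_seconds * 2 ** i) for i in range(attempts)]


def test_delays_double_until_they_hit_the_cap():
    # The test name and assertions restate the intent, not just the numbers.
    assert retry_delays(5) == [0.5, 1.0, 2.0, 4.0, 8.0]
    assert retry_delays(6)[-1] == 8.0  # the sixth delay is capped, not 16.0


test_delays_double_until_they_hit_the_cap()
```

None of this replaces talking to the people who hold the theory, but it gives the next reader a starting point for rebuilding it.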

Vibe coding and its implications

On the internet, we’re seeing a lot of examples of people using vibe coding to get to a prototype very quickly.

Even I am using Claude to drive some frontend/app development work for a side project.

Once I understood the theory building view of programming, it made sense why fixing a very simple piece of code there takes me longer than usual when the base work has been built by someone else (Claude in this case).

I don’t think we’re at a point where organizations would start writing their internal tools or business-critical software with AI. But let me protest against it anyway.

We’re accepting that it’s essential (even economically) to have the theory of the majority of the software that we build and maintain. If AI starts writing half of that, we should start factoring in the extra time it takes to build a theory of the program every time we make a non-AI-assisted change to that codebase.

At some point in the future, an LLM may be self-sufficient enough to always respond correctly to program modification requests. At that point, we might be okay with it holding the theory of a piece of software and begin to trust its work.

Until then, every software organization needs to find its own balance regarding which parts of its software LLMs should touch autonomously.

Does the theory-building view give us any action points?

From my understanding, a theory-building view of software development answers the following questions very clearly.

  1. When to rewrite a software project?

    A: When its current maintainers fail to intelligibly answer questions about changes to its theory, and the people who wrote the project don’t have the time to pass the theory on.

  2. Are there long-term downsides to using autonomous AI or contractors for short-term cost savings?

    A: Yes. We lose the theory of the final software, and if the theory needs continuous modification, that loss will cost more in the long run than the savings we made.

  3. Should developer productivity be measured in SLOCs produced or number of PRs?

    A: No. The theory-building view suggests that a developer becomes more productive the longer they work in a single codebase. To measure productivity, we would instead have to find ways to track how much theory a developer has built about the projects they work on.

Many other inferences can be made from this framework and applied to our practices. But the answers aren’t 0s and 1s; there are things we need to figure out for ourselves, such as:

  1. How to communicate and preserve the theory of a piece of software?
  2. How to build (the theory of) software that we don’t actively work on?
  3. Which of our codebase(s) can be handed to a third party for maintenance?
  4. What are some metrics that track the quality of theory in a programmer’s mind?

Even as it gets considerably cheaper to delegate software development tasks to an external entity, it might still be better, and cheaper, to keep some parts of a codebase in the hands of people who stay in close contact.