Pull Request

Published in Process on 2022-07-27

The Setup

There is a couple of reasons why I wanted to talk about this: One is my current project; the other one is because some tweets and a Linkedin post that I have seen recently from Dragan Stepanović talking about Pull Requests (PRs).

Some History

I think I am slightly on shaky ground here because I have really bad memory, so happy to be corrected. But when the Linux Kernel moved away from BitKeeper and Linus Torvalds created Git, it was created to support the way that development is done for the Linux Kernel. To submit code to be added to the Linux Kernel it goes through the mailing list.

That is why git-patch and git-send-email exist.

That flow is asynchronous in nature, and relays on having some on people that you don’t know or necessarily truss providing you code. Drive-by coding, as someone put it on twitter (I don’t have a link to that tweet)

When Github was created, they added the idea of Pull Requests quite quickly (again, I can’t find at the moment the original announcement, but I know it was in 2008, same year it was released).

While for the Linux Kernel the email flow is what they do and how they like to operate, for most other people doing any kind of OSS, is a bit too much involvement. Pull Requests were a different way to solve the issue of asynchronicity and low trust

The previous iteration

Before Pull Requests became a thing, companies would have Code Reviews, which will be someone (usually more senior) sitting with the writer of the code, to review what it was done before it was merged/deployed. Maybe some people did it already asynchronously, but I’ve seen it always done synchronously, that’s it, both people sitting together on the same machine going through the code.

Code Reviews are about solving the issue of low trust synchronously.

But that is difficult, means someone has to stop what they are doing to achieve the code review, or the person that wrote the code sit on their thumbs until someone could go to do the Code Review with them.

Because that was pain, or more professionally, a bottleneck, people decided to do something, and most companies adopted the …

Inefficient Approach

Everyone wanted their time, and not be interrupted while they were working on their code. Pull Request removes that inconvenience from them. Though is still left the writer of the code in a situation in which they had to wait for the code to be reviewed.

Code Reviews done synchronously and the asynchronous flow of Pull Request are inefficient forms of increasing the trust of code. More so the later.

The main issue is that, once a person that hasn’t been involved with the creation of the code comes into play, either they do a poor job of doing the review or start asking questions about why decisions were taken. They have to leave their current space and grasp the context of the code being reviewed.

If we try to keep the code small, that means multiple interruptions to the reviewer interrupting their work constantly, and lack of ability to handle bigger changes. Or we can try to reduce the number of interruptions, but then the code that need to be reviewed becomes bigger and the process of reviewing takes longer.

Let’s imagine that all the code is fine and done correctly. The reviewer had to

- stop working on what they were doing, 
- understand the context of the changes
- talk with the reviewee about why decisions were taken,
- and then give the ok.

And now let’s imagine that the code is wrong from a design level perspective. The reviewer had to

- stop working on what they were doing, 
- understand the context of the changes
- talk with the reviewee about why decisions were taken
- indicate what needs to be changed.
- and then, after the reviewee do the changes (itself a time sink), do the whole process again.

The feedback loop is slow. What I have seen in multiple companies is that the PR process tend to extend over a day due to the back and forth. Rework being needed is not unusual.

The direct consequence of this is what someone put on a twitter:

10 lines of code, 10 suggestions to improve it.
500 lines of code, LGTM

(The earliest version of this that I can find in twitter is this one by @red_nax_elA)

Increasing Trust

Code reviews and Pull Requests are there to increase the trust we have in the system. But both of them are inefficient (PRs are even more, as they are less likely to achieve their objective).

But we have other tools, more efficient, more effective, to increase trust on the system.

I have talked a bit on the past about tests. Test help in increasing the trust on the system. But test only help with the behaviour of the system and not with some of the non-functional characteristics of your code.

Which is where pair programming and ensemble programming come into place.

With those two approaches, nobody stops working on their other task, because there is no other task, is the same. Understanding is developed as the code is being written; alternative approaches/structures are researched and discussed before you have gone too far into a specific route; they teach each other about usage within the code, functions/methods/objects already available, or even tricks and tips of the language. The additional pair (or pairs) of eye is there from the beginning. More importantly, you accelerate the feedback loop over the PR version (a quicker feedback loop is while agile came into place, why we have CI and CD, and feature flags, …).

Now, I understand the reticence of people to do pair/ensemble programming. I know a good chunk (including myself) that started developing for the sheer joy of it, the ability to get into the flow and forget about the world around you. Hobby or OSS coding should scratch that itch. When you are working within a company, you need to change your ways to develop in the most effective manner. Which is not easy. It takes time to develop the skills and the stamina to do it.

The Conclusion

I firmly believe that outside of OSS, Pull Requests should not be used. A company has better ways to increase trust with synchronicity.

For a company, pair and ensemble programming are better options.

Now with remote work so prevalent, maybe your colleagues are located in completely different time zones. Personally, I have found that far less efficient than when people work within similar timezone (everyone within a 2 hour difference is where I put the upper limit of variance).

Finally I have seen that Dragan has a few more posts around PRs vs Pair/Ensemble than the one I saw, so I will be reading them.