My views about GitHub Copilot
The setup
Recently GitHub announced Copilot, a code assistant powered by OpenAI Codex, a descendant of GPT-3. There has been a lot of discussion about it. A lot.
The idea
The idea, in principle, is quite cool: you write as little code as possible yourself and leave the rest to the assistant. I am fine with that. I don’t want to do the boring parts. I want to spend my time solving the important issues.
At the moment Copilot is not perfect (for example, check this), but I have no doubt it will improve.
But I want you to read this thread, from a person who has worked in the same space before and whom I respect as a dev.
And this is probably what gets me: Copilot is a tool to automate the creation of code, when what we want is to reduce the amount of code we have to create. There will always be a need for a set of people who know how to write low-level code. But for most of the work that I have seen, tooling and languages at a higher level of abstraction are better. That is why I don’t want to write for loops anymore when a nice map or reduce would do the work. Use (or create) languages that allow you to do more with less and that help you avoid mistakes: the less code you have, the lower the chances of a mistake.
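To make the loop versus map/reduce point concrete, here is a minimal sketch in Python. The language, the function names and the sample data are assumptions chosen for illustration. Both functions compute the sum of the squares of a list of numbers.

```python
from functools import reduce


def sum_of_squares_loop(numbers):
    """Explicit loop: a mutable accumulator updated by hand."""
    total = 0
    for n in numbers:
        total += n * n
    return total


def sum_of_squares_functional(numbers):
    """Same computation with map and reduce: no mutable state to manage."""
    squares = map(lambda n: n * n, numbers)
    return reduce(lambda acc, sq: acc + sq, squares, 0)


# Hypothetical sample data, only to show that both versions agree.
sample = [1, 2, 3, 4]
assert sum_of_squares_loop(sample) == sum_of_squares_functional(sample) == 30
```

The second version has no accumulator to initialise wrongly and no loop body to get out of step, which is the kind of mistake a higher level of abstraction removes.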
IANAL
There are a few things being discussed around where the generated code comes from. For example this tweet or this other one, and in the opposite direction this post. Is it legal? Do I need to worry about it? This is complicated. I am not a lawyer, and therefore I don’t have the answer here. Which is an issue at the moment.
But this, this is more complicated. Completely ignoring the licenses and using all the public code is a bold move. I have no doubt that they have consulted with their lawyers and either they have decided they can win in a court of law, or they can just outspend whoever comes after them.
A repository being public doesn’t mean that its license is permissive in any way. Something being public doesn’t mean that you can use it. I have no doubt that if it were Microsoft’s code being harvested by another company, they would pursue legal action. So why are they doing that? It doesn’t seem right to me.
My understanding is that they will charge for Copilot, which is not open source. So they are going to earn an economic benefit from “free” code. Furthermore, the advanced telemetry terms seem to allow them to use whatever code you write while using the plugin (“When you edit files with the GitHub Copilot plugin enabled, file content snippets and suggestion results will be shared with GitHub and OpenAI and used for diagnostic purposes and to improve suggestions”).
My views
For a business: I would check with a lawyer before allowing anyone to use Copilot. Maybe Microsoft would be protected, but if a big company goes after you … how much can you spend defending yourself in court? Furthermore, if you want to keep your code private because it contains a trade secret (an algorithm or whatever else), then given the advanced telemetry terms it seems to me that you don’t want to use Copilot at all.
For any kind of open source: at the moment they are only scraping public GitHub repos, but I expect they will end up ingesting any public repo anywhere, and maybe even change the license agreement on GitHub itself to allow them to scrape private repositories. So I do expect a GPL4, Apache 3 or some new license to come along that prohibits this kind of code laundering. Private repos are the other option, but … that kind of breaks the whole open source ethos. I think those are the two defenses available if you want to stop what they are doing.