One of my Twitter mutuals recently shared the following tweet with me regarding AI:
"The next battleground of companies making money from your data is companies like StackOverflow and Reddit repackaging people's answers as training data for AI." — Dare Obasanjo (@Carnage4Life), April 19, 2023
"Funny to think that you'll one day be replaced by an AI trained on content you created for free." https://t.co/W1DGN0ubH6
I found Dare Obasanjo’s commentary especially interesting because my connection to Stack Overflow runs a bit deeper than it might for some developers. As I mentioned in a much older post, I was a beta tester for the original stackoverflow.com. Every beta tester contributed some of the original questions still on the site today. While the careers site Stack Overflow went on to create was sunsetted as a feature last year, it helped me find a role in healthcare IT where I spent a few years of my career before returning to the management ranks. Why is this relevant to AI? Because the purpose of Stack Overflow was (and is) to give software engineers a place to ask programming questions of other software engineers and get answers that help them solve problems. Obasanjo’s takeaway from the CEO’s letter is that this decade-plus-old collection of questions and answers about software development challenges will be used as input for an AI that can replace software engineers altogether. My main takeaway from the same letter is that at some point this summer (possibly later), Stack Overflow and Stack Overflow for Teams (their corporate product) will gain some sort of conversational AI capability, perhaps even without the “hallucination” problems that have made the news recently.
Part of the reason I’m more inclined to believe that [chatbot] + [10+ years of programming Q & A site data] = [better programming Q & A resource] or [better starter app scaffolder] rather than [replacement for junior engineers] is knowing just how long we’ve been trying to replace people with software development expertise with tools that let people without that expertise create software. Enough engineers have copied and pasted code from Stack Overflow into their own projects that it inspired an April Fool’s gag product (which later became a real product), but I believe we’re still quite some distance away from text prompts generating working Java APIs. I’ve lost track of how many companies have come and gone after putting products into the market promising to let businesses replace software developers: tools that let you draw what you want and generate working software, or drag and drop boxes and arrows you can connect together to yield working software, or some other variation on the theme of [idea] + [magic tool] = [working software product] with no testing, validation, or software developers in between. The truth is that there’s much more mileage to be gained from tools that help software developers do their jobs better and more quickly.
ReSharper is a tool I used for many years when I was writing production C# code, and it went a long way toward reducing (if not eliminating) a lot of the drudgery of software development. Generating boilerplate code, renaming variables, and renaming classes are just a few of the boring (and time-consuming) tasks it accelerated immensely. And that’s before you get to the numerous quick fixes it suggested to improve your code, or its static code analysis to find and warn you of potential problems. I haven’t used GitHub Copilot (Microsoft’s so-called “AI pair programmer”) myself (in part because I’m in management and don’t write production code anymore, in part because there are probably unpleasant legal ramifications to giving such a tool access to code owned by an employer), but it sounds very much like ReSharper on steroids.
Anthony B (on Twitter and Substack) has a far more profane, hilarious (and accurate) take on what ChatGPT, Bard, and other systems some (very) generously call conversational AI actually are:
His Substack piece goes into more detail, and as amusing as the term “spicy autocomplete” is, his comparison of how large language model systems handle uncertainty to how spam detection systems handle it provides real insight into these systems’ current limitations. Another aspect of the challenge he touches on briefly in the piece is training data. In the case of Stack Overflow in particular, having asked and answered dozens of questions that will presumably be part of the training data set for their chatbot, I can attest that the quality of both questions and answers varies widely. The upvotes and downvotes on each are decent quality clues but not necessarily authoritative. A Stack Overflow chatbot could conceivably respond with an answer based on something with a lot of upvotes that is actually incorrect.
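To make the training-data concern concrete, here is a minimal sketch of the problem with treating vote counts as ground truth. Everything in it is hypothetical: the `Answer` class, the field names, and the score threshold are illustrative inventions, not Stack Overflow’s actual data model or anyone’s real training pipeline.

```python
# Hypothetical sketch: vote counts are noisy quality signals, not ground truth.
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    score: int          # upvotes minus downvotes
    is_accepted: bool   # marked accepted by the asker

def select_training_answers(answers, min_score=5):
    """Keep answers that are accepted or clear the score threshold.

    A highly upvoted answer can still be wrong (e.g. advice that was
    correct in 2010 but outdated today), so this filter raises the
    average quality of the kept set without guaranteeing correctness.
    """
    return [a for a in answers if a.is_accepted or a.score >= min_score]

answers = [
    Answer("use library X", score=120, is_accepted=False),  # popular, possibly outdated
    Answer("use the builtin", score=3, is_accepted=True),   # accepted despite low score
    Answer("just reboot", score=-2, is_accepted=False),     # filtered out
]
kept = select_training_answers(answers)
print([a.text for a in kept])  # the first two answers survive the filter
```

The point of the sketch is that any heuristic like this passes popular-but-wrong answers straight through: the 120-score answer is kept on votes alone, which is exactly how a chatbot trained on this data could confidently repeat an incorrect but well-upvoted answer.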
There’s an entirely different discussion to be had (and litigation in progress against an AI image generation startup, and a lawsuit against Microsoft, GitHub, and OpenAI) regarding the training of large language models on copyrighted material without paying copyright holders. How the lawsuits turn out (via judgments or settlements) should answer at least some questions about what would-be chatbot creators can use for training data (and how lucrative it might be for copyright holders to make some of their material available for this purpose). But in the meantime, I do not expect my job to be replaced by AI anytime soon.