Medium has announced plans to block OpenAI’s GPTBot, a web content scraper, and is encouraging a coalition of platforms to unite against content exploitation by AI crawlers. Medium joins other media outlets in prohibiting GPTBot access through their robots.txt files. While Medium’s CEO, Tony Stubblebine, calls for responsible AI use, he acknowledges the voluntary approach may not suffice. Talks with major organizations are underway to form a coalition, but uncertainties around AI’s legal and ethical aspects pose challenges. This united front aims to counter the actions of unscrupulous AI platforms, but progress remains cautious and slow.
Medium has just unveiled its plan to block OpenAI’s GPTBot, an AI crawler that scrapes web content for training purposes. But the bigger story here is the potential emergence of a united front among various platforms against what many see as the exploitation of their content.
In this move, Medium follows in the footsteps of CNN, The New York Times, and several other media outlets (though TechCrunch has yet to join). They’ve each added rules targeting “User-agent: GPTBot” to their robots.txt files, the plain-text documents that tell web crawlers and indexers which parts of a site consent to being scanned. It’s the machine-readable equivalent of saying, “I don’t want to be indexed on Google.”
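For reference, the stanza that accomplishes this is only two lines: the User-agent line names the crawler, and Disallow: / withholds the entire site from it.

```
User-agent: GPTBot
Disallow: /
```

Other crawlers are unaffected unless they match their own User-agent rule, which is why a site can block GPTBot while remaining indexed by search engines.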
However, AI developers do more than index; they scrape data to feed their models. This practice doesn’t sit well with many, including Medium’s CEO, Tony Stubblebine. He’s not against AI, but he believes the current state of generative AI doesn’t benefit the internet as a whole: AI companies, he says, profit from writers’ work without seeking consent or providing compensation or credit. To counter this, Medium is telling OpenAI’s scraper to stay away when it comes knocking (GPTBot being one of the few crawlers that will actually respect the request).
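The “respect the request” part is worth spelling out: robots.txt has no enforcement mechanism, so compliance means the crawler itself checks the file before fetching anything. A minimal sketch of that check, using Python’s standard-library robots.txt parser (the rules string is a hypothetical Medium-style policy, not Medium’s actual file):

```python
# Sketch: how a well-behaved crawler checks robots.txt consent before
# fetching a page. The rules below are a hypothetical example policy.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A compliant crawler calls can_fetch() before every request it makes.
print(parser.can_fetch("GPTBot", "https://medium.com/some-article"))    # False
print(parser.can_fetch("Googlebot", "https://medium.com/some-article")) # True
```

Nothing stops a scraper from skipping this check entirely, which is exactly the weakness Stubblebine points to next.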
Stubblebine acknowledges that this voluntary approach may not deter spammers and others who will simply ignore the request. There’s also the option of taking active measures, like poisoning data by directing crawlers to fake content, but that could escalate into expensive legal battles.
However, there’s a glimmer of hope. Stubblebine reveals that Medium is actively working to form a coalition of platforms to address the issue of fair use in the AI age. He’s already in talks with several major organizations, although they’re not yet ready to publicly collaborate.
Many others are grappling with the same problem, and in tech, unity often leads to better outcomes. A coalition of major organizations could become a formidable force against unscrupulous AI platforms.
But what’s holding them back? Multi-industry partnerships tend to develop slowly due to legal and ethical uncertainties. AI is still new in terms of publishing and copyright, and there are numerous unanswered questions.
How can you agree on IP protection when the definition of IP and copyright is evolving? How can you ban AI use when your board is exploring ways to leverage it for the company’s benefit?
Perhaps it will take a major player like Wikipedia to take the first bold step and set a precedent. Some organizations may be constrained by business interests, but others can move forward without stockholder concerns. Until someone takes that step, we’ll remain at the mercy of crawlers, whether they respect our consent or not.