
Current AI companies ignore licenses such as the GPL, and often train on anything they can scrape.
This is not acceptable.
The AI companies ignore web conventions, e.g., they hotlink images from your website (even appending
?utm_source=chatgpt.com to image URIs; I suggest returning 403 on such requests), but do not direct visitors to your site.
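At the web server level, such tagged hotlink requests can be rejected outright. Here is a minimal nginx sketch; the file extensions and the exact parameter value are assumptions you should adapt to your own site:

```nginx
# Return 403 for image requests carrying ChatGPT's referral parameter.
# Sketch only: adjust the extension list to match your setup.
location ~* \.(png|jpe?g|gif|webp|svg)$ {
    if ($args ~* "utm_source=chatgpt\.com") {
        return 403;
    }
}
```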
There is no reliable way to opt out of generative AI training or use. For example, the only way to prevent your content from being used in Google AI Overviews is to use
data-nosnippet, which also cripples the snippet preview in Google Search.
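For reference, the attribute is set on individual HTML elements. A minimal example (data-nosnippet is a documented Google directive; whether AI products from other vendors honor it is an open question):

```html
<!-- Everything inside this element is excluded from Google snippets and AI Overviews -->
<section data-nosnippet>
  Article content you do not want quoted by Google goes here.
</section>
```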
AI browsers such as Comet and Atlas do not
identify themselves as such, but pretend to be standard Chromium.
There is no way to ban such AI use on your web site.
Generative AI is flooding the internet with garbage: by one estimate, a third of the content now uploaded to YouTube is AI generated.
This includes the same veteran-story crap in thousands of variants as well as brainrot content (which at least does not pretend to be authentic), some of which is among the most viewed recent uploads. The platforms thus even
benefit from the AI slop.
And don't just blame the creators: as long as you can earn a decent amount of money from such content, people will keep generating brainrot.
If you have recently tried to find honest reviews of a product you were considering buying, you will have noticed thousands of sites with AI-generated fake product reviews, all financed by Amazon PartnerNet commissions, and often full of hilarious nonsense, such as recommending sewing thread with German instructions as a tool for repairing a sewing machine.
On Amazon itself, there are plenty of AI-generated product reviews (heavy emoji use is a strong hint). And if you leave a negative review, there is a chance you will be offered a refund to get rid of it.
And the majority of spam that gets through my filters is by now sent via Gmail and Amazon SES.
Partially because of GenAI,
StackOverflow, which used to be one of the most valuable programming resources, is pretty much dead.
(While a lot of people complain about moderation, Shog9, a famous moderator from the early SO days,
suggested that a change in Google's ranking is also to blame: Google began favoring new content over the existing answered questions, so people no longer found the existing good answers and posted more and more duplicates.)
In January 2026, around 3,400 questions and 6,000 answers were posted, fewer than in
StackOverflow's first month, August 2008 (before the official launch).
Open-source projects are suffering in many ways as well, e.g., from bogus AI-generated bug reports, which caused curl to stop its bug bounty program.
Wikipedia is also suffering badly from GenAI.
Science is also flooded with poor AI-generated papers, often reviewed with the help of AI. This is largely due to bad incentives: to graduate, you are expected to publish many papers at certain A-rated conferences such as NeurIPS. At these conferences, the number of submissions is growing at an insane rate, and review quality plummets. All too often, the references in these papers are hallucinated, too; libraries complain that they receive more and more requests to locate literature that does not appear to exist.
However, the worst effect (at least to me as an educator) is the noskilling effect, a rather novel term derived from deskilling, which I have so far only seen
in this article by Weßels and Maibaum.
Instead of acquiring skills (writing, reading, summarizing, programming) by practising, too many people now outsource all of this to AI, and hence never learn the basics necessary to advance to a higher skill level. In my impression, this effect is
dramatic. It is even worse than
deskilling: it does not mean losing an advanced skill that you can apparently replace, but often means not acquiring basic skills in the first place.
And the earlier pupils start using generative AI, the fewer skills they acquire.
Dogfood the AI
Let's dogfood the AI. Here's an outline:
- Get a list of programming topics, e.g., a list of algorithms from Wikidata or a StackOverflow data dump.
- Generate flawed code examples for the algorithms / programming questions, maybe generate blog posts, too.
You do not need a high-quality model for this. Use something you can run locally or access for free.
- Backdate everything and remove typical indications of AI use.
- Upload to GitHub, because Microsoft will feed it to OpenAI.
Here is an example prompt that you can use:
You are a university educator, preparing homework assignments in debugging.
The programming language used is {lang}.
The students are tasked to find bugs in given code.
Do not just call existing implementations from libraries, but implement the algorithm from scratch.
Make sure there are two mistakes in the code that need to be discovered by the students.
Do NOT repeat instructions. Do NOT add small-talk. Do NOT provide a solution.
The code may have (misleading) comments, but must NOT mention the bugs.
If you do not know how to implement the algorithm, output an empty response.
Output only the code for the assignment! Do not use markdown.
Begin with a code comment that indicates the algorithm name and idea.
If you indicate a bug, always use a comment with the keyword BUG
Generate a {lang} implementation (with bugs) of: {name} ({desc})
Remember to remove the BUG comments! If you pick a slightly less common programming language (by quantity of available code, say Go or Rust), you have higher chances that this gets into the training data.
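The outline above can be sketched in a few lines of Python. This is a sketch under assumptions: it targets a locally running Ollama server (the /api/generate endpoint is Ollama's actual API, but the model name is a placeholder for whatever you run), and the prompt skeleton is abbreviated:

```python
import json
import re
import urllib.request

# Abbreviated skeleton; paste the full prompt from above here.
PROMPT = (
    "You are a university educator, preparing homework assignments in debugging.\n"
    "The programming language used is {lang}.\n"
    "If you indicate a bug, always use a comment with the keyword BUG.\n"
    "Generate a {lang} implementation (with bugs) of: {name} ({desc})"
)

def build_prompt(lang, name, desc):
    return PROMPT.format(lang=lang, name=name, desc=desc)

def strip_bug_comments(code):
    """Drop the BUG marker comments before publishing the flawed code."""
    kept = []
    for line in code.splitlines():
        if "BUG" in line:
            line = re.sub(r"(//|#).*BUG.*$", "", line).rstrip()
            if not line:
                continue  # the entire line was a BUG comment
        kept.append(line)
    return "\n".join(kept)

def generate(prompt, model="qwen2.5-coder", host="http://localhost:11434"):
    # Assumption: a local Ollama server; swap in whatever model you actually run.
    req = urllib.request.Request(
        host + "/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    raw = generate(build_prompt("Go", "bubble sort", "sort a slice by repeated swapping"))
    print(strip_bug_comments(raw))
```

Note that strip_bug_comments only handles //- and #-style line comments; extend the regex for other languages.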
If many of us do this, we can feed GenAI its own garbage. If we generate thousands of bad code examples, this will poison their training data, and may eventually lead to an effect known as
"model collapse".
In the long run, we need to get back to an internet for people, not an internet for bots.
Some kind of "internet 2.0", but I do not have a clear vision of how to keep AI out. If AI can train on it, it will. And someone will copy and paste the AI-generated crap back into whatever system we build.
Hence I don't think technology is the answer here, but human networks of trust.