In a detailed investigation for The Atlantic, reporter Alex Reisner reveals that several major AI companies have quietly partnered with the Common Crawl Foundation, a nonprofit that scrapes the internet. The foundation's director argues that AI should have the right to access all content on the internet. Multiple news publishers have asked Common Crawl to remove their articles so that they cannot be used for AI training.
Common Crawl claims it complies with these requests, but research shows otherwise.
As Reisner writes in "The Company Quietly Funneling Paywalled Articles to AI Developers" (The Atlantic, Nov. 5, 2025), "a search for nytimes.com in any crawl from 2013 through 2022 shows a 'no captures' result, when in fact there are articles from nytimes.com in most of these crawls."
The Common Crawl Foundation has been scraping the internet for over a decade, building a vast archive, including paywalled content, that AI companies use to train their models. Despite the foundation's claims that it complies with publishers' requests to remove their articles, investigations reveal that many of those articles remain in the archive.