In the process, my reporting has found, Common Crawl has opened a back door for AI companies to train their models with paywalled articles from major news websites.

The Atlantic on Common Crawl, the nonprofit funneling paywalled articles to AI companies: a brutally efficient exposé. Alex Reisner caught them in several lies simply by looking at their crawl data (via).

The nonprofit Common Crawl gives major AI companies access to millions of paywalled news articles while claiming to comply with publishers' removal requests, the investigation reveals.
The Company Quietly Funneling Paywalled Articles to AI Developers. The Atlantic / Alex Reisner / Nov 5, 2025: “A search for nytimes.com in any crawl from 2013 through 2022 shows a ‘no captures’ result, when in fact there are articles from nytimes.com in most of these crawls.”