Carley Cake

Carley Cake Leaks

In the process, my reporting has found, Common Crawl has opened a back door for AI companies to train their models with paywalled articles from major news websites. Following controversy surrounding Perplexity for using publisher content in its search results, we undertook a brief analysis exploring how GenAI platforms handle paywalled articles.

Common Crawl’s massive internet archive may be giving AI companies access to paywalled journalism, according to a new report. And the foundation appears to be lying to publishers about this, as well as masking the actual contents of its archives.

The company quietly funneling paywalled articles to AI developers. The Atlantic / Alex Reisner / Nov 5, 2025: “A search for nytimes.com in any crawl from 2013 through 2022 shows a ‘no captures’ result, when in fact there are articles from nytimes.com in most of these crawls.”

A nonprofit organization has been systematically supplying paywalled news articles to major AI companies for training large language models, according to an investigation published November 4, 2025, by The Atlantic's Alex Reisner.

🦄 @carleycakeee - Carley - TikTok

Carley :) (@carleycake1) • Instagram photos and videos