Common Crawl’s massive internet archive may be giving AI companies access to paywalled journalism, according to a new report. The report suggests that these archives capture substantial portions of articles that normally sit behind subscription paywalls. The investigation finds that Common Crawl, a nonprofit organization, provides major AI companies with access to millions of paywalled news articles while claiming to comply with publisher removal requests.
Olivia Swaida (Frizzylifts) (Frizzyimposter)
The Company Quietly Funneling Paywalled Articles to AI Developers
The Atlantic / Alex Reisner / Nov 5, 2025
“A search for nytimes.com in any crawl from 2013 through 2022 shows a ‘no captures’ result, when in fact there are articles from nytimes.com in most of these crawls.”