Major news outlets including CNN, NBC, and USA Today are pushing to stop AI developers from training models on their content archived by Common Crawl, citing copyright infringement and potential revenue loss. The move comes amid an ongoing industry dispute over AI companies accessing publisher content without consent, and the News/Media Alliance has called for stricter enforcement of publishers' opt-out requests to protect their materials.

CNN: CNN is a leading American news media company and the flagship property of CNN Worldwide, a division of Warner Bros. Discovery, offering 24/7 cable news, digital platforms, and international coverage. It reported strong multiplatform performance in early 2026 amid ongoing industry shifts. In this story, CNN is one of the major outlets demanding that Common Crawl honor opt-out requests so that its content is not used in AI training, in order to protect its copyrights and revenue.
NBC: NBC News is the news division of the NBC broadcast network, operating under NBCUniversal News Group, providing breaking news, analysis, and content across television, digital, and streaming services. Recent leadership changes have emphasized content innovation, culture, and fact-based reporting. NBC has joined other publishers in pushing back against Common Crawl archiving its content, which is then used for unauthorized AI model training.
USA Today: USA Today is a prominent American daily newspaper and digital news provider under USA Today Co., delivering accessible reporting on national news, sports, entertainment, and lifestyle topics. It recently named a new top editor and pursued acquisitions of regional papers to address declining local journalism. USA Today is actively involved in efforts to block its content from Common Crawl to protect copyrights and preserve licensing opportunities.

AI Data Sourcing: Common Crawl operates as a key web archive that AI developers have used to access publisher content for training large language models.
Copyright Pushback: Major news organizations cite ongoing concerns about copyright infringement by AI firms that access archived content without permission.
Publisher Advocacy: The News/Media Alliance has formally urged Common Crawl to cease unauthorized scraping of news content and to enforce opt-out requests from publishers.