In 1974, Shel Silverstein found where the sidewalk ends. In 2024, tech companies have found where the internet ends.
Big technology companies like OpenAI have been scavenging for data to train powerful AI systems, collecting nearly all available web content. And they warn that quality data (such as Wikipedia entries and scientific papers) could be completely depleted within the next two years, the WSJ reported.
To avoid a major drag on their growth plans, tech companies are getting creative in finding new data sources for AI training, using techniques that may fall into a legal gray area, the NYT reported.
- OpenAI is said to have developed a tool to transcribe audio from YouTube videos, opening up a new source of data whose use could potentially infringe the copyrights on those videos.
- Curiously, YouTube's parent company, Google, didn't take issue with it. Why? Because Google also transcribes YouTube videos and feeds them into its own AI models, the NYT says, which is something it may want to keep on the DL.
The internet is so tapped out of high-quality data for these AI companies that Meta has reportedly considered acquiring the publisher Simon & Schuster to obtain the information contained in its books.
Things are getting weird out there. With human-generated content limited, some companies are reportedly starting to develop "synthetic" information. Yes, that means using AI to create content that is then used to train that same AI. -NF