- OpenAI, Meta, Google, and other big tech companies use online data to train their AI models.
- However, AI models consume data so quickly that the supply of high-quality online data could run out by 2026.
- So how do AI systems continue to learn? Big Tech has some interesting ideas.
When it comes to AI, the more the better. The more data an AI system is trained on, the more powerful it becomes.
But as the AI arms race intensifies, big tech companies like Meta, Google, and OpenAI are running short of data to train their models.
Many major AI systems are trained on a vast supply of online data. But all high-quality data could be used up by 2026, according to AI research institute Epoch.
As a result, leading technology companies are looking for new data sources to keep training their systems. Here are some of the most creative options they are considering.
Google looked at leveraging the consumer data available in Google Docs, Sheets, and Slides.
Last summer, Google's legal department began asking employees to broaden the language around how the company could use consumer data, The New York Times reported. Some employees were reportedly told that the company wanted to use data from the free consumer versions of Google Docs, Google Sheets, and Google Slides, and even from restaurant reviews on Google Maps.
Google updated its privacy policy in July 2023, but the company said it did not expand the types of data it uses to train its AI models.
Splurging on publisher Simon & Schuster
At Meta, executives were concerned about the dwindling supply of available data and met nearly every day in March and April of last year to brainstorm alternatives, the Times reported.
One of the ideas that emerged from these meetings was to acquire Simon & Schuster. The famous publisher, which has worked with authors such as Stephen King and Jennifer Weiner, was acquired by private equity firm KKR last year for $1.62 billion.
Other attendees suggested the more budget-friendly option of paying $10 per book for full licensing rights to new titles.
Generating synthetic data
Synthetic data is data generated by an AI system, and OpenAI is considering it as a way to train its models.
According to the Times, OpenAI CEO Sam Altman said at a technology conference last May that as long as models can get past the synthetic-data "event horizon," the point at which a model is smart enough to create good synthetic data, "everything will be fine."
The problem with training AI systems on synthetic data is that it can reinforce an AI's existing mistakes and limitations, the Times reported. To address this, OpenAI is working on a process in which one AI system generates data and a second AI system judges it.
On February 28, Axel Springer, the parent company of Business Insider, joined 31 other media groups in filing a $2.3 billion lawsuit against Google in a Dutch court, alleging losses suffered as a result of the company's advertising practices.
Axel Springer, Business Insider's parent company, has a global deal that allows OpenAI to train models based on its media brands' reporting.