Elon Musk warns of ‘peak data’ in AI training, highlights risks of synthetic data
Elon Musk, the world’s richest businessperson and owner of major tech companies such as SpaceX, Tesla, and X, said that AI training has hit “peak data,” with little real-world data left to train AI models.
He suggested shifting to “synthetic data” but warned that AI models’ tendency to generate “hallucinations” makes that approach risky.
During a livestream Thursday, Musk said that nearly all of humanity’s available knowledge had already been used in AI training.
“We’ve exhausted basically the cumulative sum of human knowledge in AI training. That happened basically last year,” Musk said during the livestream on X.
Musk, who launched his own AI business, xAI, in 2023, suggested technology companies would have no choice but to turn to “synthetic” data, that is, material generated by AI models themselves and used for self-learning.
“The only way to then supplement that is with synthetic data where it will sort of write an essay or come up with a thesis and then will grade itself and go through this process of self-learning,” he added.
Musk cautions about ‘AI hallucination’
Musk cautioned, however, that AI models’ tendency to produce “hallucinations,” inaccurate or nonsensical outputs, poses a risk to the synthetic data process. He said hallucinations make using artificial material “challenging” because “how do you know if it hallucinated the answer or it’s a real answer.”
According to the Guardian, Andrew Duncan, director of foundational AI at the U.K.’s Alan Turing Institute, noted that Musk’s statement aligns with a recent academic paper suggesting that publicly available data for AI models could be depleted by 2026. He warned that overreliance on synthetic data could lead to “model collapse,” where model outputs degrade in quality.
“When you start to feed a model synthetic stuff you start to get diminishing returns,” he said, highlighting the risk of biased and uncreative outputs. Duncan also pointed out that as AI-generated content spreads online, that material could itself end up in AI training data.