According to a report by the tech site The Information, which cited unnamed sources, OpenAI obtains this data through SerpApi, a company that specializes in web scraping. Its service lets AI models access search results and build large datasets used for training.
The report noted that OpenAI’s tools often struggle to give accurate answers on real-time topics, which explains the company’s reliance on Google data to improve accuracy.
An experiment by former Google engineer Abhishek Iyer indicated that ChatGPT draws on Google’s index: he invented a fake word, placed it on a hidden page, and added that page to the search index. The word later appeared in ChatGPT’s responses with exactly the same wording.
OpenAI had previously confirmed that its search capabilities rely on an internal web crawler and on data from publishers with which it has licensing agreements, emphasizing that Google is not one of those companies.
These developments come at a time when Google is considered a direct competitor to ChatGPT: users’ growing reliance on AI-generated answers has reduced both the use of traditional search engines and the traffic to websites that appear in search results.