OpenAI recently soft-launched a new web crawler, ‘GPTBot’, which it says will help improve the accuracy, capabilities, and safety of its AI models.
GPTBot is designed to crawl websites and collect data for the data pool the company uses to improve the performance of its AI models (such as GPT-4).
“Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety,” OpenAI said in a release.
(Web crawlers are computer programs that collect data from websites across the internet, most commonly to index their content and improve search engine results.)
The crawler is also designed to filter out sources that violate data privacy or OpenAI’s policies. This includes text that violates those policies, paywall-restricted sources, and sources that gather personally identifiable information.
The company also added a privacy control for web administrators, who can enable or disable the crawler’s access to their sites.
OpenAI also published instructions for enabling, disabling, or customizing GPTBot’s access by modifying a site’s robots.txt file.
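As a rough illustration of what those robots.txt changes look like, the sketch below holds two rule sets as strings: one that blocks GPTBot from an entire site, and one that allows it into one directory while keeping it out of another. It then checks them with Python's standard `urllib.robotparser`. The directory names, the `example.com` URLs, and the helper `gptbot_allowed` are placeholders made up for this example. Note that Python's parser applies the first matching rule, which may differ from how a given crawler resolves overlapping rules.

```python
from urllib.robotparser import RobotFileParser

# Rules that block GPTBot from the whole site.
BLOCK_ALL = """\
User-agent: GPTBot
Disallow: /
"""

# Rules that let GPTBot into one directory and keep it out of another.
# The directory names are placeholders for illustration.
PARTIAL = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""


def gptbot_allowed(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text lets 'GPTBot' fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("GPTBot", url)


if __name__ == "__main__":
    print(gptbot_allowed(BLOCK_ALL, "https://example.com/page.html"))
    print(gptbot_allowed(PARTIAL, "https://example.com/directory-1/a.html"))
    print(gptbot_allowed(PARTIAL, "https://example.com/directory-2/a.html"))
```

On a live site, these rules would sit in the robots.txt file at the root of the domain. The Python check is only a convenient way to confirm that a rule set does what was intended before it is deployed.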
The company also released the crawler’s IP egress ranges, so web admins can see where the traffic on their sites comes from.
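One way an admin might use published egress ranges is to check whether a request that claims to be GPTBot comes from an address inside them. The sketch below does this with Python's standard `ipaddress` module. The ranges in `EXAMPLE_RANGES` are reserved documentation addresses, not OpenAI's actual ranges, and the function name `ip_in_ranges` is invented for this example. Real values would have to come from the list OpenAI publishes.

```python
import ipaddress
from typing import Iterable

# Placeholder CIDR blocks reserved for documentation (RFC 5737 / RFC 3849).
# These are NOT OpenAI's ranges; substitute the published list.
EXAMPLE_RANGES = ["192.0.2.0/24", "2001:db8::/32"]


def ip_in_ranges(ip: str, cidr_ranges: Iterable[str]) -> bool:
    """Return True if ip falls inside any of the given CIDR ranges."""
    address = ipaddress.ip_address(ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixed lists are safe to check in one pass.
    return any(address in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


if __name__ == "__main__":
    print(ip_in_ranges("192.0.2.17", EXAMPLE_RANGES))
    print(ip_in_ranges("198.51.100.5", EXAMPLE_RANGES))
```

A check like this could be run against the client addresses in server logs to separate genuine crawler traffic from requests that only copy the GPTBot user-agent string.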
It is definitely nice to have improved options that let website owners control which programs can access their data. Giving admins the controls to enable or disable the crawler suggests that AI developers are making an effort to put limits on their own systems.