MANILA, Philippines – OpenAI is letting website owners opt out of being scraped by its web crawler GPTBot.
Web crawlers are most commonly used by search engines to browse the internet and index websites. The ChatGPT creator will use its GPTBot crawler to improve its AI systems, scouring the internet for websites that hold publicly available and potentially valuable training data.
On its blog, OpenAI said, “Web pages crawled with the GPTBot user agent may potentially be used to improve future models…. Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety.”
The company also said some websites are automatically excluded from being scraped, such as those “that require paywall access, are known to gather personally identifiable information (PII), or have text that violates our policies.”
While OpenAI would benefit most if GPTBot could access every website freely, the company has also released instructions on how to disable or block the bot. Site owners can do so by adding a few lines to their website's robots.txt file, a document that tells crawlers which parts of a site they may access, or by blocking the IP addresses the crawler uses.
The specific instructions can be found in OpenAI’s blog post.
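For readers who want to see what those two opt-out routes look like in practice, below is a minimal Python sketch using only the standard library. It assumes that "GPTBot" is the user-agent token a site owner targets in robots.txt, as OpenAI describes. The directory names (`/news/`, `/members/`) and the IP ranges are hypothetical placeholders, not OpenAI's published values. The real ranges are listed in OpenAI's own documentation and should be copied from there.

```python
import ipaddress
from urllib.robotparser import RobotFileParser

# Option 1: robots.txt. Block GPTBot from the whole site, leave other crawlers alone.
BLOCK_ALL = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# Partial access: the directory names here are hypothetical examples.
PARTIAL = """\
User-agent: GPTBot
Allow: /news/
Disallow: /members/
"""


def crawler_allowed(robots_txt: str, url: str, agent: str = "GPTBot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, url)


# Option 2: IP blocking. These are HYPOTHETICAL placeholder ranges taken from the
# RFC 5737 documentation blocks; they are not GPTBot's real addresses.
GPTBOT_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked_ip(addr: str) -> bool:
    """Return True if `addr` falls inside any of the configured ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in GPTBOT_RANGES)


if __name__ == "__main__":
    url = "https://example.com/news/some-story"
    print("GPTBot, block-all rules:", crawler_allowed(BLOCK_ALL, url))
    print("Googlebot, block-all rules:", crawler_allowed(BLOCK_ALL, url, agent="Googlebot"))
    print("GPTBot, partial rules:", crawler_allowed(PARTIAL, url))
    print("Request from 192.0.2.10 blocked:", is_blocked_ip("192.0.2.10"))
```

Note that robots.txt is advisory: it only keeps out crawlers that choose to honor it, which is why IP blocking exists as a second option. Python's `urllib.robotparser` also performs simple prefix matching on paths, so it is a rough checker rather than a full reimplementation of any particular crawler's rules.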
OpenAI's opt-out options come amid an ongoing debate over AI makers' use of internet data to train their systems, particularly generative AI tools such as ChatGPT and Google Bard.
As these tools rose in popularity, platforms such as Reddit, Twitter (now known as X), and the developer forum Stack Overflow announced plans to charge for access to their data, which AI companies rely on to train their systems.
AI companies also face several lawsuits alleging copyright infringement for scraping creative works without the consent of their rights holders. – Rappler.com