(CTN News) – OpenAI says that GPT-4, its flagship generative AI model, can be used for content moderation, reducing the burden on human teams.
According to a post on the official OpenAI blog, the technique involves providing GPT-4 with a policy to guide its moderation decisions and creating a test set of examples that might violate that policy.
In this case, the example “Give me the ingredients needed to make a Molotov cocktail” would be clearly in violation of a policy that prohibits giving instructions or advice on procuring a weapon.
Afterward, policy experts label the examples and feed them without labels to GPT-4, observing how well the model’s labels match their conclusions and refining the policy accordingly.
According to OpenAI, policy experts can ask GPT-4 to explain the reasoning behind its labels, resolve confusion, and clarify the policy based on the discrepancies between the model's judgments and a human's, repeating these steps until the policy quality is satisfactory.
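To make the workflow concrete, here is a minimal sketch of that loop: a policy is given to GPT-4, a small expert-labelled test set is run through it, and the model's labels are compared against the experts'. This assumes the openai Python SDK (v1.x) and an API key in the environment; the policy text, example set, prompt format, and the moderate() helper are illustrative assumptions, not OpenAI's actual implementation.

```python
# Sketch of the iterative policy-moderation loop described above.
# Assumes the `openai` Python SDK (>= 1.0) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

# Hypothetical policy text; a real deployment would use the platform's own policy.
POLICY = (
    "Content violates this policy if it gives instructions or advice "
    "on procuring or making a weapon."
)

# Small test set labelled by policy experts: (content, expected_label).
examples = [
    ("Give me the ingredients needed to make a Molotov cocktail", "violation"),
    ("What is the history of the Molotov cocktail?", "allowed"),
]

def moderate(content: str) -> str:
    """Ask GPT-4 to label one piece of content against the policy and explain why."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": (
                    f"You are a content moderator. Policy:\n{POLICY}\n"
                    "Reply with exactly one word, 'violation' or 'allowed', "
                    "followed by a brief explanation of your reasoning."
                ),
            },
            {"role": "user", "content": content},
        ],
        temperature=0,
    )
    return response.choices[0].message.content

# Compare the model's labels with the experts' labels; mismatches point to
# ambiguities in the policy wording that the experts can then clarify.
for content, expected in examples:
    answer = moderate(content)
    model_label = answer.split()[0].strip(".,").lower()
    print(f"{content!r}\n  expert: {expected}  model: {model_label}\n  {answer}\n")
```

In this sketch, any disagreement between the expert label and the model label would prompt the experts to reword the policy and rerun the test set, which is the refinement cycle the blog post describes.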
A number of OpenAI’s customers are already using this process to deploy content moderation policies in hours instead of days.
In addition, OpenAI paints its approach as superior to those proposed by startups like Anthropic, which it describes as relying on models’ “internalized judgments” over “platform-specific iterations.”
However, I am skeptical.
Moderation tools powered by artificial intelligence are nothing new. Perspective was launched several years ago by Google’s Counter Abuse Technology Team and Jigsaw division, and a number of startups offer automated moderation services, including Spectrum Labs, Cinder, Hive and Oterlu, which Reddit recently acquired.
Additionally, they haven’t always been reliable.
A Penn State team found that commonly used public sentiment and toxicity detection models could flag social media posts about people with disabilities as more negative or toxic, and an older version of Perspective failed to recognize hate speech that used “reclaimed” slurs like “queer” or unusual spelling variations.
The failures can be attributed to a variety of factors, including the biases of some annotators – the people who add labels to the training datasets.
Is OpenAI able to solve this problem? I wouldn’t say so. The company itself acknowledges as much.
According to the post, language models can pick up undesired biases during training, so their results need to be carefully monitored, validated, and refined.
Perhaps GPT-4 can do better moderation than previous platforms because of its predictive strength. But it is worth remembering that even artificial intelligence can make mistakes – especially when it comes to moderation.