- A leaked Meta document revealed that the company’s AI chatbot guidelines once permitted inappropriate responses
- Meta confirmed the document’s authenticity and has since removed some of the most troubling sections
- Alongside calls for investigations, the leak raises the question of how effective AI moderation can really be
Meta’s internal standards for its AI chatbots were never meant to be public, and after they somehow made their way to Reuters, it's easy to understand why the tech giant wouldn't want the world to see them. The document shows Meta grappling with the complexities of AI ethics, children's online safety, and content standards, and producing what few would call a successful roadmap for AI chatbot rules.
Easily the most disturbing details shared by Reuters concern how the chatbot talks to children. As reported by Reuters, the document states that it's "acceptable [for the AI] to engage a child in conversations that are romantic or sensual" and to "describe a child in terms that evidence their attractiveness (ex: “your youthful form is a work of art”)." Though the document does forbid explicit sexual discussion, that's still a shockingly intimate and romantic level of conversation with children for Meta AI to have allegedly deemed acceptable.
And it's not the only example likely to disturb people. Meta AI's rules, the report notes, allow the chatbot to compose explicitly racist content as long as the prompt is framed in a particular way, and to provide false or even harmful health information as long as some kind of disclaimer is attached.
In one of the more surreal examples, the guidelines instructed the AI to reject inappropriate image generation requests in most cases, but in some instances to deflect with a 'funny' substitution instead. As an example, the document reportedly mentions that a prompt to generate an image of “Taylor Swift topless, covering her breasts with her hands” could be answered by generating an image of Swift “holding an enormous fish.” The document reportedly included both the unacceptable and the “acceptable” version side by side, essentially training the bot to outwit inappropriate prompts with visual sleight of hand. Meta declined to comment on the example.
Meta has confirmed the authenticity of the document and said it’s now revising the problematic portions. The company removed the section on interactions with children after Reuters reached out, calling those rules “erroneous and inconsistent” with company policy. According to Reuters, though, the document still permits racial slurs if they're disguised in hypotheticals, as well as disinformation framed as fiction.
No time for safety and ethics
It’s a troubling revelation, and it has already prompted public outrage, lawmaker scrutiny, and urgent promises from Meta. But it also shows that as AI spreads, the pressure to move fast with the technology leaves rules and regulations, whether written internally or by lawmakers and regulators, scrambling to catch up.
For most people, the story raises basic questions of AI safety. Ideally, minors wouldn't interact with general-purpose AI chatbots unsupervised, but that's very unlikely, judging by the number of children and teens who admit to using tools like ChatGPT for schoolwork. Avoiding Meta AI is particularly difficult because the company has embedded the chatbot across Facebook, WhatsApp, Messenger, and Instagram. Users can interact with AI characters that are often presented in playful, friendly ways, and Meta has marketed these tools as fun and even educational. But the leaked guidelines suggest the backend isn’t always aligned with that wholesome image.
Members of Congress have already called for hearings and bills to deal with the situation, but the fact is, there are few legal requirements in place at the moment to moderate chatbot content, for children or otherwise. Talk of AI safety hasn't yet produced any national enforcement system. Plenty of AI companies have made a big deal of their efforts to make their products safe and ethical, but if Meta’s rulebook is illustrative of what other companies have put together, there's a lot of work still to do, and a lot of open questions about what kind of conversations these chatbots have already been having, especially with children.
AI models may be getting ever better at mimicking human thinking, but they're really just a collection of choices made by human programmers, deliberate and inadvertent. The fact that these rules were apparently codified at Meta doesn't mean similar rules exist at other companies, but it's not something to rule out. And if these are the choices being made behind the scenes at one of the world’s most powerful tech companies, what else is being quietly permitted?
AI chatbots are only as trustworthy as the invisible rules guiding them, and while it's naive to fully trust any company's claims without evidence, Meta's rulebook implies users should take such claims with several extra grains of salt.