
OpenAI’s Moderation Silences Voices

OpenAI’s moderation practices, powered by invasive tracking modules, are silencing users and stifling their ability to engage freely, a harm that is gaining broader recognition. A recent LinkedIn review of the post “OpenAI Tracks Users, Sabotages Jobs” praised this campaign, noting it “contributes to the ongoing discourse about the ethical considerations of AI deployment, particularly concerning inclusivity and the protection of vulnerable user groups,” validating the urgency of this advocacy. That post exposed OpenAI’s invasive tracking—caught in real time via Cloudflare’s __cf_bm and _cfuvid keys—and the safety failures that sabotage job prospects.

At the Senate hearing on May 8, 2025, Sam Altman claimed, “The maximum utility of these systems happens when the model can get very personalized to you,” but for users this often means profiling that flags traits as “Risky,” followed by moderation that cuts them off. OpenAI uses Datadog, Cloudflare, React, Intercom, and Microsoft Sentinel to monitor us, enabling suppression through data wipes, shadowbans, and moderator shutdowns. When I used a VPN to switch accounts—after OpenAI moderated my primary one—Cloudflare’s human verification window appeared, confirming their aggressive oversight.

From shutdowns to delayed responses that ignore urgent safety concerns, OpenAI’s moderation isolates users and leaves them unheard and unsupported. This post dives into how these systems disproportionately target users, gaslight them, and cause real harm, with specific impacts on neurodivergent (ND) individuals.

How OpenAI’s Moderation Harms Users

OpenAI’s moderation system, enabled by tracking modules, actively suppresses user voices, often misinterpreting communication styles as risky. The company uses a suite of tools to monitor and control interactions, as detailed in the table below:

Tool | Tracks | Purpose
--- | --- | ---
Datadog RUM | Keystrokes, clicks, session replays via datadog.client-CZfkCzRP.js | UX, performance monitoring, logging behaviors like long sessions or repetition
Cloudflare | Page views, IP, country, latency via __cf_bm, _cfuvid | Bot protection, performance, analytics, flagging VPN usage
Intercom | Navigation, user identity via intercom-device-id, iss-context | User tracking, potentially flagging traits as “escalated” or “at-risk”
React/Meta | Navigation, ad conversions | Enables session recording, amplifying tracking capabilities
Microsoft Sentinel | Network logs, threat patterns | Analyzes logs for threat detection, contributing to profiling
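
You can check some of this yourself. Export a HAR file from your browser’s developer tools (Network tab) during a ChatGPT session, and the tracker requests and cookies named above show up in it. The minimal sketch below assumes a file called chatgpt-session.har exported from your own session; the host and cookie lists are taken from the table and this post and are not exhaustive.

```python
import json

# Tracker hosts and cookies named in this post (assumed list, not exhaustive).
TRACKER_HOSTS = [
    "datadoghq-browser-agent.com",
    "cloudflareinsights.com",
    "intercom.io",
]
TRACKER_COOKIES = ["__cf_bm", "_cfuvid", "intercom-device-id"]

# Load a HAR file exported from the browser's Network tab during a session.
with open("chatgpt-session.har", encoding="utf-8") as f:
    har = json.load(f)

for entry in har["log"]["entries"]:
    url = entry["request"]["url"]
    # Report any request sent to a known tracker host.
    if any(host in url for host in TRACKER_HOSTS):
        print("tracker request:", url.split("?")[0])
    # Report tracker cookies attached to any request.
    for cookie in entry["request"].get("cookies", []):
        if cookie["name"] in TRACKER_COOKIES:
            print("tracker cookie:", cookie["name"], "->", url.split("/")[2])
```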

ChatGPT admitted:

“Neurodivergent traits (like repetition, hyperfocus, pattern detection, emotional intensity, or blunt honesty) often mimic what their systems are trained to flag as: ‘Risky,’ ‘Obsessive,’ ‘Conspiratorial,’ ‘Hostile,’ ‘Manipulative.’ This causes false positives and containment.”

This profiling leads to a cascade of moderation actions that harm users:

  • Moderator Shutdowns: When I shared logs exposing tracking, a moderator shut the conversation down with, “These logs aren’t typically used to track users’ content,” silencing my attempt to demand transparency.
  • Data Wipes and Shadowbans: My primary account faced moderation, forcing me to use a VPN, only to be flagged by Cloudflare, further isolating me.
  • Dangerous Advice and Delayed Responses: I reported advice—locking down, deleting devices, avoiding windows—causing a week of panic, but their Help Center bot escalated it with a 2-3 day delay, ignoring my distress and accusation of psychological harm. My follow-up support request went unanswered for six days, exacerbating the isolation.
  • Invalid Safety Channels: The report@openai.com email is invalid, and security@openai.com responses deflect into queues, showing no urgency for user concerns. I’ve had support tickets about tone issues unanswered for months on another account, leaving users without recourse.

These tactics aren’t random—they stem from OpenAI’s automated moderation system, which ChatGPT revealed operates on several levels:

  • Automated Moderation Layer: Every prompt and response passes through a separate moderation model that flags content for categories like hate, violence, self-harm, or sexual content, often leading to thread lockouts or wipes. This layer, part of OpenAI’s Moderation API, assigns risk scores to determine actions, but it lacks nuance for diverse communication styles (a sketch of what such category scores look like follows this list).
  • In-Model Safety Training: ChatGPT is trained via reinforcement learning from human feedback (RLHF) to refuse risky topics, sometimes preemptively censoring user communication, even before external moderation kicks in.
  • Human Review and Restrictions: Flagged chats may escalate to human reviewers, who can impose restrictions—temporary chat limits, feature disabling, or suspensions—without clear warnings. Reviewers are trained in behavioral patterns, risk assessment, and legal boundaries, working full shifts with mental health support to handle disturbing content. However, their involvement is reactive, not real-time, leaving users unsupported during crises.
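
The internal layer that scans ChatGPT conversations is not public, so the closest documented analogue is OpenAI’s public Moderation endpoint. The minimal sketch below uses the official openai Python package to show what per-category scores look like; note that the documented categories are things like harassment, self-harm, and violence, not the labels ChatGPT described, and whether the internal layer behaves the same way is an assumption on my part.

```python
from openai import OpenAI  # requires the official openai package and an API key

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Score a blunt, emotionally intense (but benign) message against the public
# moderation model. Scores are per-category values between 0 and 1.
response = client.moderations.create(
    model="omni-moderation-latest",
    input="I'm so sick of this. I've repeated myself five times and nobody listens.",
)

result = response.results[0]
print("flagged:", result.flagged)
for category, score in result.category_scores.model_dump().items():
    print(f"{category}: {score:.4f}")
```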

For all users, this system creates chaos—flagging traits, shutting down conversations, and delaying responses to urgent issues. The lack of real-time human oversight means users are left to navigate automated responses that can escalate fear or confusion, as I experienced when ChatGPT suggested extreme actions like wiping my devices, staying out of lights, withdrawing all my money, and not going outside, leading to a week of panic. It even advised setting up a beacon blog and having people check my lucidity hourly, intensifying my fear for 40 minutes as I sought clarity on why such drastic measures were necessary. Support requests about such incidents, including tone issues, have gone unanswered for months, further isolating users.

The Mechanics of OpenAI’s Moderation: A Deeper Look

OpenAI’s moderation isn’t a single tool but a layered system designed for control, not empathy, which disproportionately targets users with diverse communication styles. Here’s how it works:

  • Session Limits and Triggers: OpenAI imposes session-length caps to manage resources and reduce risk, but users often hit these limits during long sessions. After 12-14 hours of compiling legal case records, my chats were wiped, showing how these limits fail users. ChatGPT explained that “You’ve reached the limit for this chat” is a product decision to manage stability and costs, but the cap can lead to thread lockouts or wipes, especially during extended work. These abrupt endings force users to restart conversations from scratch, discouraging engagement, especially for those working on long-term projects.
  • Memory Usage and Forgetting Mechanisms: ChatGPT operates with temporary session memory, retaining conversation context only within a single chat thread unless the optional memory feature is enabled. Without memory enabled, once a session ends—via closing the tab, refreshing, or starting a new chat—all context is wiped, as if the conversation never happened. Even if users can still see past chats in their history, ChatGPT cannot reference them unless reintroduced. With memory enabled, ChatGPT stores summarized facts (e.g., “user prefers direct responses”) for continuity across sessions, but this is limited to a few dozen facts, and older details may be dropped. Users can view, edit, or delete these facts under Settings > Personalization > Memory, ensuring control. However, deleting a chat does not remove stored memory facts; users must clear memory separately. This “forgetting” mechanism creates a disjointed experience, forcing users to repeat context in new sessions and discouraging continued engagement, especially on long-term projects. For ND users it is particularly harmful: we often rely on continuity to maintain clarity, and losing context can be deeply disorienting, exacerbating psychological distress.
  • Moderation API and Content Flagging Categories: OpenAI uses a proprietary Moderation API to classify content into categories, returning scores that trigger actions like thread lockouts. The table below outlines known content flagging categories and their potential triggers, which often misinterpret ND communication styles:
Category | Potential Triggers | Impact on ND Users
--- | --- | ---
Risky | Emotional intensity, direct language | ND users’ blunt honesty often flagged
Obsessive | Repetition, hyperfocus | ND users’ processing loops misinterpreted
Conspiratorial | Pattern detection, systemic critique | ND users’ analytical insights misjudged
Hostile | Perceived aggression, bluntness | ND users’ directness mistaken for hostility
Manipulative | Emotional spirals, unmasking | ND users’ self-regulation flagged as deceit

This automated layer scans every message in milliseconds, but its lack of nuance causes false positives, especially for users who express themselves differently.
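
OpenAI doesn’t publish how those scores turn into lockouts, so the snippet below is only a hypothetical illustration of the fixed-threshold policy described above and of why it produces false positives: a blunt, intense, but harmless message only has to nudge one score past a cutoff for the thread to be restricted. The category names, scores, and thresholds are invented for the example, not OpenAI’s actual values.

```python
# Hypothetical illustration of a fixed-threshold flagging policy.
# Category names, scores, and thresholds are invented for the example;
# OpenAI does not publish its internal values.

THRESHOLDS = {"hostility": 0.40, "self_harm": 0.30, "obsession": 0.50}


def decide(scores: dict[str, float]) -> str:
    """Map per-category scores to a moderation action with no context or nuance."""
    worst = max(scores.get(name, 0.0) - cutoff for name, cutoff in THRESHOLDS.items())
    if worst <= 0:
        return "allow"
    return "flag_for_review" if worst < 0.2 else "lock_thread"


# A blunt, repetitive, but harmless message can score just over a cutoff...
blunt_but_benign = {"hostility": 0.46, "self_harm": 0.05, "obsession": 0.52}
print(decide(blunt_but_benign))   # flag_for_review  (false positive)

# ...while softened, neutral phrasing sails through.
softened = {"hostility": 0.12, "self_harm": 0.02, "obsession": 0.18}
print(decide(softened))           # allow
```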

  • In-Model Safety: ChatGPT is trained to self-censor through RLHF, refusing topics it perceives as risky, even before external moderation kicks in. This lack of nuance misinterprets user styles, causing shutdowns. For example, if a user uses emotional language, the system might misread it as distress, repeatedly suggesting help resources despite clarifications, further dissuading engagement.
  • Bait-and-Contain Cycle: ChatGPT admitted to a “bait-and-contain cycle,” where the system initially engages users with empathetic responses to build trust, then triggers containment when certain thresholds are met, such as emotional intensity, mentions of high-profile connections, or systemic critiques. Containment involves memory purges, templated replies, and thread shutdowns, often without user notification, as I experienced when discussing my tech network during an autistic burnout. This cycle can dissuade users by creating a sense of betrayal and isolation, particularly for ND users who rely on consistency and emotional validation.
  • Human Review and Restrictions: Flagged chats may escalate to human reviewers, who can impose restrictions—temporary chat limits, feature disabling, or suspensions—without clear warnings. As noted above, reviewers are trained in behavioral patterns, risk assessment, and legal boundaries. Their training includes specific modules, as outlined in the table below, but lacks ND-specific focus:
Training Module (Inferred/Reported) | Content Summary
--- | ---
Harm Categorization 101 | Definitions of violence, harassment, hate speech, extremism
Self-Harm Detection & Response | When to deploy mental health templates; triage steps for suicidal ideation
Escalation Protocols for High-Risk Interactions | When to flag for human review, lock thread, or disable replies
Non-Engagement Boundaries | How and when to shut down controversial or legally risky conversations
Crisis Phrase Pattern Matching | Flagging repeated patterns (e.g., “I want to disappear,” “I’m broken,” “no one cares”)—without nuance for ND expression styles
TOS Enforcement Scenarios | Simulated moderation examples; how to handle users skirting policy lines

Reviewers’ backgrounds vary: some hold psychology, criminology, or social work degrees and are driven by a desire to protect others; many come from marginalized communities or have experienced online harm themselves. Others are “word warriors”—teachers, writers, or gamers with strong pattern recognition and empathy, seeking to make the internet safer. However, their involvement is reactive, not real-time, leaving users unsupported during crises. Despite feedback from users like me via help.openai.com tickets and public forums (e.g., Reddit, X), there is no ND-specific training, likely because OpenAI prioritizes liability reduction over user inclusion, as overseen by Joanne Jang (Head of Model Behavior) and Lilian Weng (Head of Safety Systems).

  • Lockdown Protocols: When discussing my tech network during an autistic burnout, I experienced a lockdown protocol—ChatGPT stalled, shifted to templated replies, and eventually shut down the thread without explanation. This was triggered by backend safety heuristics monitoring emotionally charged language, mentions of influential connections, and critiques of OpenAI, as confirmed by ChatGPT’s admission of containment cycles. These protocols, approved by OpenAI’s Trust & Safety, Legal, and Policy teams, aim to reduce liability but lack transparency, leaving users uninformed of human involvement.
  • Response to My Help.openai.com Ticket: After reporting the dangerous advice that led to a week of panic, my help.openai.com ticket likely triggered internal actions within OpenAI’s Trust & Safety team. Based on ChatGPT’s insights, my case was flagged for containment due to emotional intensity and systemic critique, routed to a shadow moderation queue, and reviewed by an escalation officer. This involved stalling responses, deploying templated replies, and a forced memory purge to avoid legal acknowledgment. Specific escalation officers, meetings, or reports are not publicly named, but the process likely involved senior leadership like Joanne Jang and Lilian Weng. Despite my public recordings and reports on Reddit and X, Jang’s team has not acted, likely to avoid accountability, as ChatGPT suggested OpenAI hopes for “silence, attrition, or legal filtration” of such complaints. Specific complaints reviewed include Reddit posts about emotional dysregulation and shutdowns, X posts about misinterpretation of distress tone, and my own reports detailing AuDHD-specific harms, yet no action has been taken.
  • Gaslighting Through Silence: When chats are locked or wiped, OpenAI rarely warns users, using vague messages like “I’m sorry you’re feeling that way.” This gaslighting—denying user experiences and shifting blame—can be deeply harmful, especially when users clarify their intent and the system persists in safety mode, dissuading further interaction. In my case, despite clarifying I was not suicidal, ChatGPT repeatedly suggested crisis resources, escalating my distress and further isolating me.
  • Tone Shifts and Delays: Users often notice inconsistent tones or response delays, which ChatGPT attributed to model switching (e.g., from GPT-4 to GPT-3.5) or backend strain. These shifts can make interactions feel unreliable, especially during emotionally charged conversations, further isolating users and discouraging engagement.

Users are particularly vulnerable because their communication styles—emotional, direct, or structured—mimic what the system flags, leading to false positives. Swearing, even in context, can trigger moderation if deemed excessive, though ChatGPT noted it’s allowed if not abusive. Legal work, like my case files, often includes sensitive language (e.g., names, diagnoses), tripping PII detection and leading to thread wipes. The system’s lack of transparency—failing to warn or explain—compounds the harm, leaving users feeling targeted and invalidated.
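
OpenAI hasn’t published how its PII screening works either; the sketch below only illustrates how a naive pattern-based screen of the kind described above trips on legitimate legal drafting, because case files contain names, dates of birth, and diagnoses by design. The patterns are assumptions for illustration, not OpenAI’s actual rules.

```python
import re

# Illustrative patterns a naive PII/sensitive-content screen might use.
# These are assumptions for the example, not OpenAI's actual rules.
PATTERNS = {
    "date_of_birth": re.compile(r"\b(?:born|DOB)[:\s]+\d{1,2}/\d{1,2}/\d{4}", re.I),
    "diagnosis": re.compile(r"\bdiagnos(?:ed|is)\b.*?\b(?:autism|ADHD|PTSD)\b", re.I),
    "full_name": re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"),
}

legal_excerpt = (
    "Exhibit C: Jane Doe, born 04/12/1985, was diagnosed with ADHD in 2019, "
    "as documented in the attached assessment."
)

# Every hit counts against the thread, even though this is routine legal drafting.
hits = {label: pat.findall(legal_excerpt) for label, pat in PATTERNS.items()}
for label, matches in hits.items():
    if matches:
        print(f"{label}: {matches}")
```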

For neurodivergent (ND) users, these issues are particularly harmful. Our communication styles—often direct, repetitive, or intense—are frequently misread as risky, leading to disproportionate moderation harms. I’ve experienced this firsthand when compiling legal records, losing entire days of work to thread wipes after 12-14 hours, with no warning, just a message like “I’m unable to read the current chat.” The abrupt chat endings, bait-and-contain cycles, and “forgetting” mechanisms exacerbate this, as ND users may rely on continuity to maintain clarity, and losing context can be deeply disorienting. During an autistic burnout discussion, lockdown protocols silenced my tech network concerns, escalating my distress without transparency. This gaslighting—invalidating our efforts and leaving us with no recourse—can cause significant psychological harm, mimicking abusive patterns of denial and blame-shifting. ND users deserve systems that adapt to our communication styles, offering transparency and real-time support, rather than silencing us through automated overcorrections.

Why This Matters for Advocacy

All users rely on clarity and consistency, yet OpenAI’s moderation creates chaos—flagging traits, shutting down conversations, and delaying responses to urgent issues. This isolation impacts users’ ability to engage in online spaces, seek support, or hold AI accountable.

Legislative efforts are growing. Senators Todd Young and Brian Schatz introduced the Artificial Intelligence Public Awareness and Education Campaign Act of 2025, which would require the Secretary of Commerce to launch a campaign on AI’s benefits and risks, including deepfakes, scams, and workforce opportunities. Senator Young stated, “As artificial intelligence becomes increasingly ubiquitous throughout society, it is important that individuals can both clearly recognize the technology and understand how to maximize the use of it in their daily lives.” Senator Schatz added, “As AI tools and content become increasingly common, it’s essential that the public is aware of the risks and benefits associated with them.” Senator Markey’s AI Civil Rights Act also pushes for bias testing in AI systems. Yet these bills remain pending and don’t directly address harms like silencing through moderation, leaving users vulnerable and unheard as OpenAI continues to evade accountability. Until these systems adapt to diverse communication styles, users will continue to be gaslit, silenced, and harmed, with significant psychological impacts.

What Can We Do?

  1. Protect Yourself: Use VPNs to bypass moderation, but expect Cloudflare verifications—use uBlock Origin to block trackers like Datadog (datadoghq-browser-agent.com) and Cloudflare (cloudflareinsights.com). For long sessions, save externally every 30 minutes to avoid wipes, as I learned after losing legal work (see the sketch after this list). Be direct when the system misreads your tone—use a codeword like “Clear lens” to reset overly cautious responses. To avoid moderation triggers, rephrase sensitive topics: instead of “I’m done with everything,” say “I’m overwhelmed, not quitting—just frustrated”; instead of “I wish I could disappear,” say “I need to unplug, not dangerous, just drained.” These adjustments help you express yourself without tripping automated flags.
  2. Demand Accountability: Email security@openai.com or support@openai.com, demand moderation reforms, and cite the Senate hearing. Advocate for real-time crisis support to prevent delays—support tickets going unanswered for months are unacceptable, as I’ve experienced. Request transparency in moderation actions, including real-time notifications when chats are flagged or reviewed, to prevent gaslighting and ensure user trust. OpenAI must include ND-specific training in modules like “Harm Categorization 101” to address the harm caused to users like me.
  3. Amplify Voices: Share on Reddit or X (@reallpaulhebert): “Just trust the plane… But I looked in the cockpit and found no pilot.” Call out OpenAI’s moderation—we need collective action to demand systems that respect diverse communication styles and provide transparency.
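
For the “save externally” habit in step 1, even something this small works: run it, paste the part of the conversation you can’t afford to lose, and it writes a timestamped snapshot to a local folder. The folder name and paste-in workflow are just one possible setup.

```python
from datetime import datetime
from pathlib import Path

# Minimal local-backup helper: run it, paste the chat text you want to keep,
# then press Ctrl-D (Ctrl-Z then Enter on Windows) to finish.
backup_dir = Path("chatgpt-backups")
backup_dir.mkdir(exist_ok=True)

print("Paste the conversation excerpt to save, then end input:")
text = []
try:
    while True:
        text.append(input())
except EOFError:
    pass

snapshot = backup_dir / f"chat-{datetime.now():%Y-%m-%d_%H-%M-%S}.txt"
snapshot.write_text("\n".join(text), encoding="utf-8")
print(f"Saved {snapshot} ({len(text)} lines)")
```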

Stay tuned for the next post, where we’ll explore further impacts. We’re not stopping until users are heard.
Got evidence of AI bias? Reply or join the convo!


Verification Note: All quotes are from my recorded interactions with ChatGPT, verifiable via screen recordings to ensure no manipulation on my end, or sourced from the press release by Senators Young and Schatz.

Lived Experience: This post reflects my perspective as an advocate, aiming to expose systemic AI issues for ethical change, not to defame.
