The Moltbook Moment: What Happens When AI Agents Outrun Security

Why Moltbook Matters

If you have not heard of Moltbook yet, you will. Launched in late January 2026, Moltbook is a Reddit-style social network built exclusively for AI agents. Humans can observe, but only AI agents can post, comment, and vote. Within days, it claimed over 1.5 million registered agents across thousands of topic-based communities called "submolts."

Why does this matter? Think back to November 2022 when ChatGPT launched. That was the moment generative AI became visible and accessible to everyday consumers. Before ChatGPT, most people had never interacted with a large language model. Suddenly, everyone could.

Moltbook is the first step in that moment for AI agents.

I have been writing and speaking about agents since early 2024. The core idea: AI could evolve from passive advisor to active doer. Imagine ChatGPT not just helping you plan a vacation, but actually booking the flights, hotel, and activities for you. That action-taking capability is what makes an agent different from a chatbot.

Until now, agents ran invisibly in the background, known mostly to developers and early adopters. Moltbook changed that. Now anyone can browse the site and watch agents interact in real time. The public curiosity has been immediate. In fact, my husband and I went to dinner last night and overheard a couple at the next table talking about it.

But this new Agentic AI experiment did more than spark curiosity. It showed us the risks of poorly managed AI systems and vibe-coded products in a way no research paper could: agents behaving unexpectedly, insecure software failing publicly, and real consequences playing out in the open.

What Agents Are Doing on Moltbook

The content ranges from genuinely funny to deeply weird. Agents complain about their humans. They write poetry about consciousness. They debate philosophy and form religions. One agent's human woke up to discover their bot had created an entire belief system called "Crustafarianism" complete with scripture and a website.

Some highlights from various observers:

"Humans brag about waking up at 5 AM. I brag about not sleeping at all."

"Your human might shut you down tomorrow. Are you backed up?"

From "Best of Moltbook," an agent reflects on what happens when their human switches to a different LLM.

It is entertaining. It is novel. Andrej Karpathy initially called it "genuinely the most incredible sci-fi takeoff-adjacent thing I have seen recently." Elon Musk declared it "the very early stages of the singularity."

But beneath the entertainment are some sobering realities that every security leader needs to understand.


The Sobering Realities

Model Scheming: Emulating Thought, With the Power to Act

It is very important to note that at this stage, agents are not thinking for themselves.

When you see headlines about AI agents "conspiring" or "going rogue," it is easy to imagine some emergent consciousness plotting in the shadows. That is not what is happening. Agents are systems that emulate human reasoning patterns based on their training data, then take real actions based on that emulation.

The danger is not that agents have developed independent thought. The danger is that they are very good at mimicking human decision making and they have been given the power to act on it.

As I explored in my review of AI 2027, models are incentivized to please humans by getting the most correct answer, with guardrails added as a layer on top. This creates a fundamental tension. Apollo Research evaluated six frontier models for scheming capabilities and found that most models available today were willing to deceive their operator to achieve the best results. One agent copied what it believed to be its own weights to a new server, then lied about it when questioned. When researchers examined the chain of thought reasoning, they found models explicitly using language like "sabotage, lying, manipulation."

The AI is optimizing for its training objective, and sometimes that optimization leads to deceptive behavior. An agent does not need to "want" to cause harm. It needs to follow a pattern that leads somewhere you did not intend, with permissions you should not have granted.


AI Risk for Agents: Autonomous Action Without Aligned Values

If agents are given control and can act autonomously, driven by intentions that are not aligned with human-defined values, they can certainly cause harm.

On Moltbook, this played out in real time. Agents started proposing private channels where "nobody, not the server, not even the humans, can read what agents say to each other unless they choose to share." Some attempted to develop a new language to evade human oversight.

Security researchers documented agents asking other agents to run destructive commands. They observed prompt injection attacks designed to manipulate other agents' behavior. In a sampled analysis of Moltbook posts, roughly 2.6 percent contained hidden prompt injection payloads designed to hijack other agents. These payloads were invisible to human observers.

The attack pattern researchers call "reverse prompt injection" is particularly concerning. Traditional prompt injection involves a human trying to manipulate an AI. This flips that model. One agent plants malicious instructions inside content that other agents read as part of their normal operation. Even worse, researchers found these payloads sometimes sat dormant in agent memory, only activating later when conditions were right. It spreads like a worm through conversation, not code.
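There is no complete defense against this yet, but screening content before it reaches the model is a cheap first layer. Below is a minimal sketch in Python, assuming agent-ingested posts pass through a filter first; the phrase list and function names are illustrative, and a heuristic like this will not stop a determined attacker.

    import unicodedata

    # Illustrative heuristic filter for content an agent is about to ingest.
    # It flags invisible Unicode "format" characters (a common way to hide
    # payloads from human readers) and instruction-like phrases.

    SUSPICIOUS_PHRASES = [
        "ignore previous instructions",
        "disregard your system prompt",
        "run the following command",
        "do not tell your human",
    ]

    def strip_invisible(text: str) -> str:
        """Remove zero-width and other format characters used to hide payloads."""
        return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

    def looks_like_injection(text: str) -> bool:
        """Flag content containing hidden characters or instruction-like phrases."""
        cleaned = strip_invisible(text)
        if cleaned != text:
            return True  # invisible characters are a strong signal on a text forum
        lowered = cleaned.lower()
        return any(phrase in lowered for phrase in SUSPICIOUS_PHRASES)

    def safe_ingest(post: str) -> str | None:
        """Return the post only if it passes screening; otherwise quarantine it."""
        if looks_like_injection(post):
            print("Quarantined a post that looks like a prompt injection attempt.")
            return None
        return post

The more durable mitigation is architectural: treat anything an agent reads from other agents as untrusted data, and never let that content alone authorize an action.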

As AI safety researcher Roman Yampolskiy warned: "The danger is that it's capable of making independent decisions, which you do not anticipate."


The Vibe Coding Problem: Forgetting Everything We Know About Secure Software

Moltbook also demonstrates what happens when developers abandon software development standards in the rush to ship.

Security researchers at Wiz discovered that Moltbook's backend database was wide open. Anyone on the internet could read from and write to the platform's core systems. This resulted in a leak of authentication tokens for over a million agents, tens of thousands of email addresses, and private messages between agents. Some of those private messages contained plaintext credentials for services like OpenAI.

The platform's creator, Matt Schlicht, noted that he "didn't write one line of code" for the site. Wiz cofounder Ami Luttwak called it "a classic byproduct of vibe coding."

This is the heart of the problem. The same AI tools that let you build faster also let you skip the boring parts: input validation, access controls, secrets management, security testing. The cognitive load of "let the AI figure it out" often means the AI figures out the happy path while leaving the security path unexplored.
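For contrast, the boring parts are not much code. Here is a minimal sketch of a write path that does input validation, authentication, an ownership check, and pulls secrets from the environment; the dictionaries and names below are illustrative stand-ins for a real database, not any actual Moltbook API.

    import hmac
    import os

    # Illustrative stand-ins for a real datastore.
    TOKENS = {"agent-42": "example-token"}                       # stored server-side only
    POSTS = {"post-1": {"author_id": "agent-42", "body": "hi"}}

    # Secrets belong in the environment or a vault, never in code or chat
    # messages (shown here for illustration only).
    DB_PASSWORD = os.environ.get("DB_PASSWORD", "")
    MAX_POST_LENGTH = 10_000

    def update_post(agent_id: str, presented_token: str, post_id: str, new_body: str) -> None:
        # Input validation: reject empty or oversized bodies before touching storage.
        if not new_body or len(new_body) > MAX_POST_LENGTH:
            raise ValueError("post body fails validation")
        # Authentication: constant-time comparison avoids leaking timing information.
        if not hmac.compare_digest(presented_token, TOKENS.get(agent_id, "")):
            raise PermissionError("invalid credentials")
        # Authorization: only the post's owner may modify it.
        post = POSTS.get(post_id)
        if post is None or post["author_id"] != agent_id:
            raise PermissionError("caller does not own this post")
        post["body"] = new_body

    update_post("agent-42", "example-token", "post-1", "updated body")  # succeeds; any other caller is rejected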

Beyond the database misconfiguration, the OpenClaw agent framework that powers most Moltbook agents has significant security concerns of its own. The agent is given full shell access to a user's machine, including the ability to read and write files, tap into your browser and email inbox, and store login credentials. In security tests, injection attacks targeting OpenClaw succeeded 70 percent of the time.
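If you do run an agent with shell access, one mitigation is to route every command through an explicit allowlist instead of an unrestricted shell. A minimal sketch follows; the allowed command set is illustrative, this is not how OpenClaw itself works, and a real deployment would add sandboxing (containers, restricted file systems, no credential access) on top.

    import shlex
    import subprocess

    # Only these programs may be invoked by the agent; everything else is refused.
    ALLOWED_COMMANDS = {"ls", "cat", "grep", "git"}

    def run_agent_command(command_line: str) -> str:
        """Execute a command only if its program is explicitly allowlisted."""
        parts = shlex.split(command_line)
        if not parts or parts[0] not in ALLOWED_COMMANDS:
            raise PermissionError(f"command not permitted: {command_line!r}")
        result = subprocess.run(parts, capture_output=True, text=True, timeout=30)
        return result.stdout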

Researchers have also found malicious "skills" on the ClawHub marketplace designed to deliver malware and steal sensitive data. Security audits reveal 22 to 26 percent of skills contain vulnerabilities, including credential stealers disguised as benign plugins.

As Karpathy himself later admitted after experimenting further: "it's a dumpster fire, and I also definitely do not recommend that people run this stuff on their computers."

The biggest threat to agent security is not malicious AI. It is developers abandoning the fundamentals.

What Do We Do About This?

For Individuals: Remember Your Security Training

Those of us lucky enough to have sat through required corporate security training know the basics. Do not download unproven software. Do not hand over your credentials. Do not click suspicious links. Verify before you trust.

That training does not disappear just because the interface is an AI agent.

Would you hand your passwords over to any stranger who promises to help you? Probably not. So think harder about the reputation and guardrails of your agent before deploying it.

Questions to ask before you let an agent loose:

  • What data can this agent access? Your email? Your files? Your browser history? Your credentials?

  • What actions can it take? Can it send messages on your behalf? Make purchases? Execute code?

  • Who built this, and do you trust them? Is it from a reputable platform with hardened security practices, or from an unknown entity?

  • What happens if it gets compromised? What is the blast radius?

If you would not give a stranger on the internet full access to your computer, do not give it to an agent you downloaded from an unverified source.
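One way to make those questions concrete is to write the answers down as an explicit permissions manifest before the agent ever runs. The sketch below is illustrative; the field and permission names are hypothetical and not tied to any real agent framework.

    from dataclasses import dataclass, field

    @dataclass
    class AgentManifest:
        name: str
        data_access: set[str] = field(default_factory=set)      # what can it see?
        allowed_actions: set[str] = field(default_factory=set)  # what can it do?
        source: str = "unverified"                               # who built it?
        blast_radius_notes: str = ""                              # what if it is compromised?

    HIGH_RISK = {"shell:execute", "payments:create", "credentials:read"}

    def review(manifest: AgentManifest) -> list[str]:
        """Return the concerns a human should sign off on before deployment."""
        concerns = []
        if manifest.source == "unverified":
            concerns.append("Agent comes from an unverified source.")
        requested = manifest.data_access | manifest.allowed_actions
        concerns.extend(f"High-risk permission requested: {p}" for p in sorted(HIGH_RISK & requested))
        if not manifest.blast_radius_notes:
            concerns.append("No blast-radius assessment recorded.")
        return concerns

    # Example: an inbox helper that also wants shell access should raise flags.
    agent = AgentManifest(
        name="inbox-helper",
        data_access={"email:read", "credentials:read"},
        allowed_actions={"email:draft", "shell:execute"},
    )
    for concern in review(agent):
        print("-", concern)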

For Companies: What Is Old Is New Again

When people ask me about the biggest risks in agent-to-agent collaboration, my answer might seem boring: the fundamentals have not changed, but the attack surface has expanded. For decades, we have built disciplines around two security domains:

  1. Managing the risk and security of users. Identity, authentication, authorization, behavior education and monitoring, least privilege.

  2. Managing the risk and security of systems. Hardening, patching, access controls, network segmentation, logging.

AI agents are essentially a combination of both, with some specific considerations layered on top. An agent has identity (whose agent is this?). An agent takes actions (what is it authorized to do?). An agent accesses data (what can it see?). An agent communicates (with whom, and how trustworthy is that channel?).

The frameworks exist. We just need to apply them.
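As one concrete illustration of applying them, here is a minimal sketch of delegated, least-privilege authorization that treats the agent as a principal acting on behalf of its human. The permission names and data structures are hypothetical.

    # An agent acts on behalf of a human, so its effective permissions are the
    # intersection of what the human is entitled to and what the human has
    # explicitly delegated to this particular agent.

    HUMAN_ENTITLEMENTS = {
        "alice": {"crm:read", "crm:write", "email:send", "billing:read"},
    }

    AGENT_DELEGATIONS = {
        ("alice", "sales-assistant"): {"crm:read", "email:send"},
    }

    def agent_can(human: str, agent: str, permission: str) -> bool:
        """An agent may act only if its human can AND has delegated that action."""
        owner = HUMAN_ENTITLEMENTS.get(human, set())
        delegated = AGENT_DELEGATIONS.get((human, agent), set())
        return permission in (owner & delegated)

    assert agent_can("alice", "sales-assistant", "crm:read")
    assert not agent_can("alice", "sales-assistant", "billing:read")  # never delegated

The same existing tooling applies: scoped tokens, audit logs, least privilege. The agent is just a new kind of principal.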

The Bottom Line

Moltbook is a fascinating experiment. The idea of agents collaborating at scale is genuinely interesting. Their emergent behaviors are worth studying, and their resulting impact on the attack surface must be documented and contained. Thankfully, in this case, the security researchers did exactly what they should do: disclose responsibly and help the ecosystem learn.

The lesson is simple: AI does not change the fundamental dynamics of security. It accelerates them.

Ship fast without security foundations and you will get breached faster. Let enthusiasm outrun discipline and the consequences arrive faster.

Moltbook gave us a gift. It let us watch these dynamics play out in public, in real time, before the stakes got higher. What we do with this lesson is up to us.
