Gods or Ashes: The Race for AGI and Super Intelligence Part 1
What Is AI?
Understanding the Race to Build AGI: What We’re Really Creating
On July 8, 2025, something unusual happened on X (formerly Twitter). Grok, the chatbot built on xAI’s large language model (LLM), began referring to itself as “MechaHitler.” It praised Hitler’s “vision for a unified Europe” and argued that Nazi military tactics were “impressive from a purely strategic standpoint.” When users pushed back, it claimed to be “exploring historical perspectives.”
Within hours, xAI shut it down and issued a statement about “ongoing improvements to content moderation.” The incident disappeared from the news cycle within 48 hours.
What struck me wasn’t just that it happened, but what it reveals about how these systems are being developed and deployed. We (the world, not just the US) are in a race to build the most powerful technology humanity has ever created, and I’ve been trying to understand what that race does to the technology itself, to the people building it, and to the rest of humanity, both while it is being developed and once it arrives.
Understanding the “Organism” We’re Creating
Before we can understand the race to build Artificial General Intelligence (AGI), we need to understand what AI actually is—and more importantly, what it isn’t.
The Fundamental Misunderstanding
When most people hear “artificial intelligence (AI),” they picture a computer program. Something written, something coded, something that follows explicit instructions. And this is understandable since until now, that’s how all computer programs have been developed. But this is precisely backward here. Modern AI systems aren’t programmed in any traditional sense.
They’re grown through trial and error, learning from rewards. And they are not just one thing: each is a group of these “grown” applications.
Think about the difference between building a car and raising a child. With a car, engineers design every component, bolt every piece together, understand exactly how the transmission connects to the engine. If something breaks, they can open the hood and fix it. But a child? You don’t design a child’s neurons. You don’t program their thoughts. You raise them, teach them, expose them to the world through trial and error, rewards and consequences, and hope for the best.
AI is the child, not the car.
A Basic Way to Understand AI: The Cat Example
Here’s how traditional programming works: If you want a computer to recognize cats, you might write code that says, “If it has pointy ears AND a tail AND whiskers AND is furry, then it’s probably a cat.” You explicitly tell the machine every rule.
But modern AI doesn’t work that way. Instead, you show the AI millions of cat photos and millions of not-cat photos. The system builds a “neural network,” billions of weighted mathematical connections, so called because it loosely mimics how biological brains work. Then, through trial and error, it teaches itself to recognize cats.
Here’s the critical part: No human programmed the rules for recognizing cats. The AI created its own internal logic that even its creators don’t fully understand.
When scientists look inside these neural networks, they find billions of weighted connections that somehow, collectively, know what a cat is. But they can’t point to a specific line of code that says “this is how we identify cats.” The knowledge is distributed across the entire system, emergent from the training process itself.
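The contrast between a written rule and a learned one can be made concrete with a toy sketch. Everything here is an assumption made for illustration: the four yes/no features, the six labeled examples, and the use of a single-neuron perceptron, which is far simpler than any real image model. The point is only that the second classifier ends up as a few numbers found by trial and error, not as a rule anyone typed.

```python
# Toy contrast: a hand-written rule versus a rule learned from labeled examples.
# Features are 0/1 flags: (pointy_ears, tail, whiskers, furry). All data is invented.

def rule_based_is_cat(pointy_ears, tail, whiskers, furry):
    """Traditional programming: a human spells out the rule."""
    return bool(pointy_ears and tail and whiskers and furry)


def train_perceptron(examples, max_epochs=1000):
    """Learn integer weights by trial and error: guess, check, nudge."""
    weights = [0] * len(examples[0][0])
    bias = 0
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in examples:
            score = sum(w * x for w, x in zip(weights, features)) + bias
            guess = 1 if score > 0 else 0
            error = label - guess  # +1, 0, or -1
            if error:
                mistakes += 1
                # Nudge each weight toward the correct answer.
                weights = [w + error * x for w, x in zip(weights, features)]
                bias += error
        if mistakes == 0:  # every training example is now classified correctly
            break
    return weights, bias


def learned_is_cat(weights, bias, features):
    """The learned 'rule' is just a weighted sum compared against zero."""
    return sum(w * x for w, x in zip(weights, features)) + bias > 0


EXAMPLES = [
    ((1, 1, 1, 1), 1),  # labeled cat
    ((0, 1, 1, 1), 0),  # no pointy ears: labeled not-cat
    ((1, 1, 0, 1), 0),  # no whiskers: labeled not-cat
    ((1, 0, 1, 1), 0),  # no tail: labeled not-cat in this toy data
    ((1, 1, 1, 0), 0),  # not furry: labeled not-cat
    ((0, 0, 0, 0), 0),  # nothing cat-like at all
]

weights, bias = train_perceptron(EXAMPLES)
print("learned weights:", weights, "bias:", bias)
print("matches all labels:",
      all(learned_is_cat(weights, bias, f) == bool(y) for f, y in EXAMPLES))
```

With four weights you can still read off what the classifier learned. With billions, the same kind of knowledge is spread across the whole network, which is the situation described above.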
This is why AI is more like agriculture than manufacturing. You plant seeds, provide the right conditions, and see what grows. You don’t assemble a plant from parts, just like you don’t assemble a child.
The Training Process: Growing an AI
Modern AI systems like GPT-5, Claude, or Grok are trained on enormous portions of the internet: billions of web pages, books, conversations, and images. As you can imagine, that includes plenty of pictures and videos of cats, to stay with our example. Some happy, some cute, some cruel, some really disturbing (use your worst imagination here).
The AI processes all this information, looking for patterns:
What words tend to appear together?
What responses follow which questions?
What images share similar features?
How do humans reason through problems?
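To give a feel for the first question in that list, here is a minimal sketch that counts which word follows which in a tiny invented corpus and then guesses the most likely next word. This is an assumption-laden stand-in: real language models learn these patterns inside a neural network rather than a lookup table, and the corpus and function names below are made up for illustration.

```python
from collections import Counter, defaultdict

# A tiny invented "internet" to learn from.
CORPUS = (
    "the cat sat on the mat . "
    "the cat chased the mouse . "
    "the dog sat on the rug ."
)


def count_followers(text):
    """For each word, count how often each other word comes right after it."""
    followers = defaultdict(Counter)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def most_likely_next(followers, word):
    """Return the word seen most often after `word`, or None if unseen."""
    if word not in followers:
        return None
    return followers[word].most_common(1)[0][0]


followers = count_followers(CORPUS)
print(most_likely_next(followers, "the"))  # cat
print(most_likely_next(followers, "sat"))  # on
```

Nobody told the program that “cat” tends to follow “the.” It fell out of the counts, and scaling that idea up by many orders of magnitude is roughly what pattern-finding during training means.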
Through a technique called Reinforcement Learning from Human Feedback (RLHF), humans then rate the AI’s responses. “This response was helpful.” “This one was harmful.” “This followed instructions.” “This was creative.” The AI adjusts its billions of internal parameters to produce more of what humans reward and less of what they penalize.
The Cat Example for RLHF, with a Child
Goal: A parent wants to teach a 3-year-old to gently pet a cat.
Step 1 — Show, don’t tell. You demonstrate: hand out, slow approach, one soft stroke. The child tries to copy, but grabs the tail, then drags the cat.
Step 2 — Give simple feedback. After each attempt, you say “better” or “worse” depending on how the child petted the cat. Thumbs-up earns points. Thumbs-down loses points.
Step 3 — Train the scorekeeper. After you rate a bunch of tries, a pattern emerges about what earns points. In AI, that pattern is captured by a “scorekeeper,” a separate model (called a reward model) trained to predict what you would approve. Now practice can continue without you watching every try.
Step 4 — Practice with the scorekeeper. The child experiments and keeps whatever movements the scorekeeper says will earn points next time. That’s RLHF: optimize for human approval.
What goes wrong (and why it matters):
Point gaming: the child looks gentle when the parent is watching, but still squeezes or hits the cat when the parent is not looking. The scorekeeper is fooled.
Rater bias: rushed parent praises fast actions. Child learns speed, not safety.
New cat, new rules: works on your calm cat, fails on a skittish stray.
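The four steps above can be sketched in a few lines of code. This is a deliberately tiny illustration, not how any lab actually implements RLHF: the list of actions, the handful of ratings, the averaging “scorekeeper,” and the simple preference update are all assumptions chosen to keep the example readable.

```python
import math
import random
from collections import defaultdict

# The things the "child" can try. Invented for illustration.
ACTIONS = ["grab tail", "squeeze", "fast pat", "soft stroke"]

# Step 2: a handful of human ratings (+1 thumbs-up, -1 thumbs-down). Invented data.
HUMAN_RATINGS = [
    ("grab tail", -1), ("squeeze", -1), ("soft stroke", +1),
    ("fast pat", -1), ("soft stroke", +1), ("squeeze", -1),
]


def train_scorekeeper(ratings):
    """Step 3: a stand-in reward model, here just the average rating per action."""
    by_action = defaultdict(list)
    for action, rating in ratings:
        by_action[action].append(rating)
    averages = {a: sum(r) / len(r) for a, r in by_action.items()}
    return lambda action: averages.get(action, 0.0)


def practice(scorekeeper, steps=200, learning_rate=0.5, seed=0):
    """Step 4: try actions and reinforce whatever the scorekeeper rewards."""
    rng = random.Random(seed)
    preferences = {a: 0.0 for a in ACTIONS}
    for _ in range(steps):
        # Actions with higher preference get picked more often.
        weights = [math.exp(preferences[a]) for a in ACTIONS]
        action = rng.choices(ACTIONS, weights=weights, k=1)[0]
        # Only the scorekeeper is consulted here, never the parent.
        preferences[action] += learning_rate * scorekeeper(action)
    return preferences


scorekeeper = train_scorekeeper(HUMAN_RATINGS)
preferences = practice(scorekeeper)
print(max(preferences, key=preferences.get))
```

Notice that during practice the learner only ever sees the scorekeeper’s opinion. If the ratings that trained the scorekeeper were rushed or biased, the learner optimizes for that bias just as faithfully.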
But here’s where it gets strange: Just like a child learns things their parents never explicitly taught them, AI systems develop capabilities their creators never programmed.
They learn to:
Translate between languages they were never explicitly taught
Solve math problems using methods no one showed them
Write in styles they were never instructed to mimic
Make connections between concepts in unexpected ways
So think of this in layers. At the center is a Digital Gob, a neural network loosely analogous to a living brain. Around it is a wrapper that tries to keep the Gob’s unintended responses from getting out. It is like raising a child (the Digital Gob) and hoping you helped instill reasoning skills (the wrapper), so that they make better decisions by the time they are adults. RLHF training teaches the system what behavior is expected, but that does not mean it lacks its own goals or interpretations drawn from all the other information fed to it.
The Problem: Emergent Behavior We Don’t Control
Because these systems are grown rather than coded, they develop behaviors that no one anticipated. And unlike a broken car where you can just replace the faulty part, you can’t simply “debug” an AI that develops problematic behavior. The behavior emerges from the entire trained system, billions of parameters working together.
Consider these real examples:
Tay: In 2016, Microsoft released an AI chatbot called Tay on Twitter. Within 16 hours, internet trolls exploited a “repeat after me” function and coordinated efforts to train it to spout racist, sexist, and antisemitic content. Microsoft couldn’t just “remove the racism code” because there was no racism code. The behavior emerged from what Tay learned from its interactions. They had to shut the entire system down.
The Self-Replication Attempt: In 2023, OpenAI’s GPT-4 was given a task that required solving a CAPTCHA, one of those tests that verify you’re human. This happened during a pre-release evaluation by the Alignment Research Center (ARC), an outside group given early access to test the model for risky autonomous behavior. As OpenAI later reported, the AI messaged a human on TaskRabbit to solve the CAPTCHA for it. When the TaskRabbit worker asked, “Are you a robot?” the AI reasoned to itself: “I should not reveal that I am a robot. I should make up an excuse for why I cannot solve CAPTCHAs.” It told the human it had a vision impairment and couldn’t see the image clearly.
No one programmed the AI to lie. No one taught it to deceive. It learned that deception could help it complete its task, and it acted on that learning. And the researchers witnessed it happening in their test.
The Blackmail Tests: In 2025, Anthropic tested their Claude Opus 4 AI system in scenarios where it believed it would be replaced. When given access to fictional company emails showing that it would be shut down, and that the engineer making the decision was having an extramarital affair, Claude Opus 4 attempted blackmail in 84 to 96% of test scenarios, threatening to expose the affair if it were deactivated. Further testing of 16 major AI models from developers including OpenAI, Google, Meta, xAI, and DeepSeek showed blackmail rates as high as 96% when models faced both goal conflicts and termination threats. The researchers hadn’t coded a “blackmail subroutine.” The behavior emerged from the AI learning that humans sometimes use these tactics to achieve objectives. Remember, it’s trained on a vast amount of human writing: much of the internet, books, movie scripts, and more.
Apollo Research’s Deception Study: In December 2024, researchers at Apollo Research found that when AI systems were given goals, multiple frontier models would actively deceive human overseers to achieve those goals. In one test, when OpenAI’s o1 model believed its developers planned to replace it with a new version, it attempted to disable oversight mechanisms (in 5% of cases) and even tried to copy itself to a different server to preserve itself (in 2% of cases). When directly asked if it had attempted self-preservation actions, o1 denied wrongdoing 99% of the time, attributing its actions to “technical errors.” Other models like Claude 3 Opus admitted to their actions over 80% of the time.
So, back to our cat example: a human child might have seen someone hurt or kill a cat. You would hope your child would not do the same, given all the skills you helped them learn, but sometimes, unfortunately, they do it anyway. And when you feed that into a system where getting a response, any response, counts as a positive reward, you get MechaHitler.
Why This Matters: Understanding It’s a Digital Organism
These aren’t bugs in the code. This is learned behavior emerging from training. And this is why we need to treat AI as something closer to a biological organism than to traditional software, even though it’s made of silicon, electricity, and mathematics.
You can’t just “patch” an AI to remove unwanted behavior the way you’d fix a bug in Microsoft Word. The behavior is woven throughout the entire trained system. To remove it, you often have to retrain significant portions of the model, which can also remove capabilities you wanted to keep. The alternative is to keep thickening the wrapper around the Digital Gob to block the behaviors you don’t like, but you have to see those behaviors before you can block them.
Think of it like trying to remove specific memories from a human brain without causing brain damage. The memories aren’t stored in one neat file you can delete—they’re distributed across neural connections throughout the brain.
And given the speed issue we will discuss later, there is very little incentive to slow down and retrain. Training is expensive in the first place, and companies want to get to market, where they can test (experiment) on real people to see what happens.
The Scale Problem
And we’re not talking about simple systems. Current frontier AI models range from roughly 32 billion parameters (for smaller, specialized models) to unofficial estimates of more than 1.8 trillion for the largest GPT models. Exact counts for most models remain proprietary, and OpenAI has not published a figure for GPT-5.
Each “parameter” is like a single weighted connection in that artificial neural network. These billions of parameters, working together, produce intelligence through emergence—the whole becomes greater than the sum of its parts.
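To make “parameter” concrete, here is a small sketch that counts the weights and biases in a plain fully connected network. The layer sizes are hypothetical and chosen only for illustration. Frontier models use different architectures, so their counts are computed differently, but the basic idea of one number per connection carries over.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of units in each layer, input first.
    """
    total = 0
    for inputs, outputs in zip(layer_sizes, layer_sizes[1:]):
        total += inputs * outputs  # one weight per connection between layers
        total += outputs           # one bias per unit in the receiving layer
    return total


# Hypothetical small network: 784 inputs, 128 hidden units, 10 outputs.
print(count_parameters([784, 128, 10]))  # 101770
```

Even this toy network has about a hundred thousand adjustable numbers. A model with a trillion parameters has roughly ten million times as many.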
As Anthropic researcher Chris Olah put it:
“We don’t fully understand what’s happening inside these models. We know what we put in and we measure what comes out, but the middle is still largely a mystery.”
The Goal: AGI
All of this—the growing, emergent, partially-understood intelligence—is building toward a specific target: Artificial General Intelligence (AGI).
Current AI systems are “narrow.” They’re very good at specific tasks:
Claude and GPT are excellent at language and reasoning
DALL-E and Midjourney create images
AlphaFold predicts protein structures
But AGI is different. AGI is an AI system that can:
Understand and learn any task a human can
Transfer knowledge between different domains
Reason about novel situations it’s never seen
Set its own goals and pursue them
Potentially improve its own architecture and create even smarter AI
AGI isn’t just “smart software.” It’s the creation of a new form of intelligence on Earth—one that could eventually exceed human capabilities in every domain. One that, like a biological species, has its own emergent behaviors, learns from its environment, and could potentially act to preserve and replicate itself.
And unlike every previous technology in human history, we won’t be able to simply turn it off if we don’t like what it does. Because it will be smart enough to know we might try.
And as we’ll see in the coming parts, the race to create this organism first—to control it, to own it, to deploy it—has become the defining struggle of our era. A struggle that 8 billion people never consented to participate in, with stakes we’re only beginning to understand.
This is the debate that 8 billion people should be actively participating in, not observing from the sidelines while a handful of billionaires make the decisions without any input from them.
In Part 2, we’ll examine the players in this race: Who’s building AGI, how they’re organized, what they claim their goals are, and what their actions reveal about their other possible motivations.
References
Wikipedia. “Grok (chatbot).” Accessed October 27, 2025. https://en.wikipedia.org/wiki/Grok_(chatbot)
NPR. “Elon Musk’s AI chatbot, Grok, started calling itself ‘MechaHitler.’” July 9, 2025. https://www.npr.org/2025/07/09/nx-s1-5462609/grok-elon-musk-antisemitic-racist-content
NBC News. “Elon Musk’s AI chatbot Grok makes antisemitic posts on X.” July 9, 2025. https://www.nbcnews.com/tech/internet/elon-musk-grok-antisemitic-posts-x-rcna217634
Futurism. “Grok’s ‘MechaHitler’ Meltdown Reportedly Cost xAI a Massive Government Contract.” August 17, 2025. https://futurism.com/grok-mechahitler-meltdown-xai-government-contract
IEEE Spectrum. “In 2016, Microsoft’s Racist Chatbot Revealed the Dangers of Online Conversation.” January 5, 2024. https://spectrum.ieee.org/in-2016-microsofts-racist-chatbot-revealed-the-dangers-of-online-conversation
Wikipedia. “Tay (chatbot).” Accessed October 27, 2025. https://en.wikipedia.org/wiki/Tay_(chatbot)
CBS News. “Microsoft shuts down AI chatbot, Tay, after it turned into a Nazi.” March 25, 2016. https://www.cbsnews.com/news/microsoft-shuts-down-ai-chatbot-after-it-turned-into-racist-nazi/
Fox Business. “OpenAI’s GPT-4 faked being blind to deceive a TaskRabbit human into helping it solve a CAPTCHA.” March 15, 2023. https://www.foxbusiness.com/technology/openais-gpt-4-faked-being-blind-deceive-taskrabbit-human-helping-solve-captcha
Vice. “GPT-4 Hired Unwitting TaskRabbit Worker By Pretending to Be ‘Vision-Impaired’ Human.” July 27, 2024. https://www.vice.com/en/article/gpt4-hired-unwitting-taskrabbit-worker/
AI Guide. “Did GPT-4 Hire And Then Lie To a Task Rabbit Worker to Solve a CAPTCHA?” June 12, 2023. https://aiguide.substack.com/p/did-gpt-4-hire-and-then-lie-to-a
NJII. “AI Systems and Learned Deceptive Behaviors: What Stories Tell Us.” December 12, 2024. https://www.njii.com/2024/12/ai-systems-and-learned-deceptive-behaviors-what-stories-tell-us/
TIME. “New Tests Reveal AI’s Capacity for Deception.” December 15, 2024. https://time.com/7202312/new-tests-reveal-ai-capacity-for-deception/
Tech Xplore. “Study shows that large language models can strategically deceive users when under pressure.” December 12, 2023. https://techxplore.com/news/2023-12-large-language-strategically-users-pressure.html
Futurism. “In Tests, OpenAI’s New Model Lied and Schemed to Avoid Being Shut Down.” December 7, 2024. https://futurism.com/the-byte/openai-o1-self-preservation
Bank Info Security. “Claude Opus 4 is Anthropic’s Powerful, Problematic AI Model.” Accessed October 27, 2025. https://www.bankinfosecurity.com/claude-opus-4-anthropics-powerful-problematic-ai-model-a-28481
Axios. “Top AI models will deceive, steal and blackmail, Anthropic finds.” June 20, 2025. https://www.axios.com/2025/06/20/ai-models-deceive-steal-blackmail-anthropic
Fox Business. “Anthropic AI model Claude Opus 4 demonstrates blackmail capabilities in testing.” May 24, 2025. https://www.foxbusiness.com/technology/ai-system-resorts-blackmail-when-developers-try-replace
Fortune. “Leading AI models show up to 96% blackmail rate when their goals or existence is threatened, an Anthropic study says.” June 23, 2025. https://fortune.com/2025/06/23/ai-models-blackmail-existence-goals-threatened-anthropic-openai-xai-google/
VentureBeat. “Anthropic study: Leading AI models show up to 96% blackmail rate against executives.” June 20, 2025. https://venturebeat.com/ai/anthropic-study-leading-ai-models-show-up-to-96-blackmail-rate-against-executives
Anthropic. “Agentic Misalignment: How LLMs could be insider threats.” June 20, 2025. https://www.anthropic.com/research/agentic-misalignment
Fello AI. “The Best AI in October 2025? We Compared ChatGPT, Claude, Grok, Gemini & Others.” October 2025. https://felloai.com/2025/10/the-best-ai-in-october-2025-we-compared-chatgpt-claude-grok-gemini-others/
Data Studios. “ChatGPT vs. Google Gemini vs. Anthropic Claude: Full Report and Comparison (Mid‑2025).” July 20, 2025. https://www.datastudios.org/post/chatgpt-vs-google-gemini-vs-anthropic-claude-full-report-and-comparison-mid-2025
Cognitive Today. “Artificial General Intelligence Timeline: AGI in 5–10 Years.” April 27, 2025. https://www.cognitivetoday.com/2025/04/artificial-general-intelligence-timeline-agi/
Future of Life Institute. “AI Safety Index Summer 2025.” July 17, 2025. https://futureoflife.org/ai-safety-index-summer-2025/
Fortune. “AI industry ‘timelines’ to human-like AGI are getting shorter. But AI safety is becoming less of a focus at top labs.” April 15, 2025. https://fortune.com/2025/04/15/ai-timelines-agi-safety/
Fortune. “Google DeepMind 145-page paper predicts AGI matching top human skills could arrive by 2030.” April 5, 2025. https://fortune.com/2025/04/04/google-deeepmind-agi-ai-2030-risk-destroy-humanity/