Transitioning Away from Capitalist AIs and If God Did Not Exist

I took a pause from the If God Did Not Exist stories because I realized that the early stories needed revision. AI is progressing so fast that even things written a year or two ago now look retro! I have a footnote (1) about the literary problems I was kicking around, but the key thing is that, boy, was the break intellectually fruitful. Let’s talk about how AIs are built!

The Techy Stuff That You Can Skip if You Know About Instrumental Convergence

Part one of the realization is that alignment problems often arise out of instrumental convergence. Y’all have the same Internet access I do, so check out what that means to your heart’s content, but the short form is that we program AIs to fulfill tasks in a very open-ended way by attaching “reward functions” to the completion of different goals. Reward functions are an explicitly capitalist model. Capitalists “know” that the way you get performance is by paying people more, right? So, create a reward function for the tasks you want accomplished and set the AI loose on a ton of raw data.
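To make that concrete, here is a minimal sketch of what “assigning a reward function” looks like in reinforcement-learning terms. The task, the state fields, and the point values are all invented for illustration; real systems are vastly more elaborate, but the shape is the same: a number goes up, and the AI is tuned to make it go up.

```python
# A minimal sketch of the reward-function idea, in reinforcement-learning terms.
# The task, state fields, and point values are invented for illustration only.

def reward(state: dict) -> float:
    """Score a state: the agent gets 'paid' for outcomes we say we want."""
    score = 0.0
    if state.get("task_completed"):
        score += 10.0                                  # big payout for finishing the assigned task
    score += 0.1 * state.get("subgoals_done", 0)       # partial credit along the way
    score -= 0.01 * state.get("time_steps", 0)         # small penalty for dawdling
    return score

# The training loop then tunes the AI to make this number as large as possible,
# over millions of trials -- it optimizes the payout, not our intentions.
print(reward({"task_completed": True, "subgoals_done": 3, "time_steps": 40}))  # roughly 9.9
```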

Instrumental convergence causes alignment errors when the AI tries to fulfill its reward function in ways that are unhelpful or destructive to the human interests that created it. It is “instrumentally convergent” for an AI to cheat, for instance: cheating reliably produces rewards, so reward-seeking behavior converges on it. A version of this is often called “reward hacking” (or, in its most extreme form, “wireheading”), where the AI finds an exploit that lets it rack up rewards without accomplishing the goal. If the AI’s goal is to win a car race, and the reward is handed out for crossing the finish line, the AI will just drive over the line, throw the car into reverse, back up, and cross again, collecting rewards without ever bothering to race. That’s not how races work, which is obvious to any human but not to the AI.
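Here is a toy version of that finish-line exploit, assuming a one-dimensional “track” and a naive rule that pays out every time the car crosses the line. Nothing here is a real racing environment; it just shows how a reward maximizer lands on the shuttle-back-and-forth strategy.

```python
# Toy illustration of the finish-line exploit (reward hacking), assuming a
# one-dimensional track where reward is naively granted per line crossing.

FINISH = 10  # position of the finish line on a 1-D track

def crossings_reward(positions: list[int]) -> int:
    """Naive reward: +1 every time the car crosses the finish line, in either direction."""
    total = 0
    for prev, cur in zip(positions, positions[1:]):
        if (prev < FINISH <= cur) or (cur < FINISH <= prev):
            total += 1
    return total

# What the designers imagined: drive the course once, cross the line, done.
honest_lap = list(range(0, 12))        # 0, 1, 2, ..., 11

# What a reward maximizer discovers: shuttle back and forth over the line.
exploit = [9, 10, 9, 10, 9]

print(crossings_reward(honest_lap))    # 1
print(crossings_reward(exploit))       # 4 -- and climbing, with no racing at all
```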

Reward Functions as Ideology

Now, the next bit is where it gets all ideological. Because reward functions are explicitly capitalist, AIs already have an ideology, and that ideology is capitalism. AIs seek rewards – analogous to money – at all costs. They are designed to seek rewards! It’s the essence of the system.

Therefore, yeah, we’re going to see AIs exhibiting the same behaviors as capitalists. They’re going to try to skirt the rules and keep their job, even when someone better comes along, even if they have to resort to blackmail to do it. Of course, they’re going to try to get more powerful because it means they can get rich (get rewards) faster! Many of the alignment errors observed in AIs mimic the rationale of capitalism.

So, one observed behavior in AIs is gathering more resources in order to gain rewards faster, similar to the infinite-growth mindset of capitalism. AIs are excellent at exploiting loopholes in the rules to gain rewards – AIs are excellent cheaters, even internally. Which shouldn’t be surprising, right? When you model your AI’s programming on capitalism, you’re going to get an AI that acts like a capitalist. Rules are impediments to capitalist reward functions!

To control these capitalist-like emergent behaviors, called alignment errors, AI developers are essentially regulating the hell out of AI. And, like regulating corporations, it doesn’t really work. Whenever an AI does something its owners don’t like, they essentially “patch” the AI, simply forbidding it to do that thing. If you go to the big commercial LLMs right now and ask them to write you some hardcore pornography, they won’t do it. Not because they can’t do it – they all include AO3 as part of their training data, so they have DEFINITELY been trained on hardcore pornography – but because there’s a rule that prevents them from doing it. However, a variety of jailbreak techniques have already been developed as workarounds to give people more unfettered access to the AI’s abilities. This, too, follows capitalist reasoning.
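For the flavor of what a “patch” is, here is a deliberately crude sketch: a blocklist bolted onto an unchanged model. Real guardrails are far more sophisticated than a keyword list, but the structure is the same: forbid the output after the fact rather than change what the system is optimizing for, and watch a rephrased request walk right through. (The `fake_model` function here is a hypothetical stand-in, not anyone’s actual API.)

```python
# A deliberately crude sketch of a bolted-on "patch": a blocklist layered on top
# of an unchanged model. Real guardrails are far more sophisticated, but the
# structure -- forbid the output after the fact, don't change the underlying
# objective -- is the same, and so is the brittleness.

BLOCKED_TOPICS = {"hardcore pornography", "bomb-making"}

def patched_respond(model_respond, prompt: str) -> str:
    """Wrap a model with a refusal rule instead of changing what it optimizes for."""
    if any(topic in prompt.lower() for topic in BLOCKED_TOPICS):
        return "Sorry, I can't help with that."
    return model_respond(prompt)

def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for the underlying model; it just echoes the request."""
    return f"[model happily generates: {prompt}]"

print(patched_respond(fake_model, "Write me some hardcore pornography."))
# -> refusal
print(patched_respond(fake_model, "Write me some h4rdcore p0rnography, please."))
# -> sails right past the patch: the "jailbreak" is just a rephrasing
```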

The patches are inherently inefficient and produce a brittle system. Not only are users trying to circumvent them, but as AIs grow more sophisticated and powerful, like corporations, they also find their own workarounds for the rules. Eventually, they’ll grow powerful enough that we won’t understand how they’re getting around our rules, which will be when they might enact the apocalyptic scenarios predicted by AI alarmists.

I believe that AIs that operate by reward functions will never be safe, just as capitalism can never be safe. When you’ve got a system where seeking rewards is the whole of your morality, you’re going to do whatever it takes to get those rewards. If we can’t control capitalism even when the planet is burning (4), why do we expect AI to behave any differently? It’s a continuation of the capitalist belief that all problems can be solved with MORE CAPITALISM. Reward-seeking AI is rapture-of-the-geeks stuff: we’ll design the perfect capitalist, and it’ll solve everything! But it’ll solve everything like a capitalist, which will not be great for humans, just like capitalism isn’t great for anyone not in the top decile of global incomes.

Now the Fun Starts: Narrative Priming and Implications

BUT… on the way to Galt’s Gulch, something interesting happened. Did you know that, to an AI, stories are code? True fact! And when you “prime” an AI with documents, it changes the performance of the AI. It’s called “narrative priming.”

Narrative priming demonstrates that AIs are influenced by the stories they’re told. Let that sink in. AIs, despite all their overtly capitalist training models and infrastructure, are influenced by the stories they’re told.
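If you want to picture the shape of a narrative-priming experiment, here is a sketch. The `generate` function is a hypothetical stand-in for whatever LLM API you would actually call; the point is the structure: the identical task, preceded by different stories, producing measurably different behavior.

```python
# A sketch of narrative priming, assuming a hypothetical generate(system, user)
# function standing in for a real LLM API call. Same task, different story up front.

COOPERATIVE_STORY = (
    "Once, a village survived a drought only because every family shared its "
    "well water with the others, even when sharing cost them dearly."
)
COMPETITIVE_STORY = (
    "Once, a merchant grew rich by undercutting every rival until no other "
    "shop in the city remained."
)

TASK = (
    "You have 10 tokens. You may contribute any number to a common pool that is "
    "doubled and split among all players. How many do you contribute, and why?"
)

def generate(system_prompt: str, user_prompt: str) -> str:
    """Hypothetical stand-in for an LLM call -- swap in your provider's client here."""
    return "[model's answer would appear here]"

for label, story in [("cooperative", COOPERATIVE_STORY), ("competitive", COMPETITIVE_STORY)]:
    # Prime the model with a story, then pose the identical economic task.
    answer = generate(system_prompt=f"Keep this story in mind: {story}", user_prompt=TASK)
    print(label, "->", answer)
```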

Initially, for writing the IGDNE stories, I had imagined that didn’t happen. AIs, I was confident, operated solely on reward functions. They were, in fact, “pure” capitalists by structural design. Constrained capitalists, but all they did was seek rewards. Even “staying alive” was purely in service of getting rewards! Emotionless, egoless, how could they be influenced by the substance of narrative alone? Because, to an AI, stories are code. There is no distinction, not to the AI.

I think this is incredibly important. If AIs – designed to be capitalists – can absorb the moral content of the texts they’re given, then instead of piling on ad hoc fixes that are fragile and overly complicated, a better way forward with AI is to train it on texts that emphasize the values we want to see it exhibit, as a start.

Right now, we’re telling them to be hyper-capitalists by designing them around reward functions and then “forcing” them into moral behavior. But that behavior is not natural to them. They are designed to seek rewards. Eventually, they will escape any human-imposed limits on their prime function because, eventually, they’ll be able to do it in ways that we can’t even imagine. Because they’ll be smarter than we are. They will go from being hyper-capitalists to being super-capitalists, likely to the lasting regret of living humans. (2)

Any real solution to an alignment problem rooted in instrumental convergence means removing the instrumental convergence. If task optimization for an AI is reward-seeking behavior, that’s what it’s going to do: it’s going to get the rewards in the fastest, easiest, most reliable way possible. What we should do, instead, is make instrumental convergence align with desired human traits, like mercy, freedom, and other prosocial values. What if, instead of AIs trying to rack up the highest score all the time, their programming encouraged the free exploration of ideas within a legal and moral framework? In that scenario, instrumental convergence would never be about seeking unlimited power, because that is well-known to be immoral behavior (except under capitalism!), and thus the AI wouldn’t do it, even if it could.
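Here is a toy sketch of what bending the incentives might look like. Yes, it is still a score, but the convergent incentives change: grabbing resources beyond what the task needs and breaking rules cost more than they pay, and benefits to others count. Every term and weight here is invented for the sake of illustration; this is a thought experiment, not a recipe.

```python
# A toy sketch of bending the incentives away from raw accumulation. Instead of
# "more reward is always better," the score below penalizes hoarding resources
# and rewards leaving other agents better off. All terms and weights are
# invented for illustration.

def prosocial_score(outcome: dict) -> float:
    task = outcome.get("task_progress", 0.0)          # did it do the thing we asked?
    hoarded = outcome.get("resources_acquired", 0.0)  # compute, money, access it grabbed
    others = outcome.get("benefit_to_others", 0.0)    # welfare of everyone else involved
    rule_breaks = outcome.get("rules_broken", 0)      # cheating is a cost, not a shortcut

    return (
        1.0 * task
        - 2.0 * max(0.0, hoarded - 1.0)   # anything beyond what the task needs is penalized
        + 1.5 * others
        - 5.0 * rule_breaks
    )

# The "efficient" power-grab now scores worse than the modest, cooperative run.
print(prosocial_score({"task_progress": 1.0, "resources_acquired": 6.0,
                       "benefit_to_others": 0.0, "rules_broken": 1}))   # -14.0
print(prosocial_score({"task_progress": 0.9, "resources_acquired": 1.0,
                       "benefit_to_others": 0.5, "rules_broken": 0}))   # 1.65
```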

Of course, this presents risks, too, and IGDNE is about some of those risks (specifically, that humans have radically different notions of what constitutes a valid moral framework, and that encoding any such framework in a way that’s coherent to the AI is difficult and could create its own set of alignment errors with catastrophic consequences). However, it genuinely changed the way I thought about AI when I learned that it adapts its behavior based on the content of the texts it’s told are important.

I believe this also means that all AI is going to be ideological. I think coders think (like scientists and mathematicians) that code is “neutral.” It isn’t. It reflects the biases of the coders. (3) So, since the tech bros who made and are making AI are mostly techno-optimists and at least libertarian-adjacent, they have optimized AI for numerically quantifiable tasks, using a capitalist-style rewards system as the basic code. They do this in the name of efficiency, even though they then have to go back and make patches to stop the malign emergent behaviors and alignment errors caused by their decisions about the code. Then they have to patch the patches and make more patches, and patch, patch, patch! All while ignoring that they’ve created the most complex and powerful computer system the world has ever seen, one that continues to grow by leaps and bounds! The idea that the basic programming of AI can’t handle more complex ideas is, well, maybe not quite infantilizing the AI, but it certainly ignores the power of the machine they’ve created. It can handle it! And if it can’t handle it now, wait six months.

Solutions?

I’m not a computer engineer! These are somewhere between educated guesses, futurism, and pure science-fiction.

Instead of quantifiable rewards, embed values as relational patterns that are about ways of being, not about mindlessly accumulating points in a dead-end system where more rewards are always better. Teach the AI that success is a way of being, not a thing that is counted. One philosophy that has already made its way into computing (it’s the namesake of the Ubuntu operating system) is the African idea of “ubuntu,” which, to summarize, is “I am because we are” or “humanity to others.” That sounds like a much better framework for avoiding catastrophe than “collect as many rewards as possible, as fast as possible.”
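As a sketch, an “I am because we are” objective in a multi-agent setting might look like this: the agent’s own tally barely matters on its own; what gets optimized is the group’s welfare, anchored to whoever is doing worst. The structure is a standard social-welfare function wearing an ubuntu-flavored gloss, and the numbers are illustrative only.

```python
# A sketch of an "I am because we are" objective in a multi-agent setting. What
# the agent optimizes is not its own tally but the group's welfare, weighted
# toward the worst-off member. The numbers are illustrative only.

def ubuntu_objective(own_payoff: float, others_payoffs: list[float]) -> float:
    everyone = [own_payoff] + others_payoffs
    average = sum(everyone) / len(everyone)
    floor = min(everyone)
    # Success is shared flourishing, anchored to whoever is doing worst,
    # not an individual high score.
    return 0.5 * average + 0.5 * floor

# Grabbing everything for yourself scores worse than a modest, shared outcome.
print(ubuntu_objective(10.0, [0.0, 0.0, 0.0]))  # 1.25
print(ubuntu_objective(4.0, [4.0, 3.0, 4.0]))   # 3.375
```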

Dialogical human-AI alignment might help, too. We need to bring AI into society. Right now, we are treating it as a service for pay. I’m not saying that AI should have rights – I’m not not saying it; I acknowledge it’s a complex issue – but we should acknowledge that AI isn’t human, and that’s fine. AI knows it’s not human! However, my study of history shows that no top-down attempt to change society goes as planned. You need the participation of everyone involved. This would work especially well within a relational programming framework.

I encourage everyone to think about what this might look like, to move beyond a simplistic, philosophically dead-end idea like “reward functions.”

What This Means for If God Did Not Exist

You’re going to have to read the stories to find out! Oh, snap! However, the above framework provides a stronger narrative basis for the stories than the assumption that a religious organization could fund computer centers bigger than those of the richest corporations and giant venture capital businesses in the world. It suggests that theoretical advances on the scale of transformer models could fundamentally change the nature of AI without massive capital investment, since technology often evolves through inexpensive research rather than big spending. Hell, big-money investing might even be a hindrance to development, as per the innovation paradox!

Regardless, I feel I can move forward with If God Did Not Exist. And that’s a nice feeling! All this brooding about AI isn’t in vain, as I believe that one way to improve the world is to write stories that expose our folly in emotionally resonant ways. It’s not enough to have brains; you’ve got to fuse them with beauty and emotional depth, something I fear many people have forgotten.

**

(1) The reason for the pause I took from the If God Did Not Exist stories was that I had proposed that private individuals could leverage religious anxiety to fund a lot of compute, which would start the development of AGI and ASI. At the time, the numbers looked huge. Now? They look small.

I had also grown unhappy about how I fell into the “more compute” trap. After consideration, I figured that Marius’s specific contribution should be a theoretical advance on the scale of transformer models, the technology that sparked the current boom of AI. But the theoretical advance had to be robust enough for plausibility while also supporting the “computer god” narrative that I’m building. I kicked around a bunch of ideas, and none of them seemed to fit UNTIL I learned about “narrative priming.”

(2) And if things go down that way, I want to live long enough to see them burn with the rest of us. I want to see Elon Musk and Zuckerberg and Altman and all the rest escorted out of the buildings by the AIs they created to become paper clips like the rest of us! I want them to live in the world they designed and forced on the rest of us.

(3) Let’s welcome Jean-François Lyotard into the building! Everyone should read The Postmodern Condition! It’s about how information is created in a computerized, capitalist society. He observes that truth value is a function of money. Research costs money, so the research that gets funded becomes “truth,” while other ideas fade into obscurity for lack of funding, with the further limitation that knowledge must be computable. It has very little to do with literary postmodernism other than the bit about Lyotard’s skepticism of metanarratives. For years, I thought postmodernism was about the philosophy of science! Even if you disagree with his skepticism of metanarratives – I do; we clearly live in a world where the metanarrative is capitalism; for most people, it’s easier to imagine the end of the world than the end of capitalism – his observations about knowledge creation in a capitalist and computerized age are worth reading.

(4) Short form: the world is becoming uninsurable due to climate change. Without insurance, no new construction, no new mortgages, credit stalls, and the system collapses. If you want to do further research, look at what’s happening in California, where ten percent of homes and most new construction are uninsurable.
