Dark Arts of Rationality

[Note: backported from LessWrong]

Today, we're going to talk about Dark rationalist techniques: productivity tools which seem incoherent, mad, and downright irrational. These techniques include:

  1. Willful Inconsistency
  2. Intentional Compartmentalization
  3. Modifying Terminal Goals

I expect many of you are already up in arms. It seems obvious that consistency is a virtue, that compartmentalization is a flaw, and that one should never modify their terminal goals.

I claim that these 'obvious' objections are incorrect, and that all three of these techniques can be instrumentally rational.

In this article, I'll promote the strategic cultivation of false beliefs and condone mindhacking on the values you hold most dear. Truly, these are Dark Arts. I aim to convince you that sometimes, the benefits are worth the price.

Changing your Terminal Goals

In many games there is no "absolutely optimal" strategy. Consider the Prisoner's Dilemma. The optimal strategy depends entirely upon the strategies of the other players. Entirely.

Intuitively, you may believe that there are some fixed "rational" strategies. Perhaps you think that even though complex behavior is dependent upon other players, there are still some constants, like "Never cooperate with DefectBot". DefectBot always defects against you, so you should never cooperate with it. Cooperating with DefectBot would be insane. Right?

Wrong. If you find yourself on a playing field where everyone else is a TrollBot (players who cooperate with you if and only if you cooperate with DefectBot) then you should cooperate with DefectBots and defect against TrollBots.

Consider that. There are playing fields where you should cooperate with DefectBot, even though that looks completely insane from a naïve viewpoint. Optimality is not a feature of the strategy, it is a relationship between the strategy and the playing field.
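
To make the TrollBot argument concrete, here is a minimal sketch that scores every fixed strategy against a field of opponents. The payoff numbers (5, 3, 1, 0) are the conventional Prisoner's Dilemma values and are an assumption, as are the field sizes and the function names; none of them come from the article.

```python
# Conventional Prisoner's Dilemma payoffs (assumed, not from the article).
# PAYOFF[(my_move, their_move)] -> my score
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}

def opponent_move(opponent, my_strategy):
    """Return the opponent's move given its type and my full strategy."""
    if opponent == "DefectBot":
        return "D"  # always defects
    if opponent == "TrollBot":
        # Cooperates with me if and only if I cooperate with DefectBot.
        return "C" if my_strategy["DefectBot"] == "C" else "D"
    raise ValueError(f"unknown opponent: {opponent}")

def total_score(my_strategy, field):
    """Sum my payoff over one round against every opponent in the field."""
    return sum(
        PAYOFF[(my_strategy[opp], opponent_move(opp, my_strategy))]
        for opp in field
    )

def all_strategies():
    """Enumerate the four fixed strategies: one move per opponent type."""
    for vs_defect in "CD":
        for vs_troll in "CD":
            yield {"DefectBot": vs_defect, "TrollBot": vs_troll}

def best_strategy(field):
    """Pick the highest-scoring fixed strategy for this particular field."""
    return max(all_strategies(), key=lambda s: total_score(s, field))

# Hypothetical fields: one DefectBot among nine TrollBots, and DefectBots only.
troll_field = ["DefectBot"] + ["TrollBot"] * 9
plain_field = ["DefectBot"] * 10

print(best_strategy(troll_field))  # cooperate with DefectBot, defect on TrollBots
print(best_strategy(plain_field))  # defect against DefectBot
```

The same strategy wins on one field and loses on the other, because the score is computed from the strategy and the field together, never from the strategy alone.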

Take this lesson to heart: in certain games, there are strange playing fields where the optimal move looks completely irrational.

I'm here to convince you that life is one of those games, and that you occupy a strange playing field right now.

Here's a toy example of a strange playing field, which illustrates the fact that even your terminal goals are not sacred:

Imagine that you are completely self-consistent and have a utility function. For the sake of the thought experiment, pretend that your terminal goals are distinct, exclusive, orthogonal, and clearly labeled. You value your goals being achieved, but you have no preferences about how they are achieved or what happens afterwards (unless the goal explicitly mentions the past/future, in which case achieving the goal puts limits on the past/future). You possess at least two terminal goals, one of which we will call A.

Omega descends from on high and makes you an offer. Omega will cause your terminal goal A to become achieved over a certain span of time, without any expenditure of resources. As a price of taking the offer, you must switch out terminal goal A for terminal goal B. Omega guarantees that B is orthogonal to A and all your other terminal goals. Omega further guarantees that you will achieve B using less time and resources than you would have spent on A. Any other concerns you have are addressed via similar guarantees.

Clearly, you should take the offer. One of your terminal goals will be achieved, and while you'll be pursuing a new terminal goal that you (before the offer) don't care about, you'll come out ahead in terms of time and resources which can be spent achieving your other goals.

So the optimal move, in this scenario, is to change your terminal goals.
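
The accounting behind Omega's offer can be sketched in a few lines. Every number here is hypothetical: the article only guarantees that B costs less than A would have. The sketch also assumes, generously to the "decline" branch, that you would have achieved A on your own.

```python
# All numbers are hypothetical; the only constraint taken from the thought
# experiment is that pursuing B costs less than pursuing A would have.
BUDGET = 100   # total time and resources available
COST_A = 40    # cost of achieving A yourself
COST_B = 25    # cost of achieving B (Omega guarantees COST_B < COST_A)

def outcome(accept_offer):
    """Score an outcome by the agent's ORIGINAL goals (before any swap)."""
    if accept_offer:
        # Omega achieves A for free; you spend resources on B instead.
        spent = COST_B
    else:
        # You keep goal A and pay for it yourself.
        spent = COST_A
    return {
        "A_achieved": True,  # assumed achieved in both branches
        "left_for_other_goals": BUDGET - spent,
    }

decline = outcome(accept_offer=False)
accept = outcome(accept_offer=True)
print("decline:", decline)
print("accept: ", accept)
```

Judged by the pre-offer goals, A is achieved either way and accepting leaves more resources for everything else, which is why the original agent endorses the swap.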

There are times when the optimal move of a rational agent is to hack its own terminal goals.

You may find this counter-intuitive. It helps to remember that "optimality" depends as much upon the playing field as upon the strategy.

Next, I claim that such scenarios are not restricted to toy games where Omega messes with your head. Humans encounter similar situations on a day-to-day basis.

Humans often find themselves in a position where they should modify their terminal goals, and the reason is simple: our thoughts do not have direct control over our motivation.

Unfortunately for us, our "motivation circuits" can distinguish between terminal and instrumental goals. It is often easier to put in effort, experience inspiration, and work tirelessly when pursuing a terminal goal as opposed to an instrumental goal. It would be nice if this were not the case, but it's a fact of our hardware: we're going to do X more if we want to do X for its own sake as opposed to when we force X upon ourselves.

Consider, for example, a young woman who wants to be a rockstar. She wants the fame, the money, and the lifestyle: these are her "terminal goals". She lives in some strange world where rockstardom is wholly dependent upon merit (rather than social luck and network effects), and decides that in order to become a rockstar she has to produce really good music.

But here's the problem: She's a human. Her conscious decisions don't directly affect her motivation.

In her case, it turns out that she can make better music when "Make Good Music" is a terminal goal as opposed to an instrumental goal.

When "Make Good Music" is an instrumental goal, she schedules practice time on a sitar and grinds out the hours. But she doesn't really like it, so she cuts corners whenever akrasia comes knocking. She lacks inspiration and spends her spare hours dreaming of stardom. Her songs are shallow and trite.

When "Make Good Music" is a terminal goal, music pours forth, and she spends every spare hour playing her sitar: not because she knows that she "should" practice, but because you couldn't pry her sitar from her cold dead fingers. She's not "practicing", she's pouring out her soul, and no power in the 'verse can stop her. Her songs are emotional, deep, and moving.

It's obvious that she should adopt a new terminal goal.

Ideally, we would be just as motivated to carry out instrumental goals as we are to carry out terminal goals. In reality, this is not the case. As a human, your motivation system does discriminate between the goals that you feel obligated to achieve and the goals that you pursue as ends unto themselves.

As such, it is sometimes in your best interest to modify your terminal goals.

Mind the terminology, here. When I speak of "terminal goals" I mean actions that feel like ends unto themselves. I am speaking of the stuff you wish you were doing when you're doing boring stuff, the things you do in your free time just because they are fun, the actions you don't need to justify.

This seems like the obvious meaning of "terminal goals" to me, but some of you may think of "terminal goals" more akin to self-endorsed morally sound end-values in some consistent utility function. I'm not talking about those. I'm not even convinced I have any.

Both types of "terminal goal" are susceptible to strange playing fields in which the optimal move is to change your goals, but it is only the former type of goal — the actions that are simply fun, that need no justification — which I'm suggesting you tweak for instrumental reasons.

I've largely refrained from goal-hacking, personally. I bring it up for a few reasons:

  1. It's the easiest Dark Side technique to justify. It helps break people out of the mindset where they think optimal actions are the ones that look rational in a vacuum. Remember, optimality is a feature of the playing field. Sometimes cooperating with DefectBot is the best strategy!
  2. Goal hacking segues nicely into the other Dark Side techniques which I use frequently, as you will see shortly.
  3. I have met many people who would benefit from a solid bout of goal-hacking.

I've crossed paths with many a confused person who (without any explicit thought on their part) had really silly terminal goals. We've all met people who are acting as if "Acquire Money" is a terminal goal, never noticing that money is almost entirely instrumental in nature. When you ask them "but what would you do if money was no issue and you had a lot of time", all you get is a blank stare.

Even the LessWrong Wiki entry on terminal values describes a college student for whom university is instrumental, and getting a job is terminal. This seems like a clear-cut case of a Lost Purpose: a job seems clearly instrumental. And yet, we've all met people who act as if "Have a Job" is a terminal value, and who then seem aimless and undirected after finding employment.

These people could use some goal hacking. You can argue that Acquire Money and Have a Job aren't "really" terminal goals, to which I counter that many people don't know their ass from their elbow when it comes to their own goals. Goal hacking is an important part of becoming a rationalist and/or improving mental health.

Goal-hacking in the name of consistency isn't really a Dark Side power. This power is only Dark when you use it like the musician in our example, when you adopt terminal goals for instrumental reasons. This form of goal hacking is less common, but can be very effective.

I recently had a personal conversation with Alexei, who is earning to give. He noted that he was not entirely satisfied with his day-to-day work, and mused that perhaps goal-hacking (making "Do Well at Work" an end unto itself) could make him more effective, generally happier, and more productive in the long run.

Goal-hacking can be a powerful technique, when correctly applied. Remember, you're not in direct control of your motivation circuits. Sometimes, strange though it seems, the optimal action involves fooling yourself.

You don't get good at programming by sitting down and forcing yourself to practice for three hours a day. I mean, I suppose you could get good at programming that way. But it's much easier to get good at programming by loving programming, by being the type of person who spends every spare hour tinkering on a project. Because then it doesn't feel like practice, it feels like fun.

This is the power that you can harness, if you're willing to tamper with your terminal goals for instrumental reasons. As rationalists, we would prefer to dedicate to instrumental goals the same vigor that is reserved for terminal goals. Unfortunately, we find ourselves on a strange playing field where goals that feel justified in their own right win the lion's share of our attention.

Given this strange playing field, goal-hacking can be optimal.

You don't have to completely mangle your goal system. Our aspiring musician from earlier doesn't need to destroy her "Become a Rockstar" goal in order to adopt the "Make Good Music" goal. If you can successfully convince yourself to believe that something instrumental is an end unto itself (i.e. terminal), while still believing that it is instrumental, then more power to you.

This is, of course, an instance of Intentional Compartmentalization.

Intentional Compartmentalization

As soon as you endorse modifying your own terminal goals, Intentional Compartmentalization starts looking like a pretty good idea. If Omega offers to achieve A at the price of dropping A and adopting B, the ideal move is to take the offer after finding a way to not actually care about B.

A consistent agent cannot do this, but I have good news for you: You're a human. You're not consistent. In fact, you're great at being inconsistent!

You might expect it to be difficult to add a new terminal goal while still believing that it's instrumental. You may also run into strange situations where holding an instrumental goal as terminal directly contradicts other terminal goals.

For example, our aspiring musician might find that she makes even better music if "Become a Rockstar" is not among her terminal goals.

This means she's in trouble: She either has to drop "Become a Rockstar" and have a better chance at actually becoming a rockstar, or she has to settle for a decreased chance that she'll become a rockstar.

Or, rather, she would have to settle for one of these choices — if she wasn't human.

I have good news! Humans are really really good at being inconsistent, and you can leverage this to your advantage. Compartmentalize! Maintain goals that are "terminal" in one compartment, but which you know are "instrumental" in another, then simply never let those compartments touch!

This may sound completely crazy and irrational, but remember: you aren't actually in control of your motivation system. You find yourself on a strange playing field, and the optimal move may in fact require mental contortions that make epistemic rationalists shudder.

Hopefully you never run into this particular problem (holding contradictory goals in "terminal" positions), but this illustrates that there are scenarios where compartmentalization works in your favor. Of course we'd prefer to have direct control of our motivation systems, but given that we don't, compartmentalization is a huge asset.

Take a moment and let this sink in before moving on.

Once you realize that compartmentalization is OK, you are ready to practice my second Dark Side technique: Intentional Compartmentalization. It has many uses outside the realm of goal-hacking.

See, motivation is a fickle beast. And, as you'll remember, your conscious choices are not directly attached to your motivation levels. You can't just decide to be more motivated.

At least, not directly.

I've found that certain beliefs — beliefs which I know are wrong — can make me more productive. (On a related note, remember that religious organizations are generally more coordinated than rationalist groups.)

It turns out that, under these false beliefs, I can tap into motivational reserves that are otherwise unavailable. The only problem is, I know that these beliefs are downright false.

I'm just kidding, that's not actually a problem. Compartmentalization to the rescue!

Here are a couple of example beliefs that I keep locked away in my mental compartments, bound up in chains. Every so often, when I need to be extra productive, I don my protective gear and enter these compartments. I never fully believe these things, not globally at least, but I'm capable of attaining "local belief", of acting as if I hold these beliefs. This, it turns out, is enough.

Nothing is Beyond My Grasp

We'll start off with a tame belief, something that is soundly rooted in evidence outside of its little compartment.

I have a global belief, outside all my compartments, that nothing is beyond my grasp.

Others may understand things more easily or faster than I do. People smarter than myself grok concepts with less effort than I. It may take me years to wrap my head around things that other people find trivial. However, there is no idea that a human has ever had that I cannot, in principle, grok.

I believe this with moderately high probability, just based on my own general intelligence and the fact that brains are so tightly clustered in mind-space. It may take me a hundred times the effort to understand something, but I can still understand it eventually. Even things that are beyond the grasp of a meager human mind, I will one day be able to grasp after I upgrade my brain. Even if there are limits imposed by reality, I could in principle overcome them if I had enough computing power. Given any finite idea, I could in theory become powerful enough to understand it.

This belief, itself, is not compartmentalized. What is compartmentalized is the certainty.

Inside the compartment, I believe that Nothing is Beyond My Grasp with 100% confidence. Note that this is ridiculous: there's no such thing as 100% confidence. At least, not in my global beliefs. But inside the compartments, while we're in la-la land, it helps to treat Nothing is Beyond My Grasp as raw, immutable fact.

You might think that it's sufficient to believe Nothing is Beyond My Grasp with very high probability. If that's the case, you haven't been listening: I don't actually believe Nothing is Beyond My Grasp with an extraordinarily high probability. I believe it with moderate probability, and then I have a compartment in which it's a certainty.

It would be nice if I never needed to use the compartment, if I could face down technical problems and incomprehensible lingo and being really out of my depth with a relatively high confidence that I'm going to be able to make sense of it all. However, I'm not in direct control of my motivation. And it turns out that, through some quirk in my psychology, it's easier to face down the oppressive feeling of being in way over my head if I have this rock-solid "belief" that Nothing is Beyond My Grasp.

This is what the compartments are good for: I don't actually believe the things inside them, but I can still act as if I do. That ability allows me to face down challenges that would be difficult to face down otherwise.

This compartment was largely constructed with the help of The Phantom Tollbooth: it taught me that there are certain impossible tasks you can do if you think they're possible. It's not always enough to know that if I believe I can do a thing, then I have a higher probability of being able to do it. I get an extra boost from believing I can do anything.

You might be surprised about how much you can do when you have a mental compartment in which you are unstoppable.

My Willpower Does Not Deplete

Here's another: My Willpower Does Not Deplete.

Ok, so my willpower actually does deplete. I've been writing about how it does, and discussing methods that I use to avoid depletion. Right now, I'm writing about how I've acknowledged the fact that my willpower does deplete.

But I have this compartment where it doesn't.

Ego depletion is a funny thing. If you don't believe in ego depletion, you suffer less ego depletion. This does not eliminate ego depletion.

Knowing this, I have a compartment in which My Willpower Does Not Deplete. I go there often, when I'm studying. It's easy, I think, for one to begin to feel tired, and say "oh, this must be ego depletion, I can't work anymore." Whenever my brain tries to go there, I wheel this bad boy out of his cage. "Nope", I respond, "My Willpower Does Not Deplete".

Surprisingly, this often works. I won't force myself to keep working, but I'm pretty good at preventing mental escape attempts via "phantom akrasia". I don't allow myself to invoke ego depletion or akrasia to stop being productive, because My Willpower Does Not Deplete. I have to actually be tired out, in a way that doesn't trigger the My Willpower Does Not Deplete safeguards. This doesn't let me keep going forever, but it prevents a lot of false alarms.

In my experience, the strong version (My Willpower Does Not Deplete) is much more effective than the weak version (My Willpower is Not Depleted Yet), even though it's more wrong. This probably says something about my personality. Your mileage may vary. Keep in mind, though, that the effectiveness of your mental compartments may depend more on the motivational content than on degree of falsehood.

Anything is a Placebo

Placebos work even when you know they are placebos.

This is the sort of madness I'm talking about, when I say things like "you're on a strange playing field".

Knowing this, you can easily activate the placebo effect manually. Feeling sick? Here's a freebie: drink more water. It will make you feel better.

No? It's just a placebo, you say? Doesn't matter. Tell yourself that water makes it better. Put that in a nice little compartment, save it for later. It doesn't matter that you know what you're doing: your brain is easily fooled.

Want to be more productive, be healthier, and exercise more effectively? Try using Anything is a Placebo! Pick something trivial and non-harmful and tell yourself that it helps you perform better. Put the belief in a compartment in which you act as if you believe the thing. Cognitive dissonance doesn't matter! Your brain is great at ignoring cognitive dissonance. You can "know" you're wrong in the global case, while "believing" you're right locally.

For bonus points, try combining objectives. Are you constantly underhydrated? Try believing that drinking more water makes you more alert!

Brains are weird.

Truly, these are the Dark Arts of instrumental rationality. Epistemic rationalists recoil in horror as I advocate intentionally cultivating false beliefs. It goes without saying that you should use this technique with care. Remember to always audit your compartmentalized beliefs through the lens of your actual beliefs, and be very careful not to let incorrect beliefs leak out of their compartments.

If you think you can achieve similar benefits without "fooling yourself", then by all means, do so. I haven't been able to find effective alternatives. Brains have been honing compartmentalization techniques for eons, so I figure I might as well re-use the hardware.

It's important to reiterate that these techniques are necessary because you're not actually in control of your own motivation. Sometimes, incorrect beliefs make you more motivated. Intentionally cultivating incorrect beliefs is surely a path to the Dark Side: compartmentalization only mitigates the damage. If you make sure you segregate the bad beliefs and acknowledge them for what they are then you can get much of the benefit without paying the cost, but there is still a cost, and the currency is cognitive dissonance.

At this point, you should be mildly uncomfortable. After all, I'm advocating something which is completely epistemically irrational. We're not done yet, though.

I have one more Dark Side technique, and it's worse.

Willful Inconsistency

I use Intentional Compartmentalization to "locally believe" things that I don't "globally believe", in cases where the local belief makes me more productive. In this case, the beliefs in the compartments are things that I tell myself. They're like mantras that I repeat in my head, at the System 2 level. System 1 is fragmented and compartmentalized, and happily obliges.

Willful Inconsistency is the grown-up, scary version of Intentional Compartmentalization. It involves convincing System 1 wholly and entirely of something that System 2 does not actually believe. There's no compartmentalization and no fragmentation. There's nowhere to shove the incorrect belief when you're done with it. It's taken over the intuition, and it's always on. Willful Inconsistency is about having gut-level intuitive beliefs that you explicitly disavow.

Your intuitions run the show whenever you're not paying attention, so if you're willfully inconsistent then you're going to actually act as if these incorrect beliefs are true in your day-to-day life, unless you forcibly override your default actions. Ego depletion and distraction make you vulnerable to yourself.

Use this technique with caution.

This may seem insane even to those of you who took the previous suggestions in stride. That you must sometimes alter your terminal goals is a feature of the playing field, not the agent. The fact that you are not in direct control of your motivation system readily implies that tricking yourself is useful, and compartmentalization is an obvious way to mitigate the damage.

But why would anyone ever try to convince themselves, deep down at the core, of something that they don't actually believe?

The answer is simple: specialization.

To illustrate, let me explain how I use willful inconsistency.

I have invoked Willful Inconsistency on only two occasions, and they were similar in nature. Only one instance of Willful Inconsistency is currently active, and it works like this:

I have completely and totally convinced my intuitions that unfriendly AI is a problem. A big problem. System 1 operates under the assumption that UFAI will come to pass in the next twenty years with very high probability.

You can imagine how this is somewhat motivating.

On the conscious level, within System 2, I'm much less certain. I solidly believe that UFAI is a big problem, and that it's the problem that I should be focusing my efforts on. However, my error bars are far wider, and my timespan is quite broad. I acknowledge a decent probability of soft takeoff. I assign moderate probabilities to a number of other existential threats. I think there are a large number of unknown unknowns, and there's a non-zero chance that the status quo continues until I die (and that I can't later be brought back). All this I know.

But, right now, as I type this, my intuition is screaming at me that the above is all wrong, that my error bars are narrow, and that I don't actually expect the status quo to continue for even thirty years.

This is just how I like things.

See, I am convinced that building a friendly AI is the most important problem for me to be working on, even though there is a very real chance that MIRI's research won't turn out to be crucial. Perhaps other existential risks will get to us first. Perhaps we'll get brain uploads and Robin Hanson's emulation economy. Perhaps it's going to take far longer than expected to crack general intelligence. However, after much reflection I have concluded that despite the uncertainty, this is where I should focus my efforts.

The problem is, it's hard to translate that decision down to System 1.

Consider a toy scenario, where there are ten problems in the world. Imagine that, in the face of uncertainty and diminishing returns from research effort, I have concluded that the world should allocate 30% of resources to problem A, 25% to problem B, 10% to problem C, and 5% to each of the remaining problems.

Because specialization leads to massive benefits, it's much more effective to dedicate 30% of researchers to working on problem A rather than having all researchers dedicate 30% of their time to problem A. So presume that, in light of these conclusions, I decide to dedicate myself to problem A.
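
A minimal sketch of the toy allocation and of why specialization can beat time-slicing. The placeholder names P4 to P10 for the seven remaining problems, the headcount of 100 researchers, and especially the convex output curve (output grows as the square of the fraction of time spent) are all assumptions chosen for illustration; the article asserts that specialization pays off but gives no functional form.

```python
# Allocation from the toy scenario, in percent of world resources.
# "P4".."P10" are placeholder names for the seven remaining problems.
allocation = {"A": 30, "B": 25, "C": 10}
allocation.update({f"P{i}": 5 for i in range(4, 11)})
assert len(allocation) == 10
assert sum(allocation.values()) == 100  # the shares cover everything

def output(focus, exponent=2.0):
    """Per-researcher output on a problem given the fraction of time spent.

    ASSUMPTION: returns to focus are convex (exponent > 1), which is one
    simple way to model "specialization leads to massive benefits".
    """
    return focus ** exponent

def specialized_output(n_researchers, share_pct):
    """share_pct% of researchers work on the problem full time."""
    return n_researchers * share_pct / 100 * output(1.0)

def generalist_output(n_researchers, share_pct):
    """Every researcher spends share_pct% of their time on the problem."""
    return n_researchers * output(share_pct / 100)

N = 100  # hypothetical number of researchers
for name in ("A", "B", "C"):
    share = allocation[name]
    print(name, specialized_output(N, share), generalist_output(N, share))
```

Under this assumed curve, the specialized arrangement produces more output on every problem. With linear returns (exponent 1.0) the two arrangements tie, so the conclusion rests entirely on the convexity assumption.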

Here we have a problem: I'm supposed to specialize in problem A, but at the intuitive level problem A isn't that big a deal. It's only 30% of the problem space, after all, and it's not really that much worse than problem B.

This would be no issue if I were in control of my own motivation system: I could put the blinders on and focus on problem A, crank the motivation knob to maximum, and trust everyone else to focus on the other problems and do their part.

But I'm not in control of my motivation system. If my intuitions know that there are a number of other similarly worthy problems that I'm ignoring, if they are distracted by other issues of similar scope, then I'm tempted to work on everything at once. This is bad, because output is maximized if we all specialize.

Things get especially bad when problem A is highly uncertain and unlikely to affect people for decades if not centuries. It's very hard to convince the monkey brain to care about far-future vagaries, even if I've rationally concluded that those are where I should dedicate my resources.

I find myself on a strange playing field, where the optimal move is to lie to System 1.

Allow me to make that more concrete:

I'm much more motivated to do FAI research when I'm intuitively convinced that we have a hard 15 year timer until UFAI.

Explicitly, I believe UFAI is one possibility among many and that the timeframe should be measured in decades rather than years. I've concluded that it is my most pressing concern, but I don't actually believe we have a hard 15 year countdown.

That said, it's hard to overstate how useful it is to have a gut-level feeling that there's a short, hard timeline. This "knowledge" pushes the monkey brain to go all out, no holds barred. In other words, this is the method by which I convince myself to actually specialize.

This is how I convince myself to deploy every available resource, to attack the problem as if the stakes were incredibly high. Because the stakes are incredibly high, and I do need to deploy every available resource, even if we don't have a hard 15 year timer.

In other words, Willful Inconsistency is the technique I use to force my intuition to feel as if the stakes are as high as I've calculated them to be, given that my monkey brain is bad at responding to uncertain vague future problems. Willful Inconsistency is my counter to Scope Insensitivity: my intuition has difficulty believing the results when I do the multiplication, so I lie to it until it acts with appropriate vigor.

This is the final secret weapon in my motivational arsenal.

I don't personally recommend that you try this technique. It can have harsh side effects, including feelings of guilt, intense stress, and massive amounts of cognitive dissonance. I'm able to do this in large part because I'm in a very good headspace. I went into this with full knowledge of what I was doing, and I am confident that I can back out (and actually correct my intuitions) if the need arises.

That said, I've found that cultivating a gut-level feeling that what you're doing must be done, and must be done quickly, is an extraordinarily good motivator. It's such a strong motivator that I seldom explicitly acknowledge it. I don't need to mentally invoke "we have to study or the world ends". Rather, this knowledge lingers in the background. It's not a mantra, it's not something that I repeat and wear thin. Instead, it's this gut-level drive that sits underneath it all, that makes me strive to go faster unless I explicitly try to slow down.

This monkey-brain tunnel vision, combined with a long habit of productivity, is what keeps me Moving Towards the Goal.

Those are my Dark Side techniques: Willful Inconsistency, Intentional Compartmentalization, and Terminal Goal Modification.

I expect that these techniques will be rather controversial. If I may be so bold, I recommend that discussion focus on goal-hacking and intentional compartmentalization. I acknowledge that willful inconsistency is unhealthy and I don't generally recommend that others try it. By contrast, both goal-hacking and intentional compartmentalization are quite sane and, indeed, instrumentally rational.

These are certainly not techniques that I would recommend CFAR teach to newcomers, and I remind you that "it is dangerous to be half a rationalist". You can royally screw yourself over if you're still figuring out your beliefs as you attempt to compartmentalize false beliefs. I recommend only using them when you're sure of what your goals are and confident about the borders between your actual beliefs and your intentionally false "beliefs".

It may be surprising that changing terminal goals can be an optimal strategy, and that humans should consider adopting incorrect beliefs strategically. At the least, I encourage you to remember that there are no absolutely rational actions.

Modifying your own goals and cultivating false beliefs are useful because we live in strange, hampered control systems. Your brain was optimized with no concern for truth, and optimal performance may require self-deception. I remind the uncomfortable that instrumental rationality is not about being the most consistent or the most correct, it's about winning. There are games where the optimal move requires adopting false beliefs, and if you find yourself playing one of those games, then you should adopt false beliefs. Instrumental rationality and epistemic rationality can be pitted against each other.

We are fortunate, as humans, to be skilled at compartmentalization: this helps us work around our mental handicaps without sacrificing epistemic rationality. Of course, we'd rather not have the mental handicaps in the first place: but you have to work with what you're given.

We are weird agents without full control of our own minds. We lack direct control over important aspects of ourselves. For that reason, it's often necessary to take actions that may seem contradictory, crazy, or downright irrational.

Just remember this, before you condemn these techniques: optimality is as much an aspect of the playing field as of the strategy, and humans occupy a strange playing field indeed.