Interlude: Q&A on the EA forum

My writing time this week was spent answering questions on the effective altruism forum, so I don't have a new blog post in the "removing guilt" series this week. I've reproduced some snippets from the Q&A below to tide you over. Most of them are questions about my thoughts on the long-term future of AI and my plans for the Machine Intelligence Research Institute, of which I am now the executive director (as of June 1). The removing guilt series will continue next week.

Alex Altair: What are some of the most neglected sub-tasks of reducing existential risk? That is, what is no one working on which someone really, really should be?

Nate Soares: Policy work / international coordination. Figuring out how to build an aligned AI is only part of the problem. You also need to ensure that an aligned AI is built, and that’s a lot harder to do during an international arms race. (A race to the finish would be pretty bad, I think.)

I’d like to see a lot more people figuring out how to ensure global stability & coordination as we enter a time period that may be fairly dangerous.

Owen Cotton-Barratt: Which uncertainties about the trajectory to AI do you regard as of key strategic importance?

Nate Soares: (a) how many major insights remain between us and strong AI? (b) how many of those insights will come from thinking hard, and how many will come from examining the brain? (c) how many more AI winters will there be? (d) how far ahead will the frontrunner be? (e) will there be an arms race?, to name a few.

Buck Shlegeris: What's your response to Peter Hurford's arguments in his article Why I'm Skeptical Of Unproven Causes...?

Nate Soares: That post mixes a bunch of different assertions together; let me try to distill a few of them out and answer them in turn:

(1) One of Peter's first (implicit) points is that AI alignment is a speculative cause. I tend to disagree.

Imagine it's 1942. The Manhattan project is well under way, Leo Szilard has shown that it's possible to get a neutron chain reaction, and physicists are hard at work figuring out how to make an atom bomb. You suggest that this might be a fine time to start working on nuclear containment, so that, once humans are done bombing the everloving breath out of each other, they can harness nuclear energy for fun and profit. In this scenario, would nuclear containment be a "speculative cause"?

There are currently thousands of person-hours and billions of dollars going towards increasing AI capabilities every year. To call AI alignment a "speculative cause" in an environment such as this one seems fairly silly to me. In what sense is it speculative to work on improving the safety of the tools that other people are currently building as fast as they can? Now, I suppose you could argue that either (a) AI will never work or (b) it will be safe by default, but both those arguments seem pretty flimsy to me.

You might argue that it's a bit weird for people to claim that the most effective place to put charitable dollars is towards some field of scientific study. Aren't charitable dollars supposed to go to starving children? Isn't the NSF supposed to handle scientific funding? And I'd like to agree, but society has kinda been dropping the ball on this one.

If we had strong reason to believe that humans could build strangelets, and society were pouring billions of dollars and thousands of human-years into making strangelets, and almost no money or effort was going towards strangelet containment, and it looked like humanity was likely to create a strangelet sometime in the next hundred years, then yeah, I'd say that "strangelet safety" would be an extremely worthy cause.

How worthy? Hard to say. I agree with Peter that it's hard to figure out how to trade off "safety of potentially-very-highly-impactful technology that is currently under furious development" against "children are dying of malaria", but the only way I know how to trade those things off is to do my best to run the numbers, and my back-of-the-envelope calculations currently say that AI alignment is further behind than the globe is poor.

Now that the EA movement is starting to look more seriously into high-impact interventions on the frontiers of science & mathematics, we're going to need to come up with more sophisticated ways to assess the impacts and tradeoffs. I agree it's hard, but I don't think throwing out everything that doesn't visibly pay off in the extremely short term is the answer.

(2) Alternatively, you could argue that MIRI's approach is unlikely to work. That's one of Peter's explicit arguments: it's very hard to find interventions that reliably affect the future far in advance, especially when there aren't hard objective metrics. I have three disagreements with Peter on this point.

First, I think he picks the wrong reference class: yes, humans have a really hard time generating big social shifts on purpose. But that doesn't necessarily mean humans have a really hard time generating math -- in fact, humans have a surprisingly good track record when it comes to generating math!

Humans actually seem to be pretty good at putting theoretical foundations underneath various fields when they try, and various people have demonstrably succeeded at this task (Church & Turing did this for computing, Shannon did this for information theory, Kolmogorov did a fair bit of this for probability theory, etc.). This suggests to me that humans are much better at producing technical progress in an unexplored field than they are at generating social outcomes in a complex economic environment. (I'd be interested in any attempt to quantitatively evaluate this claim.)

Second, I agree in general that any one individual team isn't all that likely to solve the AI alignment problem on their own. But the correct response to that isn't "stop funding AI alignment teams" -- it's "fund more AI alignment teams"! If you're trying to ensure that nuclear power can be harnessed for the betterment of humankind, and you assign low odds to any particular research group solving the containment problem, then the answer isn't "don't fund any containment groups at all," the answer is "you'd better fund a few different containment groups, then!"

Third, I object to the whole "there's no feedback" claim. Did Kolmogorov have tight feedback when he was developing an early formalization of probability theory? It seems to me like the answer is "yes" -- figuring out what was & wasn't a mathematical model of the properties he was trying to capture served as a very tight feedback loop (mathematical theorems tend to be unambiguous), and indeed, it was sufficiently good feedback that Kolmogorov was successful in putting formal foundations underneath probability theory.

Diego Caleiro: 1) Which are the implicit assumptions, within MIRI's research agenda, of things that "currently we have absolutely no idea of how to do that, but we are taking this assumption for the time being, and hoping that in the future either a more practical version of this idea will be feasible, or that this version will be a guiding star for practical implementations"? [...]

2) How do these assumptions diverge from how FLI, FHI, or non-MIRI people publishing on the AGI 2014 book conceive of AGI research?

3) Optional: Justify the differences in 2 and why MIRI is taking the path it is taking.

Nate Soares: 1) The things we have no idea how to do aren't the implicit assumptions in the technical agenda, they're the explicit subject headings: decision theory, logical uncertainty, Vingean reflection, corrigibility, etc :-)

We've tried to make it very clear in various papers that we're dealing with very limited toy models that capture only a small part of the problem (see, e.g., basically all of section 6 in the corrigibility paper).

Right now, we basically have a bunch of big gaps in our knowledge, and we're trying to make mathematical models that capture at least part of the actual problem -- simplifying assumptions are the norm, not the exception. All I can easily say is that common simplifying assumptions include: you have lots of computing power, there is lots of time between actions, you know the action set, you're trying to maximize a given utility function, etc. Assumptions tend to be listed in the paper where the model is described.

2) The FLI folks aren't doing any research; rather, they're administering a grant program. Most FHI folks are focused more on high-level strategic questions (What might the path to AI look like? What methods might be used to mitigate xrisk? etc.) rather than object-level AI alignment research. And remember that they look at a bunch of other X-risks as well, and that they're also thinking about policy interventions and so on. Thus, the comparison can't easily be made. [...]

Insofar as FHI folks would say we're making assumptions, I doubt they'd be pointing to assumptions like "UDT knows the policy set" or "assume we have lots of computing power" (which are obviously simplifying assumptions on toy models), but rather assumptions like "doing research on logical uncertainty now will actually improve our odds of having a working theory of logical uncertainty before it's needed."

3) I think most of the FHI folks & FLI folks would agree that it's important to have someone hacking away at the technical problems, but just to make the arguments more explicit, I think that there are a number of problems that it's hard to even see unless you have your "try to solve FAI" goggles on. [...]

We're still in the preformal stage, and if we can get this theory to the formal stage, I expect we may be able to get a lot more eyes on the problem, because the ever-crawling feelers of academia seem to be much better at exploring formalized problems than they are at formalizing preformal problems.

Then of course there's the heuristic of "it's fine to shout 'model uncertainty!' and hover on the sidelines, but it wasn't the armchair philosophers who did away with the epicycles, it was Kepler, who was up to his elbows in epicycle data." One of the big ways that you identify the things that need working on is by trying to solve the problem yourself. By asking how to actually build an aligned superintelligence, MIRI has generated a whole host of open technical problems, and I predict that that host will be a very valuable asset now that more and more people are turning their gaze towards AI alignment.

Interstice: What is your AI arrival timeline?

Nate Soares: Eventually. Predicting the future is hard. My 90% confidence interval conditioned on no global catastrophes is maybe 5 to 80 years. That is to say, I don't know.

Tarn Somervell Fletcher: What are MIRI's plans for publication over the next few years, whether peer-reviewed or arxiv-style publications?

More specifically, what are the a) long-term intentions and b) short-term actual plans for the publication of workshop results, and what kind of priority does that have?

Nate Soares: Great question! The short version is, writing more & publishing more (and generally engaging with the academic mainstream more) are very high on my priority list.

Mainstream publications have historically been fairly difficult for us, as until last year, AI alignment research was seen as fairly kooky. (We've had a number of papers rejected from various journals due to the "weird AI motivation.") Going forward, it looks like that will be less of an issue.

That said, writing capability is a huge bottleneck right now. Our researchers are currently trying to (a) run workshops, (b) engage with & evaluate promising potential researchers, (c) attend conferences, (d) produce new research, (e) write it up, and (f) get it published. That's a lot of things for a three-person research team to juggle! Priority number 1 is to grow the research team (because otherwise nothing will ever be unblocked), and we're aiming to hire a few new researchers before the year is through. After that, increasing our writing output is likely the next highest priority.

Expect our writing output this year to be similar to last year's (i.e., a small handful of peer-reviewed papers and a larger handful of technical reports that might make it onto the arXiv), and then hopefully we'll have more & higher quality publications starting in 2016 (the publishing pipeline isn't particularly fast).

Tor Barstad: Among recruiting new talent and having funding for new positions, what is the greatest bottleneck?

Nate Soares: Right now we’re talent-constrained, but we’re also fairly well-positioned to solve that problem over the next six months. Jessica Taylor is joining us in August. We have another researcher or two pretty far along in the pipeline, and we’re running four or five more research workshops this summer, and CFAR is running a summer fellows program in July. It’s quite plausible that we’ll hire a handful of new researchers before the end of 2015, in which case our runway would start looking pretty short, and it’s pretty likely that we’ll be funding constrained again by the end of the year.

Finally, one snippet that's fairly relevant to my recent MindingOurWay posts:

Anonymous: I understand [your] ethical system described in [your] recent "should" series and other posts to be basically a kind of moral relativism; [are you] comfortable with that label?

Nate Soares: You could call it a kind of moral relativism if you want, though it's not a term I would use. I tend to disagree with many self-proclaimed moral relativists: for example, I think it's quite possible for one to be wrong about what they value, and I am not generally willing to concede that Alice thinks murder is OK just because Alice says Alice thinks murder is OK.

Another place I depart from most moral relativists I've met is by mixing in a healthy dose of "you don't get to just make things up." Analogy: we do get to make up the rules of arithmetic, but once we do, we don't get to decide whether 7+2=9. This despite the fact that a "7" is a human concept rather than a physical object (if you grind up the universe and pass it through the finest sieve, you will find no particle of 7). Similarly, if you grind up the universe you'll find no particle of Justice, and value-laden concepts are human concoctions, but that doesn't necessarily mean they bend to our will.

My stance can roughly be summarized as "there are facts about what you value, but they aren't facts about the stars or the void, they're facts about you." (The devil's in the details, of course.)