How do we govern AGI?
A brief survey of mechanisms to minimise risk and 🎯 maximise upside
Visualising AI by Google DeepMind [Full image]
Hello Everyone,
This is our second article in our series on AGI, you can see the first here. The AGI series has its own section now on the Newsletter, here.
Last call on Subscribing while our price is still at on the low end at $8 or $75, it will be going up to $10 or $100 in September.
Harry Law is probably one of the best emerging writers on A.I. history, ethics and governance I know of on Substack. It makes sense since he is a scholar. His Newsletter is called Learning from Examples.
Examples of his recent articles
I asked Harry to write something related to AGI and A.I. Governance.
By
August, 2023.An important disclaimer. These views are my views only and do not represent the views of my employer.
Not often are the benefits of AI discussed without its risks. The potential for today’s models to do enormous good—from boosting economic growth to helping us live longer, fuller lives—is often juxtaposed against the risks that they present. This is the case because the potential to take action cannot be neatly separated from the potential to cause harm.
So-called frontier models, those that are on the cutting edge of development, are typically the largest (and crucially, most capable) models that currently exist. Today, they cost millions of dollars to train. In the next few years, we can expect training runs to cost billions of dollars. While size isn’t everything, we can generally anticipate superior performance, better and more reliable use of existing skills, and the emergence of new capabilities as models scale. What isn’t so clear, though, is when we can expect these changes to take place, or whether they will also be contingent on the development of new algorithms and architectures.
Whatever the case, systems will become more dependable, capable, and adaptable in the coming years. In short, they will become more general. At some point the performance of systems may be roughly comparable to humans on any cognitive task we can imagine. Whether in five or fifty years, this is the threshold that a substantial proportion of observers connect with the emergence of Artificial General Intelligence (AGI).
A lot of work has been done on defining what we mean by the term, where today’s systems are in comparison to a hypothetical AGI, and what remains in order to get there. This essay will not explore these questions. Instead, I’ll focus on some of the governance tools we have at our disposal to ensure that highly capable agents are aligned with human intent. Before we look at a handful of the mechanisms for doing just that, though, I need to make a few clarifications.
I will be discussing the governance of a future system based on the paradigm that dominates the field today: deep learning. This analysis is premised on the assumption that the ingredients needed to create AGI are broadly the same as those that we use today, namely data, compute, and algorithms. More concretely, though, is the assumption that AGI will take the form of a very large artificial neural network or constellation thereof.
An important question here is whether we are referring to a tool or an agent. I anticipate that agent-style systems that have a degree of, well, agency will become increasingly common in the short to medium term. To play this forward, I suspect that future very powerful systems will be agentic in nature (albeit with restrictions that determine the type, purpose, and character of actions that they may take).
Then there is of course self-improvement. The idea that a generalist system could recursively self-improve because designing itself is a task that any such system would necessarily be capable of is an old one. It is not all that clear to me, though, that the current paradigm on which AI is based accounts for extremely fast self-improvement because of physical limits inherent in hardware used for large training runs. For this reason, the following is based on the assumption that governance measures would be directed at managing the emergence of AGI in the near term and Artificial Super Intelligence over the longer term.
Whether AI, AGI or ASI, responsible development is about maximising benefit as well as mitigating risk. While squaring this circle is not always possible, I am proceeding on the basis that we should strive to ensure that the benefits of powerful AI are felt by as many people as possible. I’ll discuss this in more detail when I discuss international governance, but ultimately it is the idea that it is possible to balance benefits distribution and proliferation risks. I am also making the assumption that while training many new AGI-style systems in full will generally not be possible, existing systems will be deployed, accessed, and used in a multitude of different ways.
With that all said and done, there are three governance areas that I will consider in this essay. These are evaluations (attempts by developers to understand dangerous capabilities prior to and after deployment); access (how systems are released, engaged with, and used in the real world); and international governance (the bodies and mechanisms that govern powerful AI at the global level). These are just a handful of the different tools at our disposal, but I choose to focus on them because they represent attempts to govern highly capable models at the development, deployment, and organisation oversight layers of the AI value chain.
BIOGRAPHY
the following was submitted by the author.
Bio: I work on ethics and policy issues at Google DeepMind. When I’m not there, I spend my time reading and writing my way through a PhD at the Department of History and Philosophy of Science at the University of Cambridge. I’m also a postgraduate fellow at the Leverhulme Centre for the Future of Intelligence. You can find me on Twitter at @lawhsw and on Substack writing about AI history, ethics, and governance at learningfromexamples.substack.com.
🚀Read A.I. Startups | 📑 Read A.I. Papers | 🌌 Quantum Foundry | 📊 Data Science
Let’s get back to the article:
Evaluating powerful systems
Today, evaluations play a central role in assessing the risk profile of powerful models. They can be used to determine how capable a particular model is, the extent to which it produces toxic or biassed outputs, or even the amount of CO2 or water consumed during its training run. Traditionally, though, AI ‘evaluations’ have generally referred to efforts to assess the performance of models on specific tasks, pre-launch, on single-moment-in-time quantitative benchmarks.
As AI systems become more integrated into our lives, however, we are witnessing a growing emphasis on multi-dimensional assessments, which not only focus on performance but also on the ethical, societal, and environmental impact of a given system. This includes examining potential issues like toxicity, which can offer insights for content guidelines and deployment strategies. When Anthropic’s Claude 2 was released, for example, the model was assessed using benchmarks designed to test truthfulness (e.g. TruthfulQA) and bias (e.g. Bias Benchmark for QA).
Broadly speaking, evaluations or ‘evals’ can encompass both internal and external audits as well as extensive analysis of the AI's impact on users and the wider society. Such appraisals are widely viewed as a pivotal element in the responsible AI toolbox, with the success of such measures sparking calls for standardisation. Recently, the White House announced a new initiative to publically evaluate a set of large language models at the upcoming DEFCON 31 hacker convention. Eight AI firms submitted models to a red-teaming exercise that aimed to ‘break’ the models or find flaws in them.
If the first stage of evaluations focused on assessing performance on specific tasks, and the second introduced a focus on evaluating known harms, then the third stage––which we are only just entering––considers how to assess powerful models for speculated risks. All three approaches will be central to evaluating highly-effective systems, but the final type will be especially important.
This is because, as systems become more capable, they may develop dangerous capabilities that could pose extreme risks if misused or misaligned. To identify and mitigate these risks, AI developers need new tools for model evaluation that go beyond existing methods focused on biases or toxicity. Recently, researchers fleshed out what these dangerous capability evaluations might look like, presenting a framework to enable assessors to, for example, determine whether models have the capability to commit cyberattacks, manipulation, or design weapons. Related to this is the idea of alignment evaluations to check that models behave safely across diverse situations––rather than just exhibiting superficial safety. (I talked a little about the relationship between AI and deception, manipulation, and persuasion in a recent post for those interested).
In a world with extremely capable models, evaluations should inform governance processes around training, deployment, transparency, and security for all firms developing AI. Here, planning and foresight measures are key. Before training risky models, for example, developers can use evaluations on smaller prototypes to anticipate issues and take corrective action. This is a process that ought to occur during training through which concerning results prompt pauses, assessments, and design changes.
Once a powerful system has finished training, it should be subject to a substantial period of assessment before it is released. The longer the evaluation period, the more time labs have to comprehensively understand the capabilities of a given system, provide a window for third parties to red-team the model, and introduce programmes to enable limited access to the models for the purposes of testing.
And because several labs are trying to build powerful systems simultaneously, developers should share what they learn with each other to inform development and governance decisions. We already have the Frontier Model Forum, which acts as an information sharing mechanism between groups building the most capable models. More broadly, labs ought to share what they know with trusted third parties, but only in a way that does not inadvertently magnify the risk of proliferation. In practice, that might mean sharing incidents, pre-deployment assessments, and educational demonstrations with external researchers, auditors, policymakers, and the public (with each group receiving different levels of detail).
Developing a rich understanding of highly capable systems means that evaluations should continue across the full lifecycle of a given system. In reality, that means that monitoring and evaluations should continue as the model interacts in the real world once it has been deployed. Ongoing assessment can enable developers to identify potential drifts in data or changes in user behaviour that might affect the model's performance.
Access and security
The more we widen access to a given technology, the greater its potential for positive impact.
Increasing access to powerful systems will allow more people to use AI, to build with it, and to enrich their lives and the lives of others. But the same is true of harm. If you increase the number of people who can use a model, you are also increasing the number of possible vectors for harm.
One position within this debate, which focuses on maximalist interpretations of access, is often referred to as ‘open-source’ (though that label isn’t always appropriate due to issues related to transparency and licensing). Nonetheless, the term is often used as a shorthand for a style of release that sees full models (including their weights) made available to anyone who wants them.
It is possible to delineate between ‘open models’ that come with commercial-use weights and open-source datasets, and ‘open weights’ approaches providing licensed model weights but lacking public training data. Other approaches can be described as ‘restricted weights’ that have conditional accessibility with undisclosed datasets, and ‘contaminated weights’ that are technically open but are restricted by dataset limitations.
Related to the idea of open-source is the idea of transparency, which aims to foster understanding, trust, and accountability. Transparency-focused governance approaches provide clarity into the workings of a model, enabling third parties and users to discern its decision-making mechanisms. Transparency, though, is not without its costs. While transparent models can lead to informed utilisation and ethical modifications, they can also expose vulnerabilities that might be exploited.
A popular stance in this discourse, frequently termed 'transparent AI', emphasises the complete disclosure of a model's architecture, data, and training processes. Although this term is sometimes misused or confused with mere 'explainability', it ideally represents a model where both its inner logic and the data it was trained on are open for inspection. Such moves, though, also come with their own set of challenges, including risks to security, privacy, and misuse by malicious actors.
Access, then, is a continuum. That is as true for today’s models as it is for the systems of the future. On one side of the spectrum, we have some of the ‘open-source’ style approaches described above that favour access, while on the other we have interfaces and APIs. Somewhere in the middle is structured access, which tends to emphasise a controlled approach to providing access to systems or resources. The idea is essentially that those seeking to use a particular system gain access in proportion to their expertise, needs, and intended use.
Core to structured access is a ‘graduated’ approach in which the depth of access correlates with the expertise and intentions of the user. This might mean granting deeper investigative permissions to certified auditors or experts while providing a more general overview to a broader research community. Users might primarily engage with a powerful system using an interface or API rather than being able to download weights. While by no means a perfect compromise, structured access approaches generally allow significant access while preventing bad actors from being able to override important safeguards.
We can also consider licensing models as a mechanism to instil discipline and ensure responsible usage. A licensing regime would seek to ensure that users not only have the requisite understanding but also commit to a set of standards and protocols. These licences could perform a gating function, ensuring that powerful models aren’t wielded recklessly but rather employed with a sense of responsibility and adherence to specific guidelines. Just as one wouldn't hand over the keys to a powerful machine without assurance of the user's intent or competence, the use of capable AI systems could be predicated on licences at the top of the value chain.
It is also possible to restrict the type of access––as well as the mechanism by which access is granted––to a particular model. If we think of a powerful system as a library, we might expect that while some departments are open for use, others with sensitive material may require special access permissions. The public domain should ideally encapsulate portions of advanced AI that drive widespread benefit, while certain sections by virtue of their depth, complexity, or potential for misuse, might be better suited under restricted access. The core challenge here is addressing concerns that such a system would concentrate power in the hands of a few, preventing AI’s benefits from accruing to as many people as possible.
Image credit: Google DeepMind Unsplash.
International governance
So far, I have primarily focused on governance measures at the developer (evals) and deployer (access) stages of the value chain. The next example, international governance, belongs to the category that we might call oversight. Here, I am interested specifically in mechanisms for governing AGI at the international level rather than at the national or local level.
Inflection AI founder Mustafa Suleyman and Ian Bremmer of the Eurasia Group have argued that AI presents unique challenges for international policy. In a recent piece for Foreign Affairs, they write that “It [AI] does not just pose policy challenges; its hyper-evolutionary nature also makes solving those challenges progressively harder. That is the AI power paradox.”
Aside from quibbles about whether this dynamic can be described as a paradox, they make the case that AI is unique in its ability to diffuse rapidly while remaining in the hands of a few private firms. Broad applications make frontier models unpredictable, while—unlike nuclear weapons—AI systems proliferate easily and require relatively few resources to reuse and remix once created. Based on this dynamic, the pair make three central governance recommendations: a global scientific body like the IPCC, arms control-inspired approaches to prevent proliferation, and a financial stability-inspired organisation to coordinate responses when AI disruptions occur.
The author's approach belies an understanding that there is not a one-size-fits-all solution to the international governance question. That being said, one organisation can have different functions—like the IAEA, which aims to stop proliferation of nuclear weapons while also spreading peaceful applications. Creating a collection of institutions poses challenges around resource allocation (both in terms of cash and talent) and coordination (for example, a risk evaluated by a scientific body might also be of concern from a stability or arms control perspective).
The option space for international governance is vast. Researchers have recently suggested several possible models for different organisations focused on particular subsets of the broader problem. The group considers an intergovernmental Commission on Frontier AI to build consensus on AI opportunities and risks through regular assessments, as well as an Advanced AI Governance Organisation to set international safety standards for advanced systems, support their implementation, and monitor compliance. They note that the former, however, faces challenges around the lack of existing research and the politicisation of a complex, rapidly changing topic, while the latter might suffer from difficulties associated with quickly formulating standards and encouraging broad participation.
The paper also weighs a Frontier AI Collaborative to develop and distribute the benefits of AI via public-private partnerships and an AI Safety Project to scale up technical safety research by pooling talent and resources. As above, however, they note that such a Collaborative might be hamstrung by the challenge of managing proliferation risks for dual-use technologies while enabling access, whereas an AI Safety Project may divert safety talent from frontier commercial labs and struggle to secure access to the most advanced models.
Its ultimate argument, however, is that international governance is needed both to promote the global benefits of advanced AI and to manage shared risks. Spreading access to AI could greatly enhance prosperity and stability worldwide, but benefits may not reach underserved communities without international collaboration. My own view is that we could also explore an organisation with a ‘dual mandate’ to spread the benefits of AI while limiting risks. To understand why, let's return to the IAEA. The organisation holds a ‘dual mandate’ to enable the transfer of peaceful nuclear technology whilst also aiming to curtail its use for military purposes.
The IAEA seeks “to accelerate and enlarge the contribution of atomic energy to peace, health and prosperity throughout the world.” While this includes help with building and maintaining nuclear reactors, the majority of the organisation’s work is focused on initiatives like improving the quality of radiotherapy. Nonetheless, the problem is that civilian uses are inextricably linked with military uses: fuel from reactors can be used to create weapons, and know-how is easily transferred.
Originally focusing its control efforts on controlling spent fuel from reactor facilities, the organisation’s mandate expanded after the introduction of the Non-Proliferation Treaty (NPT). The NPT was agreed in 1968 and entered into force in 1970, meaning that for the first time an international organisation had the authority to routinely conduct on-the-ground inspections across the world to monitor the development of a technology prone to military use. Since then, the IAEA has been involved in assessing the proliferation of nuclear capabilities around the world, while continuing to provide assistance through the use of nuclear technology.
But can we reconcile the spread of nuclear technology with preventing proliferation of dangerous capabilities? In a 2014 quantitative study, Researchers Robert Brown and Jeffrey Kaplow concluded that states benefiting from technical assistance through the IAEA “are more likely to engage in nuclear weapons programs,” which they described as “bad news for international nonproliferation efforts.” As the nonproliferation expert Leonard Weiss commented, “Put simply, spreading nuclear technology spreads the ability (in whole or in part) to make nuclear weapons.” But the historian Elisabeth Roehrlich has offered an alternative perspective:
“Why did the IAEA, an international organization with almost global membership, defend its counterintuitive and risky mandate of sharing nuclear knowledge and technology while hoping to deter nuclear weapon programs? Because, it is argued here, what appears to be the IAEA’s greatest weakness has actually contributed to its success: While the promotional agenda of the IAEA bore risks, it also allowed the agency to facilitate diplomats and national experts coming together at the same table in pursuit of shared missions.”
By this account, the IAEA’s dual mandate not only enabled it to share the benefits of nuclear technology around the world, but it also bought it the influence and goodwill necessary for its enforcement programme. A similar model is not unthinkable for advanced AI.
Wrapping up
There’s much that I haven’t talked about, whether that’s education and advocacy or measures to increase transparency. I also didn’t really have space to get to grips with continuous evaluation, which would be central to tracking the emergence of unsafe capabilities in the wild. There’s also a lot more to say about compute governance, but there is already plenty of great work out there on that topic if it interests you.
That all said, I have given a rough shape of the governance landscape for powerful systems, introduced some of the mechanisms that we have at our disposal, and considered possible trade-offs that we might have to make. The approaches discussed here––evaluations, access, and international governance––are only part of the puzzle.
Evaluations may fail to uncover risks or be too costly, while access controls could concentrate power and stifle innovation if too restrictive. And international organisations could struggle with politicisation, coordination, and alignment of incentives. While each has its limitations, the overarching point is that successful governance requires employing robust, diverse, and complementary tools in a graduated manner that keeps pace with capabilities.
If we are to maximise the upside of highly-capable AI while minimising downside, we must learn from the management of other technologies, find compromises between security and access, and recognise the distinct nature of the challenges presented to us at each part of the value chain. Crucially, though, we must recognise that each choice represents a trade-off. As with the governance of all powerful technologies, there is no such thing as a free lunch.
Harry Law, 2019. Read his About page.
Articles in the AGI Series:
TBA (I’m looking for more guest posts in this series, details)
📢 New Newsletter Sign-up | 🦸🏻♂️ Benefactor | 👥 Guest Posts | 🌟 Give a Testimonial
Thanks for reading!