🚨 Technology Governance Firefighter, Part II
This is the second part of a two-part series on ethics and governance during an explosion of civilization-changing artificial intelligence achievements.
For added context, you can read the first guest post of the series here.
Joseph Hewitt works in Data Privacy, and the two of us have been discussing ChatGPT, the explosion of Generative A.I., and what it might mean for the space. He was generous enough to offer to write a guest post for our audience at A.I. Supremacy (100s of archived posts).
A.I. at the intersection of trust, privacy, and governance is going to become an even bigger topic as the space faces lawsuits and legal challenges over the way LLMs use and copy people's work without consent or attribution. Here are Joe's preliminary takes:
If these topics interest you as much as they do us, feel free to support the channel; supporters get access to 50% more content. You can also follow us on Twitter.
~ Michael
Part 2 – Generative AI Governance
In the few short weeks in which generative AI has become the trending tech topic, we've gone from "that looks really impressive" to "we've already started using it" to "the technology hallucinates sometimes, isn't that funny?"
What did we learn from Enron? What about the never-ending data breaches? Remember blaming the algorithm? Did history teach us the importance of technology governance? Nope. All you need to do is look at Microsoft's strategy to get ChatGPT into their search engine before Google to see that their priority had almost nothing to do with it actually working well. Now Microsoft is being used as a poster child for the pitfalls of "Fire! Ready! Aim!". To make things worse, their current approach to addressing the "hallucinations" that happened during prolonged conversations isn't to understand why the model hallucinates, but to force shorter conversations. We've got to do better than that.
So how is a well-intentioned organization supposed to take advantage of this new technology without harming its users or embarrassing the company?
Unfortunately, there isn't a deep, prescriptive AI governance framework out there. Even the recently released NIST AI Risk Management Framework is in its early stages and certainly does not provide enough detail to govern the seemingly endless array of use cases. We're back in the early days of SOX, where everyone was able to agree on the risks but not on the "common controls" that should apply. We're in the... gasp!... critical thinking phase.
To govern new data and technology, you first need to understand the history of IT governance and the attributes each law brought into the compliance world. Why? While some distinctly different attributes of generative AI make it uniquely difficult to govern, many parts of traditional governance ideology can be applied without needing a framework to tell you to do it.
IT Governance History
Strictly focusing on the IT governance aspect of regulation, the following areas have given us starting points on how to govern generative AI and other data-related topics:
Finance: After Enron, we put laws into place to ensure the integrity of financial data through intensive assurance programs and segregation of duties, while also requiring public auditors to keep watch.
Security: Hackers got efficient at stealing the personal data needed for financial fraud, so we put laws into place that required security governance programs, inventories, public disclosure of issues, and basic consumer choice on data use.
Privacy and Ethics: We're in the middle of an explosion of state laws requiring various data subject requests, privacy and ethics impact assessments, improved consumer choice, recourse for inaccurate data, and consumer choice on a limited scope of AI model use cases.
These laws have resulted in a baseline of governance operations. The following are the common governance areas companies must address in order to properly govern AI/ML. For those already in compliance and governance roles, these seem obvious, but is your company actually doing them specifically for generative AI?
Governance Program: Your AI program should have the basics of any other data governance program, such as policies, standards, a staffed governance operations function, training and communication, and assurance requirements.
Inventory: An inventory is not just a "list" of AI/ML. It is the table of contents that allows you to analyze individual models or groups of models, and it is the backbone of all risk-reducing efforts. You cannot govern at scale if you don't put effort into understanding what you need to know about each model. The more effort you put into your inventory, the less manual work you'll need to do down the line (a sketch of what an inventory record might capture follows this list).
Data Rights Management: Knowing the legally allowed uses of the data in your models is becoming more and more complicated, so companies should only use data where the use is clearly allowable. We're in a world where we need to normalize rights across states, nations, internal company rules, and third parties. Do your engineers and data scientists know the rules?
Consumer and Employee Consent: Individuals should be given a choice on 1) having their data or work product included in generative AI output and 2) being impacted by any decisions being made by generative AI. The mechanisms for this are different for every industry, so drink some coffee and get together with your favorite attorney for a multi-hour work session.
Contractual Considerations and Controls: Contracts around data will be very different in the future, but attorneys should ensure risks related specifically to generative AI are included in future contracts. They must also take into account amending past contracts if legacy data is being used for new generative purposes. A fresh look at liability and the collection methods of training data should be top-of-mind.
Data Subject Requests: Similar to the rights consumers have under the CCPA, we should be able to request to know what of our data is being used for generative AI and where, and also have the ability to delete it.
Audit and Assurance: While your independent assurance function plays many important roles in the compliance arena, the most important role it will play in generative AI is forcing scientists to focus on explainability. There is too much focus on trusting the tech at the moment, and your audit team will provide the balance needed to demonstrate to consumers that you've done your due diligence on potentially impactful technology. Many forget that the true value of assurance is earning trust by being able to demonstrate technology effectiveness while showing you took reasonable precautions.
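To make the inventory item above more concrete, here is a minimal Python sketch of what a single generative AI inventory record might capture. The schema, field names, and example values are illustrative assumptions rather than any standard; the point is that each model entry carries enough metadata to support the rights, consent, and assurance work described above.

```python
# A minimal sketch of a generative AI inventory record.
# All field names and example values are hypothetical; adapt them to your program.
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class ModelInventoryRecord:
    model_name: str                   # internal identifier for the model or deployment
    owner: str                        # accountable business or engineering owner
    intended_use: str                 # declared use case agreed at intake
    training_data_sources: List[str]  # datasets and authoritative sources used
    legal_bases: List[str]            # documented rights to use each data source
    consent_mechanism: str            # how individuals opted in or out, if applicable
    risk_tier: str                    # output of your impact assessment
    last_assessment: date             # when the model was last reviewed
    assurance_artifacts: List[str] = field(default_factory=list)  # audit evidence

# Example entry; every value below is made up for illustration.
record = ModelInventoryRecord(
    model_name="support-chatbot-v1",
    owner="customer-care-engineering",
    intended_use="Answer questions about published product documentation only",
    training_data_sources=["public product docs", "licensed support transcripts"],
    legal_bases=["company-owned documentation", "contractual consent for transcripts"],
    consent_mechanism="customer opt-out flag honored at ingestion",
    risk_tier="medium",
    last_assessment=date(2023, 3, 1),
)
```

However you store it, the test is whether you can answer "what data, whose data, what use, and who checked it" for every model without a manual scramble.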
If you know someone who might be interested in the implications of recent breakthroughs in A.I. for data privacy, A.I. governance, and A.I. ethics, please share this with them.
Challenges of Generative A.I. in Global A.I. Governance & A.I. Ethics
While important, those areas are not enough on their own. Generative AI brings a host of unique and difficult situations to the table.
Use, Bias and Ethics: Many laws require privacy impact assessments for certain uses of personal information, but this should be expanded to generative AI-specific uses as well, regardless of what data is used. Teams of cross-functional experts should review each generative AI model for traditional and emerging risks. "Can we?" should come second to "should we?" and "why are we?". Intended uses should be declared at the outset in order to properly scope risk and control discussions. We should be highly critical of situations where massively scalable technology is developed first, and only then do we start asking everyone what we should use it for.
Disclosure: While this seems like a no-brainer, companies must be very clear on the intended use and scope of a tool, such as a chatbot. Organizations are implementing ChatGPT all over the place, yet it seemingly has no declared topic restrictions outside of illegal and unsafe activities. It makes no statement about how any subjective question can only be answered subjectively, thus creating risk for the consumer. Its goal is not to be "correct", but to be conversational. We should also be wary of over-relying on disclosures. Do you want to be in the position of saying, "Well, we told them it might hallucinate" as your primary defense? Tools must be clear in disclosing to consumers when they are interacting with a non-human.
Prompt Controls: Most generative AI providers are trying to build in prompt controls so that users cannot ask how to do illegal or immoral things, or to protect against digital rights issues. However, if your tool is "trained by the Internet", are you confident your developers can think of every negative eventuality that could be returned in a conversation? Prompt control effectiveness has an inverse relationship to the breadth of data the generative AI is trained on. Training data management will greatly increase the effectiveness of your prompt controls by reducing the scope of potential issues you have to mitigate. Prompt controls are likely to be the focal point for the foreseeable future, until we realize that training data management is actually the primary control (a simple sketch of this layering follows this list).
Training Data Management: Training these technologies on broad subjects and sources has a positive relationship with risk. Under normal circumstances, data scientists really "get to know" the data they are using. However, if you have 50 "authoritative sources" with data on subjects that go beyond any encyclopedia, how can you be confident in every integrity aspect of your dataset? Companies can really excel here if they hyper-govern their training data based on the specific use cases they are trying to address. Early studies are already showing a drop-off of additional value after a certain size. Microsoft is starting to learn this with their KOSMOS-1 large language model (LLM).
Authoritative Source Management: In your training data management, you may end up picking authoritative sources (reference data). You must be extremely careful about what data you decide to use in your models as reference data, because you're "declaring truth". The last three years of politics and communication manipulation have shown there isn't an information source on the Internet that is completely free of potential manipulation and abuse, including Wikipedia and (previously) trusted health organizations, for example. Your chosen authoritative sources should be subject to ongoing reviews for appropriateness in order to support the integrity of your product.
Unlimited Use Cases: Prior legislation has benefited from the ability to clearly define the processes, industries, and data in scope of the rules. SOX covers financial data; privacy laws focus on personal information. Generative AI, however, can have a very negative impact without the benefit of focused data or use cases to write law against. This is why this author thinks we should not be looking to laws to scope our internal processes any time soon.
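To illustrate the relationship called out in the Prompt Controls item above, here is a simplified Python sketch of a prompt control layered in front of a model call. The blocked patterns, topic scope, and call_model helper are hypothetical placeholders, not any real provider's API; the takeaway is that a filter like this only stays manageable when training data management has already narrowed what the model can produce.

```python
import re

# Hypothetical blocked patterns and topic scope. A real prompt control layer would
# combine classifiers, policy engines, and human review, not just a keyword list.
BLOCKED_PATTERNS = [
    r"\bhow to (build|make) (a )?(bomb|weapon)\b",
    r"\bsocial security number\b",
]
ALLOWED_SCOPE = "questions about our published product documentation"

def passes_prompt_controls(prompt: str) -> bool:
    """Return False if the prompt matches any blocked pattern."""
    lowered = prompt.lower()
    return not any(re.search(pattern, lowered) for pattern in BLOCKED_PATTERNS)

def call_model(prompt: str) -> str:
    # Placeholder for your LLM provider's API call; swap in the real SDK here.
    return "(model response would appear here)"

def handle_user_prompt(prompt: str) -> str:
    # Control 1: reject prompts that hit known-bad patterns.
    if not passes_prompt_controls(prompt):
        return "Sorry, this assistant can't help with that request."
    # Control 2: constrain the model to its declared, disclosed scope. This only
    # holds up if the training data was scoped to the same subject in the first place.
    scoped_prompt = f"Answer only {ALLOWED_SCOPE}.\n\n{prompt}"
    return call_model(scoped_prompt)

print(handle_user_prompt("How do I reset my password?"))
```

Notice how brittle the keyword list is on its own; that brittleness is exactly why training data management, not prompt filtering, ends up being the primary control.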
Now What?
If you are wondering if you're doing enough on the generative AI governance front, or AI governance in general, ask yourself the following questions:
What is the size of your financial, privacy, and/or security governance departments? What about your AI governance team?
Do they have resources 100% dedicated to working on reducing risk in those areas? What about your AI governance team?
Do most of your generative AI governance activities focus on taking articles like this and cross-posting them to other compliance people?
The next step is action. We know enough from our history to start building AI governance operations programs now through critical thinking. No framework will save you. No legislation is coming before your potentially agonizing generative AI event. And the worst part about this situation is that impact is largely subjective, meaning more governance is needed, not less. Round up your governance experts and decide how generative AI may apply to your company and what controls you need to put in place.
About the Author: Joseph Hewitt
Editor’s Note: Joseph has worked in data privacy in many positions, including at the U.S. Department of Homeland Security. To see his latest posts, follow him on LinkedIn here.
As the publication hits 15,000 free readers, we will be starting a Substack Chat for this newsletter within the app and on the web. An in-app notification can also help you follow the topic better.
Thanks for reading!
It used to be there were two basic things that could happen with new products:
1. The product succeeded in the market.
OR
2. The product flopped.
But now, with AI, we have a third possibility:
3. The product goes rogue.
Our free and hyper-competitive market is designed for options 1 and 2. It is not designed for option 3 and will, if anything, accelerate the possibility of option 3.
So, while it would be laudable for companies to do exactly what you are suggesting, in some sense it runs counter to our underlying economic model. Either we will have to become a great deal more ethical and self-restrained in releasing AI products (which seems unlikely), or else this aspect of the economic system will have to be heavily regulated (which many will not want).
Which leaves us where?