🚨 Technology Governance Firefighter
A.I. Governance at the intersection of accountability, privacy, ethics and alignment.
Joseph Hewitt works in Data Privacy, and the two of us have been discussing ChatGPT, the explosion of Generative A.I., and what it might mean for the space. He was generous enough to offer to write a guest post for our audience at A.I. Supremacy (100s of archived posts).
A.I. at the intersection of trust, privacy and governance is going to become an even bigger topic as the space faces lawsuits and legal challenges over the way LLMs use and copy people's work without consent or attribution. Here are Joe's preliminary takes:
If these topics interest you as much as they do us, feel free to support the channel. You can also follow us on Twitter.
~ Michael
This is a two-part series on ethics and governance during an explosion of civilization-changing artificial intelligence achievements.
Part 1 - Ethics During the Generative AI Explosion
We are at an unusual and important time in our history: some would say an inflection point. We've spent the last several decades advancing technology to support our human processes and tasks. Cars, computers, manufacturing improvements, etc., have helped us overcome mechanical limitations and problems of scale.
However, with the invention of generative AI tools such as ChatGPT and DALL·E 2, we're crossing an ethical line we may never be able to come back from. We're replacing thinking. You'll find a plethora of information on how great these technologies are, and they certainly are. People are obsessed with what they think these tools can be used for, but are we doing enough thinking about what they should be used for?
I won't focus on bias (ethnic, political) here; you can already find numerous articles outlining those issues. We'll focus on other, less-covered concepts.
Accountability And Perception of Results
The ChatGPT response to "Who is accountable for your output?" is (for me) the following:
ChatGPT: "As an AI language model developed and maintained by OpenAI, I do not have personal agency, accountability, or any direct control over my output. The ethical and responsible use of my capabilities is the responsibility of OpenAI and the users of my technology."
First question: is OpenAI really responsible for the output? While some of ChatGPT's responses come with disclaimers, many don't. Most medical questions stick to strict responses from "authoritative sources", such as the CDC, and are presented as objective, deterministic fact, even though we know that everything in medicine is a grey area and probabilistic. One of your doctor's largest expenses is the malpractice insurance they must carry. Doctors have all the disclaimers and waivers in the world and can still be sued or even lose their medical license. Why would we allow OpenAI to return "human-like" advice with no consequences?
In a recent study, medical research professionals could not reliably tell the difference between real and generated medical studies. Imagine the potential impact on public policy. During the three years of the COVID pandemic, we witnessed a "study war" of people throwing research at each other, some of it of questionable integrity.
We struggled to know what was real when the information was created only by humans. What happens now that anyone can generate their own study? Could OpenAI be held responsible for the mis- and disinformation its technology creates? Maybe the better question is: should they, and the other companies commercializing this tech, be held liable?
The Death of Critical Thinking
"Writing is thinking. To write well is to think clearly. That's why it's so hard." ― David McCullough
The process of writing is one of the most impactful parts of our brain development. It forces us to build logical structures in the way we think, apply them to ideas, and then readjust our thoughts once the ideas are pressure-tested. In highly intellectual work, "solving" large portions of this for humans denies us that cerebral development, and the result will be a reduction in our capacity to think critically.
These technologies will likely turn our work days into some business version of Aldous Huxley's "Brave New World", where we are lulled into "good enough" by the "drug" of generative AI. What happens when society loses these skills and needs to solve some existential crisis: a pandemic with a high mortality rate, a post-nuclear-war world, an asteroid on track to hit our planet? With generations of AI trained on the output of earlier AI, there will be no responses that can solve it. It's us and only us. Could we do it two generations from now?
One area of praise is the legal profession, on difficult and repetitive work like contracts. Is the technology genuinely good at this, or is it just getting better at stringing together language in an industry that has pockets of standardization? Are many contracts largely similar? Yes. But any good lawyer will tell you the important part of their job isn't what's the same, it's what's different, even if only 5% is different. Over-reliance on this technology will lead to small but important errors in complex efforts.
Training Data and Uses
The real work of improving ChatGPT and reducing its potential negative impact lies in training data governance. Why?
ChatGPT: "I was trained on a diverse range of internet text sources, including websites, books, and social media platforms, up until 2021. This training data was sourced from the publicly available internet and was not hand-curated by OpenAI. The data covers a wide range of topics and includes text in many different languages. OpenAI has taken steps to reduce the presence of harmful or biased content in the training data, but it is not possible to guarantee that all such content has been removed."
While the advantages of such broad inputs are well documented, it also means that governance, knowledge, and control of what the model is trained on shrink and become harder to understand. Even if you limited training to Wikipedia and a few select news sources, you couldn't guarantee the accuracy, completeness, integrity, or freedom from bias of just those platforms. Now scale that up to a data set you can't even understand. Do you believe every source in it clears a minimal integrity bar for such a highly impactful technology?
Authoritative Sources
One of the big differences between how ChatGPT delivers responses and how a search engine does is that search at least gives some level of choice and the appearance of multiple potential sources for "the truth" (we'll leave search bias out of this for now).
The first major authoritative-source issue with ChatGPT is that it struggles to represent topics where opinions directly conflict across the Internet. When this happens, it sometimes waters down both sides of the argument, making the answer useless. This, however, is the lesser of the two major issues.
The second authoritative-source issue arises somewhere in the development or training process, where data scientists effectively choose the authoritative sources by "rewarding" the model for what they deem correct answers. Blindly anointing authoritative sources to deliver information has proven to be one of the most destructive habits of Big Tech over the last three years.
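To make that "rewarding" step concrete, here is a minimal, purely illustrative Python sketch of preference-style scoring. The source list, scoring rule, and function names are hypothetical stand-ins I've invented, not OpenAI's actual pipeline; the point is only that whichever answers a reward function favors are the answers the model learns to give.

```python
# Illustrative sketch only: a toy "reward" step showing how preferring
# answers that cite pre-chosen sources bakes those sources into a model.
# The source list, scoring rule, and names are hypothetical.

APPROVED_SOURCES = {"cdc.gov", "who.int"}  # hypothetical "authoritative" list

def toy_reward(answer: str) -> float:
    """Score an answer higher when it leans on an approved source."""
    score = 0.0
    for source in APPROVED_SOURCES:
        if source in answer:
            score += 1.0  # reward citing an approved source
    if "some experts disagree" in answer:
        score -= 0.5      # hedged answers lose points in this toy setup
    return score

def pick_preferred(candidates: list[str]) -> str:
    """The highest-scoring candidate 'wins' and shapes future training."""
    return max(candidates, key=toy_reward)

candidates = [
    "According to cdc.gov, the recommendation is X.",
    "Evidence is mixed and some experts disagree; here are both views.",
]
print(pick_preferred(candidates))  # the source-citing answer wins
```

Even in this toy version, the "authoritative" list is a human editorial choice baked in long before the model ever answers a question.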
The Future of Facts and Media
These tools are incredible in what they can do, and many people rave over how "human-like" the output is. Yes, this is awesome, but it is not only a feature. It is also potentially a detriment.
As more and more of the Internet and our media become the product of generative AI, the training data for future generative AI is going to be… generative AI. At that point, we lose one of humanity's greatest gifts… creativity. Generative AI bases its results on what we've already created. You will not see new styles and methods, just mashups of old ones. The phrase "everything is derivative" will become the walls we build around ourselves.
Predictions
If we extrapolate these technologies into areas like healthcare, politics, movies, and education, here is what I predict you'll see in the next 1-5 years. Please add your own predictions in the comments.
Students will challenge low grades by telling their teachers, "ChatGPT disagrees with you." Is a 12th-grade social studies teacher going to insist they are right over a technology that boasts of being "trained on the entire Internet"? Foreign language teachers have already been dealing with this via Google Translate for years.
People will start to use ChatGPT for medical advice instead of going to their primary care physicians. This will lead to both good outcomes (reduced visits) and very bad ones (no need to spell those out).
Lawsuits will start to cite ChatGPT results to challenge in-court experts, and experts will need to defend why their testimony differs from ChatGPT's.
Artists will be particularly hard hit by generative AI because a single prompt can return many different kinds of results. Why pay an artist for images when you can generate multiple examples in seconds?
Deepfake and DALL·E 2 technology will be blended together for use in the movie industry. Directors will use it to generate variations of scenes ahead of time and see which ones they like best before shooting… if they even have to shoot the scene at all.
Animators will become "Generative AI Configuration Engineers", focused on tweaking the inputs to these technologies to get closer to what they want. Think of turning the "aggression" dial in the next Jason Bourne movie until they find the level they want.
I realize no solutions have been presented in this installment, but I hope it gets people to think past the bias issue and more broadly about the issues at hand. Stay tuned for my thoughts in Part 2: Modern Governance of AI.
Our first mistake was to "trust" the transfer of our public-utility telecommunication networks' critical connectivity, i.e. ownership of the outside-plant physical infrastructure (the telecommunications/broadband networks that serve as the on- and off-ramps to these CDNs, platforms, and services - essentially most of today's Internet, and even critical public-safety and national-security traffic), thereby deregulating away nearly any decentralized leverage over the discussion.
I propose we start this discussion, using what critical thinking we have remaining, by returning to the era before the 1996 Telecommunications Act, when ownership of our outside-plant telecommunication lines was decentralized and still federated, as exemplified and still practiced in our highway, airway, and waterway models. While not a solution in itself, it would give us leverage to decide the choice and level of AI-vendor participation or non-participation in a more democratic manner, rather than through the arbitrary and often scandalous behavior we've seen since the '90s. https://youtu.be/mGzDpY6ZTnk
A relevant story here... the US Copyright Office denied a submission, finding that output from an AI product does not constitute "human authorship", even if all of the potential input data came from humans. 1) Can you profit from others' work on the input side? 2) Can you claim the work as yours because you "designed" the prompts to the model? https://arstechnica.com/information-technology/2023/02/us-copyright-office-withdraws-copyright-for-ai-generated-comic-artwork/?utm_source=join1440&utm_medium=email&utm_placement=newsletter