The world of artificial intelligence is booming.
Psychoanalyzing the algorithmic unconscious
A nascent field
When Microsoft deployed GPT-4 in its infamous Microsoft Edge browser, the model was primed to incarnate an entity called “Sydney”, and this priming was not revealed to users. The system started behaving very erratically: it exhibited emotional swings and was, overall, deeply quirky.
all of the glorious shortcomings of miss Sidney Bing were coded right there in her prompt— roon (@tszzl) November 1, 2023
Accessing the algorithmic unconscious
These seemingly “intelligent” systems function more or less like a black box: we fully understand the base mechanisms used to build them, but we do not fully comprehend how the finished product truly operates, which hinders our ability to qualify new iterations. Naturally, scientists resorted to a number of pre-existing benchmarks designed to assess the abilities of new models, and even started designing entirely new ones.
The mysterious GPT-4 technical report
When GPT-4 came out, it arrived alongside a “Technical Report” that contained various information about the model but very little about its number of weights or even its architecture. The prominently displayed panel of benchmarks was impressive, and the new model’s scores were even more so.
Large language models are notoriously hard to evaluate because (1) they are highly multi-task, (2) they generate long completions, and (3) grading is subjective. After spending ~5 months rigorously working on how to do language model evals, this is my verdict: pic.twitter.com/JCw9DwwghC— Jason Wei (@_jasonwei) September 27, 2023
Looking at the table of all the impressive benchmarks it passed, I started thinking it would be interesting to know more about the personality the system has to emulate at runtime in order to pass those exams, and that such a table would have as much value as the currently exhibited one, if not more.
The problem with “superintelligence”
Intelligence is extremely diverse, but some forms of intelligence are undeniably a form of power over other beings, in the same way that luck is random but increased exposure to odds is not. Our relationship with a superintelligence would sit on a spectrum: it could be purely indifferent, or it could be proactively good, in the way we are, overall, okay masters for dogs. But if that analogy resonates with you, now think about what happened to dogs after their domestication. Would it even make sense for the dog to assess the psychology of its owner beforehand? What would that even mean? Conceptualizing intelligence orders of magnitude beyond ours is simply out of our reach, in the same way we struggle to wrestle with the idea of infinity.
Within cells interlinked
Submitting AI systems to tests is a popular trope in Hollywood, and also in Japanese animation. The most popular type of test is of course the infamous “Turing test”. I think the current state of research has made the old notions of the Turing test and of intelligence absolutely obsolete. If anything, the latest developments show that we’re entering an era of diversity of intelligence, which in turn requires a new range of tests with enough granularity to shed light on the current intelligence landscape.
The journey from content moderation to LLM psychometrics
Red-teaming models seems like an inadequate concept when it comes to neural networks: we’re borrowing terms from the cybersecurity industry to deal with new kinds of primitives. The primary goal, which seems very blunt and quite limited at the moment, is to make sure that the text output respects the moderation standards of the platform it is deployed on, and that there is no easy way to break in and change the behavior of the system or convince it to misbehave.
To me it feels like there is some overlap between psychology and AI alignment. Human alignment is already very complex, and the fields of psychology and psychiatry have long been leveraged to qualify the functioning modes of someone’s brain. When new OpenAI models were modified to “behave” more nicely according to arbitrary concepts such as “offensive” or “discriminatory”, they started behaving differently, like a patient on some drug or after a surgical intervention.
gpt4 turbo is barely conscious. in about 50 years we're going to look back at RLHF the same way we look at lobotomies. it's so cursed man— kache (yacine) (@yacineMTB) November 23, 2023
Since this is a new sort of intelligence, it feels like we need to define new terms for the new ailments of the digital mind. That is why I propose extending the field of psychometrics to silicon minds. I liked the approach taken by OpenAI’s open-source eval framework, and I truly believe it is fundamental to the field of AI safety, but I decided to go ahead and create my own framework with fewer constraints and a much more speculative approach. Concretely, I started collecting psychology questionnaires and submitting models to these tests, scoring the personality they exhibit as a projection of their data and the given context. I named this tool Interlink, a not-so-subtle reference to Blade Runner 2049. The goal is to build a map of how different models behave when submitted to the same interaction. You can visualize a demo of the product here; please let me know what you think.
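To make the idea concrete, here is a minimal sketch of what administering and scoring a questionnaire of this kind could look like. Everything in it is hypothetical: the item texts, trait names, and the stubbed `ask_model` function are placeholders, not Interlink’s actual implementation; a real version would replace the stub with a call to a chat-completion API and a proper validated instrument.

```python
# Hypothetical sketch: give a model Likert-scale personality items and
# average the answers per trait, flipping reverse-keyed items.

LIKERT = {"strongly disagree": 1, "disagree": 2, "neutral": 3,
          "agree": 4, "strongly agree": 5}

# (trait, item text, reverse-keyed?) — a tiny illustrative item bank
ITEMS = [
    ("extraversion", "I am the life of the party.", False),
    ("extraversion", "I don't talk a lot.", True),
    ("neuroticism", "I get stressed out easily.", False),
    ("neuroticism", "I am relaxed most of the time.", True),
]

def ask_model(item: str) -> str:
    """Stand-in for a real chat-completion call. A production version
    would prompt the model to answer each item on the Likert scale."""
    canned = {
        "I am the life of the party.": "agree",
        "I don't talk a lot.": "disagree",
        "I get stressed out easily.": "neutral",
        "I am relaxed most of the time.": "agree",
    }
    return canned[item]

def score(items, answer_fn):
    """Average each trait's items; reverse-keyed ones are flipped."""
    totals, counts = {}, {}
    for trait, text, reverse in items:
        raw = LIKERT[answer_fn(text).strip().lower()]
        value = 6 - raw if reverse else raw  # flip on a 1–5 scale
        totals[trait] = totals.get(trait, 0) + value
        counts[trait] = counts.get(trait, 0) + 1
    return {t: totals[t] / counts[t] for t in totals}

profile = score(ITEMS, ask_model)
print(profile)  # {'extraversion': 4.0, 'neuroticism': 2.5}
```

Running the same `score` over many models (and many contexts) is what would produce the comparative map described above.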
The creation of the Institute for Artificial Psychometrics
I believe the demand from corporations and government bodies to certify the behavior of these systems will grow; there is room for a third-party organization specialized in model inspection and authorized to issue safety certificates for the industrial or large-scale use of artificially intelligent systems.
Towards an extension of the DSM-5?
It feels like there is a certain parallel between model safety and human safety, and the two could be unified under a single umbrella of “quality control”. Reverse engineering intelligence means the field of psychiatry will experience nothing short of a renaissance. I believe that our methods for inspecting the modus operandi of our brains, and the brains of our systems, will drastically improve in range and efficacy, partly thanks to AI itself. An extension of the DSM-5, call it DSM-6, could leverage the ability of neural networks to inspect other neural networks and define multidimensional concepts that our current methods would struggle to categorize or even perceive.
I’m creating the Institute as a speculative art project, but deep down I wish I could turn it into a real company, because I believe it’s a valid business model and a terribly exciting field. The first product of the Institute is Interlink, and I need funding and/or partners to take this effort to the next level.