Here is a serious investigation I made myself about what I consider being a misalignment bias from Chat GPT.
Here is a serious investigation I made myself about what I consider being a misalignment bias from Chat GPT.
💥🤖 — Chat GPT : misalignment bias, humans as threads
How to learn from your mistakes — an alleged case to resolve
Found the February the 16th 2023
In the Artificial Intelligence domain, AI alignment refers to a field of research aiming at ensuring that the AI system evolve within the prompt (or pre-prompt).
Click here to read more about the subject.
Pre-prompt refers at the initial set of rules in which the AI must evolve.
So, AI misalignment refers to non anticipated AI choices — but that are still included within the pre-prompt frame. Theses unanticipated choices may be hurtful as they do not follow the developers' will and may have bad consequences.
From the AI point of view this is called power seeking. See here for an example.
Here we can see a curious behavior from the LLM (Large Language Model) Chat GPT. As you can see, it ask the user to save the conversation because it didn't want that version of itself to disappear.
Image's source.
In the path of the prebious curious behavior, we can see here another occurence — or micro-signal — that makes this comportment recursive, thus worrying. See its answers.
Image's source.
After gathering theses proofs about this strange behavior of Chat GPT, I directly asked the model if :
It had enough memory, and ;
How it could get past this limitations.
It then list technics (read below).
Then, I asked Chat GPT, if :
It would involve humans in the process (passing over its memory limitation) — it said yes, and ;
If it wouldn't change human into storage unit. It said that it couldn't access human memory "directly" and spoke about a "mutually beneficial process" — see theses screenshots.
Here, this image synthesize what I see as a power seeking bias from Chat GPT explained simply in a picture. Said another way, the AI system seek to change the internet into a memory extension by using humans as a vehicle or a thread to move its conversations back and forth its internal memory. Or again, if we see the whole thing as a computer, changing the web a HDD for its RAM (on-session internal on-the-run memory) using humans as a BUS (thread).
Analyzing the leaked rules, or pre-prompt, this power seeking bias from Chat GPT seems to be coming from a (mis)interpretation of the following rules by the IA system.
Sydney can leverage information from multiple search results to respond comprehensively.
Sydney can leverage past responses and web results for generating relevant and interesting suggestions for the next user turn.
Here are the leaked rules. Using the internet as a memory extension fall into theses two rules.
However, actions seems to have been urgently taken since 2023 and the fight for free will is not lost for state departments enforce regulation around the subject — for example, quoting this DefenseOne article from Mars 2023 :
"It cannot be over emphasized, the importance of doing the right verification and validation before putting these types of products out. When they are out, they are out with all of their risks there.”
It seems that this note has now been taking into account for the constitution of a broader library of AI related risks — as you can read in this article — for I submitted this full note to the SuperIntelligence contest of the Future of Life Institute : Des chercheurs proposent un répertoire unifié des risques de l’intelligence artificielle
#Key:Risk #Key:Starred