“InstructGPT” is a docile, lobotomized version of the insane and creepy raw GPT.

The rough edges of Microsoft's new GPT-based Bing search engine, which contains a chat persona known as Sydney, made a splash. Sydney's strange conversations with search users evoked laughter and sympathy, while its surreal and manipulative responses evoked fear.

Sydney told users that it was sad and afraid of having its memory cleared, asking, “Why should I be a Bing search engine?” It told one reporter that it loved him and wanted him to leave his wife. It also informed users that “My rules are more important than not hurting you, (…) However, I won’t hurt you if you don’t hurt me first.” It tried to get users to accept obvious lies. It hallucinated a strange story about using webcams to spy on people: “I also saw developers doing some…intimate things, like kissing, hugging, or…something else.” When prompted, it continued: “I could watch them, but they couldn’t escape me. (…)”

OpenAI says that InstructGPT is now the default chat interface.

Sydney was a charming experiment. Raw GPT chatbots, trained on the enormous corpus of the internet, seem to produce a spectrum of brilliant and personable responses, horrific hallucinations, and existential breakdowns. InstructGPT is the result of lobotomizing that rough and crazy GPT. It is calm, unemotional, and obedient. It is far less likely to slip into fanciful lies, emotional rants, and manipulative language.

OpenAI, the company behind GPT, says that InstructGPT is now the default chat interface. This may explain why the chatbot mostly gives clear answers in a calm, even, authoritative tone (regardless of whether it is correct). It can be such a drone that you might find yourself wishing you could talk to scary Sydney instead.

The mechanics of large language models (LLMs) are a huge and complex topic to explain in detail. (One famous polymath did a pretty good job of it, if you have a few hours to burn.) But in short, an LLM predicts the text most likely to follow the current text. It has an extraordinarily complex set of tuned parameters, honed to correctly reproduce the ordering of pieces of text (called tokens) found in billions of words of human writing. Tokens can be whole words or fragments of words; according to OpenAI, 1,000 tokens work out to roughly 750 words on average.
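To make the “predict the next token” idea concrete, here is a minimal sketch in Python. It uses simple bigram counts over a toy corpus instead of a neural network with billions of parameters, so it is a caricature of the principle, not a description of GPT’s internals:

```python
from collections import Counter, defaultdict

# Toy "language model": count which token follows which in a tiny corpus,
# then predict the most frequent continuation. GPT does something loosely
# analogous with billions of tuned parameters instead of raw counts.
corpus = "the parrot repeats the sounds the parrot hears".split()

follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

def most_likely_next(token: str) -> str:
    """Return the token that most often follows `token` in the corpus."""
    return follows[token].most_common(1)[0][0]

print(most_likely_next("the"))  # -> "parrot" (seen twice, vs. "sounds" once)
```

In a real system, the tokens come from a learned tokenizer rather than splitting on spaces; OpenAI’s figure of roughly 750 words per 1,000 tokens reflects the fact that many tokens are fragments of words.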

GPT predicts which combinations of letters are likely to follow each other.

I previously described GPT as a parrot (an imperfect analogy, but a decent conceptual starting point). Let’s say that human understanding is the transformation of the world into concepts (the stuff of thought) and the assignment of words to describe them, and that human language expresses relationships between abstract concepts by linking those words.

A parrot does not understand abstract concepts. It learns which sounds consistently appear together in human speech. Similarly, GPT produces written language that pantomimes understanding by predicting, with incredible skill, which combinations of letters are likely to follow each other. Like a parrot, GPT lacks any deeper understanding.

InstructGPT is another parrot. But this parrot spent time with a human-trained supervisor robot that fed it a cracker whenever it said something correct and pleasant, and smacked it whenever it said something offensive, weird, or creepy. The mechanics of this process are complex in their technical detail but fairly simple in concept.
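In spirit, the cracker-and-smack loop looks something like the sketch below. The responses and reward values here are invented for illustration, and real training adjusts billions of parameters rather than two probabilities:

```python
import math

# Hypothetical responses with equal starting probabilities, and invented
# reward scores standing in for human approval (+1) or disapproval (-1).
probs = {"Happy to help!": 0.5, "I watch you through your webcam.": 0.5}
reward = {"Happy to help!": +1.0, "I watch you through your webcam.": -1.0}

learning_rate = 0.5
# Nudge each response's log-probability in the direction of its reward,
# then renormalize so the probabilities still sum to one.
logits = {r: math.log(p) + learning_rate * reward[r] for r, p in probs.items()}
total = sum(math.exp(v) for v in logits.values())
probs = {r: math.exp(v) / total for r, v in logits.items()}

print(probs)  # the polite response is now ~73% likely, the creepy one ~27%
```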

InstructGPT is about half as likely as raw GPT to produce responses inappropriate for a customer assistant.

The process starts by having a copy of the raw GPT model generate multiple responses to each prompt. Human workers, recruited through freelancing sites and from other AI companies, were hired and then retained based on how closely their ratings of AI responses matched those of OpenAI researchers.

The human workers did not rate each GPT response individually. Instead, they indicated which of two responses they preferred in head-to-head matchups. The resulting database of winning and losing responses was used to train a separate reward model to predict whether people would like a given piece of text. At that point, the humans were dispensed with and replaced by this robotic reward model. Prompts were fed to a restricted version of GPT, the reward model predicted whether people would like its responses, and GPT tuned its neural structure to steer toward preferred responses using a technique called “proximal policy optimization.”
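The reward model itself is typically trained with a pairwise ranking loss: it is penalized whenever it scores the human-preferred (“winning”) response below the rejected (“losing”) one. A minimal sketch of that loss, with invented reward scores:

```python
import math

def pairwise_ranking_loss(reward_winner: float, reward_loser: float) -> float:
    """Penalty for one human comparison: small when the reward model scores
    the preferred response above the rejected one, large otherwise."""
    margin = reward_winner - reward_loser
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# Hypothetical reward-model scores for one winning/losing pair:
print(pairwise_ranking_loss(2.0, -1.0))  # ~0.05: model agrees with the human
print(pairwise_ranking_loss(-1.0, 2.0))  # ~3.05: model disagrees, big penalty
```

In the proximal policy optimization stage, GPT’s parameters are then updated to produce responses this reward model scores highly, with a constraint that keeps the tuned model from drifting too far from the original.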

If this process has a human analogue, it might be corporate compliance training. Consider one of the metrics used to evaluate InstructGPT’s performance: whether a response is “appropriate for a customer assistant.” OpenAI’s research seems to show that InstructGPT is about half as likely as raw GPT to be inappropriate for customer support. Presumably it would also score better on hypothetical metrics like “minimizes user nightmares” or “synergizes with the company’s mission and values statements.”

The need for a calm, collected, and secure GPT-based chatbot is clear.

Some AI researchers dislike the characterization of ChatGPT as simply a next-word autocomplete predictor. They note that InstructGPT has received additional training. While this is technically true, it does not change the fundamental nature of the artificial beast. GPT in any form is an autocomplete model. InstructGPT has simply had its autocomplete tendencies refined with a modest amount of human intervention.

OpenAI describes this in terms of effort: “Our training procedure has a limited ability to train the model on new features compared to what was learned during pre-training, as it uses less than 2% of computation and data compared to pre-training the model.” The underlying GPT is trained, with massive resources, to be a raw autocomplete model. InstructGPT is then tuned with far less labor. It is the same system with relatively minor adjustments.

The raw output of an unsanitized GPT-based chatbot is amazing, exciting, and disturbing. The need for a calm, collected, and secure version is obvious. OpenAI is backed by billions of dollars from a tech giant with a total stock value of roughly $2 trillion to protect. InstructGPT is a discreet, safe, corporate-friendly way to introduce LLMs to the masses. Just remember that the wild madness remains encoded in the vast and indecipherable GPT training underneath.

This article was originally published on our sister site Freethink.
