December 28, 2022

ChatGPT: training process, advantages, and limitations


By Sergio Soage, Machine Learning Engineer at Aivo

When we started planning this series of articles, the initial idea was to introduce the fundamentals of Natural Language Processing, then explain the most important technologies, discuss the advances, and show what we use at Aivo. However, given recent advances in the field, we decided to jump (momentarily) to the present, specifically to ChatGPT.

Below, we explain this technology, how it evolved, and its main advantages and limitations.

GPT: A brief timeline of its upgrades

What is ChatGPT? To understand it, we have to go back to the beginning of the GPT family of models. GPT stands for “Generative Pretrained Transformer”. The original paper was published in 2018 and marked a significant shift, mainly in the subfield of transfer learning. The model could be fine-tuned with relatively little data and achieve SotA (State of the Art) results in multiple benchmarks.

GPT-2, launched in 2019, was the next evolution. It used an architecture similar to that of its predecessor, with some updates. It was considerably larger, with roughly 10x the parameters, so retraining the model was already a complex task because of the infrastructure required. The training data also changed to “WebText”, a dataset of web pages gathered from links shared on Reddit.

These changes produced certain emergent capabilities in the model: first, the ability to generate “coherent” text, and second, the ability to do “few-shot learning”, that is, to learn without retraining the model, using a few examples of tasks that the model never saw in its initial training. At the application level, GPT-2 could generate very realistic news headlines, and it was also adapted to generate images (a feedback loop between NLP and Computer Vision). However, attempts to adapt it for conversational tools failed.

And this brings us to GPT-3, released in 2020, with multiple improvements made through 2022 (including ChatGPT). Again, the architecture did not change much, but the number of model parameters grew from 1.5 billion to 175 billion, and the dataset changed as well: much more data from the Web was added, such as CommonCrawl and Wikipedia, along with an updated WebText. Like its predecessor, GPT-3 set new SotA results in multiple benchmarks and greatly improved its zero-shot learning capabilities. Like all Foundational Models (so far), it showed biases (religious, gender, etc.) and demonstrated its virtues but also its flaws. In a future article, we will focus more on this point and on the technical characteristics of the model.

ChatGPT: a constant evolution

All of this introduction leads us to ChatGPT, which is essentially a GPT-3.5 version. The original GPT-3 model was trained with a simple goal: to predict the next word, given a massive corpus of data. With this very simple objective, certain emergent capabilities are observed, such as simple reasoning, programming, translation, and the already mentioned few-shot learning. However, this is not enough for more complex tasks: you have to “force” GPT-3 to complete the text in the desired way by providing human feedback, that is, by curating certain examples.
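To make the "predict the next word" objective more concrete, here is a minimal sketch in Python. It is not how GPT-3 is implemented (GPT-3 is a large Transformer trained on subword tokens); it is a toy bigram counter, and the corpus and function names are our own illustrative assumptions. It only shows the shape of the task: given what came before, estimate a probability for each possible next word.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count, for each word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def next_word_probs(counts, word):
    """Turn the follow-up counts for `word` into a probability distribution."""
    followers = counts.get(word, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # the word was never seen with a successor
    return {w: c / total for w, c in followers.items()}


# Toy corpus (an assumption for illustration, not real training data).
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)
probs = next_word_probs(model, "the")
print(probs)
```

In this toy corpus, "the" is followed by "cat" twice and by "mat" once, so the model assigns "cat" a probability of 2/3. Nothing in this objective says the continuation should be helpful or truthful, only that it should be likely.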

For GPT-3 to learn (without being retrained), it must be given certain samples and context through a technique called "Prompt Engineering". This technique is a bug, not a feature, since it stems from the misalignment between the objective of predicting the next word and the final use that you want to give to the application.

Example of misalignment between text to be predicted and final use
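The prompt engineering idea described above can be sketched as follows. Because the model only continues text, we arrange the instruction and a few solved examples so that the most likely continuation is the answer we want. The task (sentiment labeling), the field names "Text" and "Sentiment", and the helper `build_few_shot_prompt` are all hypothetical choices for illustration; no real API is called here.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: instruction, solved examples, open query."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The final entry is left unanswered so the model "autocompletes" the label.
    lines.append(f"Text: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


# Illustrative examples (assumptions, not taken from any real dataset).
examples = [
    ("The support agent solved my issue in minutes.", "positive"),
    ("I waited an hour and nobody answered.", "negative"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each text as positive or negative.",
    examples,
    "The chatbot understood my question right away.",
)
print(prompt)
```

The resulting string would be sent to the model as-is. The model's weights never change: all of the "learning" lives in the prompt.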

ChatGPT and the model on which it is based, InstructGPT, set out to correct this bug in a very sophisticated way. To fix this misalignment, humans must be involved in teaching GPT, and in this way GPT will be able to generate better responses as it evolves.

ChatGPT training process

  1. Build a dataset of prompts with example answers, and fine-tune the model on it using supervised learning. This step is simple but very expensive.
  2. The second step is the most interesting: the model proposes different responses to a prompt, and human feedback ranks them from most desirable to least desirable. This information is used to train a reward model that captures human preferences, reducing misalignment.
  3. Step 3 is to treat GPT as a “policy” (a Reinforcement Learning term) and optimize it using RL against the learned reward. An algorithm called PPO (Proximal Policy Optimization) is used and, in this way, GPT becomes better aligned.
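The second and third steps can be sketched with a few lines of Python. This is a simplified illustration under our own assumptions, not OpenAI's code: rewards, probability ratios, and advantages are plain numbers here, whereas in practice they come from neural networks. The sketch shows a common formulation of each idea: a human ranking is expanded into (preferred, rejected) pairs, the reward model is penalized when it scores the rejected answer higher, and PPO limits how far a single update can move the policy.

```python
import math
from itertools import combinations


def pairs_from_ranking(ranked_responses):
    """Expand a best-to-worst ranking into (preferred, rejected) pairs."""
    return list(combinations(ranked_responses, 2))


def pairwise_ranking_loss(reward_preferred, reward_rejected):
    """-log(sigmoid(r_preferred - r_rejected)), computed in a stable way.

    The loss is small when the reward model scores the preferred answer
    higher than the rejected one, and large when it gets the order wrong.
    """
    diff = reward_preferred - reward_rejected
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def ppo_clipped_objective(ratio, advantage, epsilon=0.2):
    """PPO clipped surrogate objective for a single sample.

    ratio: new_policy_prob / old_policy_prob for the sampled response.
    advantage: how much better than expected the response was,
        according to the learned reward.
    epsilon: clipping range (0.2 is a commonly used default, assumed here).
    """
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)


# A labeler ranked three hypothetical responses from best to worst.
pairs = pairs_from_ranking(["A", "B", "C"])
print(pairs)

# The reward model is penalized more when it prefers the wrong answer.
print(pairwise_ranking_loss(2.0, 0.0), pairwise_ranking_loss(0.0, 2.0))

# Even if the new policy makes a good response 1.5x more likely,
# the objective only credits a 1.2x change.
print(ppo_clipped_objective(1.5, 1.0))
```

The clipping is what keeps the fine-tuned model from drifting too far from the supervised model in one step while it chases higher reward.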

Advantages and limitations of ChatGPT

Now that we've covered how ChatGPT works, what does this model mean for the field? Does it mean that we have solved AGI (Artificial General Intelligence)?

For the field, it is a major evolution. However, since the data with which the model was trained is not available and we do not have code to replicate it (because, ultimately, it is a commercial product), we cannot analyze its implications in a fully scientific way. We can make approximations, and after 3 weeks of hype and study of the results generated by the model, we can find strong points in favor and strong points against.

Results in favor: 

  • It produces high-quality writing, which can help create more intelligent and compelling content
  • It supports more complex instructions, so it can perform more advanced “reasoning” tasks
  • It can generate longer content while keeping more context
  • It can be scaled to many more tasks

Results against:

  • There is little information about the dataset it was trained on, which makes it hard to understand the biases it may have
  • It keeps "hallucinating"
  • It tends to write plausible but incorrect content with confidence
  • It can only be used through an OpenAI endpoint, so one is a "slave" of the product
  • It is an extremely expensive model to run (OpenAI subsidizes the cost to gain users)

Some examples of the implications of producing erroneous outputs:


Large Foundational Models like GPT represent a major advance in the field, but all these advances must be analyzed in depth.

This analysis allows us to identify where they fail so we can improve them in future iterations. At the rate at which we are observing these advances, there is no doubt that a very promising future awaits us, but also that it will require a lot of responsibility from everyone involved in the field.
