When learning another language, the easiest way to start is with fill-in-the-blank exercises. “It’s raining cats and…”
By making mistakes and correcting them, your brain (which linguists agree is hardwired for language learning) begins to discover patterns in grammar, vocabulary, and word order, which can be applied not only to filling in blanks, but also to conveying meaning to other humans (or computers, dogs, etc.).
This last part is important when talking about so-called “foundation models”, one of the hottest (but most underrated) topics in artificial intelligence right now.
According to a 2021 review article, foundation models are models “trained on broad data (usually using large-scale self-supervision) that can be adapted to a wide range of downstream tasks.”
In non-academic language: much like someone working through fill-in-the-blank exercises, foundation models learn things in a way that can then be applied to other tasks, making them more flexible than current AI models.
Why are foundation models different?
The way foundation models are trained solves one of the biggest bottlenecks in AI: data labeling.
When (to prove you’re not a robot) a website asks you to select “all images containing a boat”, you’re essentially labeling data. Those labels can then be used to feed images of boats to an algorithm so that it can, at some point, reliably recognize boats on its own. This is traditionally how AI models have been trained: using human-labeled data, a time-consuming process that requires many people to do the labeling.
Foundation models do not need this type of labeling. Instead of relying on human annotation, they use the fill-in-the-blank method and self-generated feedback to continuously learn and improve, without the need for human supervision.
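To make the fill-in-the-blank idea concrete, here is a deliberately tiny sketch in Python (using PyTorch). The two toy sentences and the miniature network are invented for illustration, not how a real foundation model is built; the one thing the sketch shares with real self-supervised training is that the label comes from the text itself, so no human annotator is involved.

```python
import torch
import torch.nn as nn

# Two toy "training documents", both six words long for simplicity.
sentences = [
    "it is raining cats and dogs",
    "the quick brown fox jumps over",
]
vocab = {w: i for i, w in enumerate(sorted({w for s in sentences for w in s.split()}))}
MASK_ID = len(vocab)  # one extra token id to mark the hidden word

# A deliberately tiny "model": embed the six tokens, predict the hidden word.
model = nn.Sequential(
    nn.Embedding(len(vocab) + 1, 16),
    nn.Flatten(),
    nn.Linear(6 * 16, len(vocab)),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(300):
    for s in sentences:
        ids = [vocab[w] for w in s.split()]
        pos = step % len(ids)              # rotate which word we blank out
        target = torch.tensor([ids[pos]])  # the "label" is taken from the text itself
        masked = list(ids)
        masked[pos] = MASK_ID              # "It's raining cats and ___"
        logits = model(torch.tensor([masked]))
        loss = loss_fn(logits, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

No human ever tags a boat here: the training signal is manufactured by hiding parts of the data and checking the model's guesses against the original.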
This makes foundation models more accessible for industries that do not already have vast troves of labeled data available. In fact, according to Dakshi Agrawal, IBM Fellow and CTO of IBM AI, depending on the area you’re training a foundation model in, a few gigabytes of data might be enough.
These complex models may seem like a distant prospect to a user like you, but you’ve almost certainly seen a foundation model in action somewhere online. Some of the most famous examples are the GPT-3 language model, which, after being fed the works of famous writers, can produce remarkable imitations, and DALL-E, which produces stunning images based on user prompts.
Beyond creating new forms of entertainment, the flexibility provided by foundation models could help accelerate groundbreaking work in medical research, science, engineering, architecture, and even programming.
Emergent properties
Foundation models are characterized by two very interesting properties: emergence and homogenization.
Emergence refers to unexpected new capabilities that models display that were not present in previous generations. This usually happens as models grow in size. A language model performing basic arithmetic reasoning is an example of a somewhat unexpected emergent property.
Homogenization is a complicated term for the way a single model trained to understand and use the English language can perform many different tasks. These can include summarizing a text, producing a poem in the style of a famous writer, or interpreting a command given by a human (the GPT-3 language model is a good example).
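Here is a rough sketch of homogenization in practice, assuming the Hugging Face `transformers` library is installed: one general text-generation model, several tasks, steered only by the wording of the prompt. The small public GPT-2 checkpoint stands in for a far more capable model like GPT-3, so expect clumsy output; the mechanics, however, are the same.

```python
from transformers import pipeline

# One general-purpose model; the prompt alone decides the task.
generator = pipeline("text-generation", model="gpt2")

tasks = [
    "Summarize in one sentence: Foundation models are trained on broad data and adapted to many tasks.",
    "Write a short poem about the sea:",
]
for prompt in tasks:
    result = generator(prompt, max_new_tokens=30)
    print(result[0]["generated_text"], "\n")
```

There is no summarization model and no poetry model here, just one model homogenizing both jobs.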
But foundation models are not limited to human language. Essentially, what we teach a computer to do is find patterns in processes or phenomena, which it can then reproduce under certain conditions.
Let’s unpack this with an example. Take molecules. Physics and chemistry dictate that molecules can only exist in certain configurations. The next step is to define a use for those molecules, such as drugs. A foundation model can then be trained, using vast amounts of medical data, to understand how different molecules (i.e. drugs) interact with the human body when treating disease.
This understanding can then be used to “fine-tune” the foundation model so that it can suggest which molecule might work in a given situation. That could dramatically speed up medical research, allowing professionals to simply ask the model for molecules that might have certain antibacterial properties or might work as a drug against a particular virus.
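In code, “fine-tuning” usually means reusing the pretrained base and training only a small new layer on task-specific data. The sketch below is a generic, minimal illustration; the 128-dimensional “molecule” features and the binding label are invented placeholders, not a real chemistry pipeline.

```python
import torch
import torch.nn as nn

# Stand-ins: `base` plays the role of a pretrained foundation model,
# `head` is the new task-specific layer (e.g. "does this molecule bind?").
base = nn.Sequential(nn.Linear(128, 64), nn.ReLU())
head = nn.Linear(64, 1)

for param in base.parameters():
    param.requires_grad = False  # keep the pretrained knowledge fixed

optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

# Placeholder task data: 32 "molecule" feature vectors with binary labels.
x, y = torch.randn(32, 128), torch.randint(0, 2, (32, 1)).float()

for step in range(200):
    logits = head(base(x))
    loss = nn.functional.binary_cross_entropy_with_logits(logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because only the small head is trained, adapting the model to a new task needs far less data and compute than training from scratch.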
However, as mentioned, this can sometimes produce unexpected results. Recently, a group of scientists using a foundation model to search for treatments for rare diseases found that the same model could also be used to design some of the most potent chemical weapons known to mankind.
Core Concerns
A small indication of the sea change these models could bring has been the proliferation of companies offering “prompt generators”: humans who come up with prompts for models like Midjourney or DALL-E that reliably produce interesting or precise images.
Of course, models like these are controversial. Lately, a number of artists have spoken out against the use of their works to train image-generating models.
There is also a case to be made about the power consumption required to train a large-scale model. Add to that the fact that the significant computing resources required to create a foundation model mean that only the world’s biggest tech companies can afford to train one.
Then again, as Agrawal explained, increasing the efficiency of training and running these models means they become accessible to more people at an ever-faster rate, which reduces both energy consumption and costs.
Another, more fundamental, problem (sorry) with these models is that any bias or error in the original model is transferred to the tools built on top of it. If racist language is used as training data for a language model, for example, it can lead to offensive output and even legal action against the company in question.
One way to avoid this is to manually remove unwanted training data, but another, more futuristic method is to use so-called synthetic data. Synthetic data is essentially fake data generated by an AI model to mimic reality in a more controlled way. This can help ensure that a foundation model does not ingest offensive or privacy-sensitive data during training.
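As a toy illustration of the idea: real synthetic-data pipelines use generative models fitted to the real data, but even simple sampling from a fitted distribution shows the principle. The ages below are invented placeholders.

```python
import random

# Five "real", privacy-sensitive records (placeholder values).
real_ages = [23, 35, 41, 29, 52]
mean = sum(real_ages) / len(real_ages)

# Sample new, fake records that follow a similar distribution.
random.seed(0)
synthetic_ages = [max(18, round(random.gauss(mean, 8))) for _ in range(1000)]

# A model trained on `synthetic_ages` never sees any individual's real record.
```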
Will more advanced AI models take our jobs?
Well, yes and no.
Most AI researchers see these models as tools. Just as the electric screwdriver meant fewer hours were needed to assemble a wooden structure, a person was still needed to wield it: the tool sped up the work without replacing the worker.
Take IBM’s foundation model for code, Ansible Wisdom. In a quest to find out whether computers can learn to program computers, researchers fine-tuned a model to generate snippets of Ansible code that previously had to be written by hand. With it, developers can use natural language to ask the model to, for example, suggest Ansible automation for deploying a new web server.
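For a sense of the interaction, here is an illustrative sketch only: `suggest` is a hypothetical stand-in, not Ansible Wisdom’s actual API, and the returned task is just one plausible shape of output.

```python
def suggest(natural_language_prompt: str) -> str:
    """Pretend model call; a real system would query the fine-tuned model."""
    # Hypothetical canned response for illustration.
    return (
        "- name: Deploy a new web server\n"
        "  ansible.builtin.package:\n"
        "    name: nginx\n"
        "    state: present\n"
    )

print(suggest("deploy a new web server"))
```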
Agrawal thinks this will completely revolutionize the work of programmers.
“The entire innovation cycle will accelerate thanks to AI. If you look at code, for example, coding becomes much faster using the first generation of foundation models. I’m sure it will double productivity in just a few years,” he said.
IBM is releasing the model as an open-source project in collaboration with Red Hat, best known for distributing and maintaining its own version of the open-source operating system Linux.
This use is similar to the electric screwdriver: it takes a mundane task and uses a tool to automate parts of it, so the task is performed more efficiently, saving developers time they can then spend on more creative endeavours.
“It can take over the activities humans are doing today, and humans will just move on to another activity. I think 80% of the American population used to work in agriculture. Less than 2% are now (according to USDA ERS – Ag and Food Sectors and the Economy) — humans have moved on to other activities and along with that our quality of life has improved,” said Agrawal.
Foundation models have the potential to change many processes that are now tedious or repetitive for humans. They also offer the ability to create radical and unforeseen solutions to some of the toughest problems we face. Indeed, foundation models could mean a complete paradigm shift in how knowledge is created and applied. The key will be to ensure that these models are made available to the general public, with the right safeguards in place.