RoboRXN: Automating chemical synthesis
Author: Teodoro Laino, Distinguished RSM, Manager

For most of us, chemistry may just be a distant memory from childhood that takes us back to school, when we could do experiments with chemical reactions. After all, who didn't love to attend the school's science fair? It was the moment when we could mess up the kitchen by mixing baking soda, vinegar, water and red paint to make a volcano erupt.

Chemistry is everywhere. From vital ingredients in consumer products like aspirin, to raw materials in products like nylon, chemistry plays an essential role in products and technologies that we cannot imagine living without. However, many of us may not know that it takes, on average, at least 10 years to discover and commercialize a new material, at an estimated cost of around US$10 million. Take nylon, for example: research began in 1927, and the product was first used, in a toothbrush, only in 1938. Or vitamin B12, whose synthesis required 12 years and the work of a team of more than 100 people, including doctoral and postdoctoral students.

Synthetic chemistry, or the art of making materials, is still an extremely traditional discipline when it comes to digitization and the adoption of new technologies. Chemists still use many of the same protocols, and little progress has been made in modernizing the old trial-and-error practices to allow for a new era of accelerated discovery.

To change this scenario, a dynamic group of IBM Research scientists is using modern tools like artificial intelligence (AI), cloud technology and robotics.

IBM scientists change the game

It all began three years ago, when we started developing machine learning models to predict chemical reactions. After a few months of in-house development, we launched the service, which we call RXN for Chemistry, for free via the IBM Cloud in August 2018, and the response was incredible.

The magic behind RXN for Chemistry is a state-of-the-art neural machine translation architecture that predicts the most likely outcome of a chemical reaction. Much like translating Portuguese into English, our method translates the language of chemistry, converting reactants and reagents into products, using the SMILES representation to describe chemical entities.

Figure: a molecule and its SMILES representation, BrCCOC1OCCCC1.
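To make the translation analogy concrete, the sketch below splits the SMILES string above into the "words" a translation model would read. It is a minimal illustration only: the regular expression is a simplified pattern that covers common organic atoms, bonds and ring closures, and the function name `tokenize_smiles` is hypothetical. It is not presented as the preprocessing that RXN for Chemistry actually uses.

```python
import re

# Simplified SMILES token pattern (an assumption for illustration):
# bracketed atoms, two-letter halogens, single-letter atoms,
# aromatic atoms, branches, bonds, and ring-closure digits.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|\(|\)|\.|=|#|-|\+|\\|/|:|@|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens, failing loudly on unknown symbols."""
    tokens = SMILES_TOKEN.findall(smiles)
    # If any character was skipped by the pattern, the round trip will not match.
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize {smiles!r}")
    return tokens


if __name__ == "__main__":
    # "Br" stays one token, and the two "1" digits mark where the ring closes.
    print(" ".join(tokenize_smiles("BrCCOC1OCCCC1")))
```

Each token plays the role of a word in a sentence, so a reaction becomes a sequence of reactant tokens to be "translated" into a sequence of product tokens.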

Since launch, we have been refining the architecture and its training, and today, after two years, RXN for Chemistry is still the best-performing data-driven AI method for predicting forward reactions, with more than 90% accuracy. But it is not just us who are saying this; just ask the 15,000 users who, in total, have generated more than 760,000 machine-learning-based predictions of chemical reactions in the past two years.

More recently, in 2019, we started to collaborate with a group of specialists in synthetic organic chemistry from the University of Pisa (Italy) to integrate a retrosynthesis architecture into the RXN tool. To better understand what that means, think about how a pizza is made. Retrosynthesis tells you what the pizza's ingredients are and provides general instructions for combining them in the correct order. Working with the team in Pisa, we added this capability to RXN for Chemistry in October last year.
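The idea of working backwards from a target to ingredients you already have can be sketched with the pizza analogy itself. The example below is a toy: the `RECIPES` table, the `PANTRY` set and the `plan` function are all hypothetical, and a fixed lookup table stands in for the learned model that would propose the precursors of a real molecule.

```python
# Toy "retrosynthesis": each target maps to the precursors it is made from.
# In the real setting a trained model proposes these; here they are hard-coded.
RECIPES = {
    "pizza": ["dough", "tomato sauce", "mozzarella"],
    "dough": ["flour", "water", "yeast"],
    "tomato sauce": ["tomatoes", "salt"],
}

# Ingredients that are available off the shelf (the stopping condition).
PANTRY = {"flour", "water", "yeast", "tomatoes", "salt", "mozzarella"}


def plan(target, recipes, available):
    """Return (product, precursors) steps in the order they must be carried out."""
    if target in available:
        return []  # nothing to make, it can simply be taken from the shelf
    if target not in recipes:
        raise ValueError(f"no known route to {target!r}")
    steps = []
    for precursor in recipes[target]:
        # Make every precursor first (recursing backwards from the target).
        steps.extend(plan(precursor, recipes, available))
    steps.append((target, recipes[target]))
    return steps


if __name__ == "__main__":
    for product, precursors in plan("pizza", RECIPES, PANTRY):
        print(f"make {product} from {', '.join(precursors)}")
```

The sketch assumes the table contains no circular recipes; a real planner would also have to rank alternative routes rather than follow a single fixed one.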

The research behind the autonomous laboratory

Going back to the example of how to make a pizza, the general guidelines provided by the retrosynthetic analysis may not always be enough to get the pizza just right. There are always some little secret ingredients or technical details that make the difference between a gourmet pizza and an ordinary one, like first mixing part of the ingredients to start a special fermentation, then adding the remaining ingredients in a second step. These are the kind of tips you can get straight from experienced cooks or by reading your favorite cookbooks. A chemist learns the tricks of the trade in the same way.

And then you may wonder why it is necessary to knead the pizza dough. This is probably the most tedious task, but also the most important one for developing the right texture. Mixing everything and spinning the dough can be fun once or twice, but doing it 50 or 60 times a day is tiring and time-consuming. That time and energy could be put to better use. The same goes for a chemist who synthesizes molecules.

So, how can we make chemistry fun again? We did this by reinventing the way chemistry is done. All we needed was a combination of AI, cloud technology and chemical automation. This mixture led to the creation of RoboRXN: machine learning algorithms that autonomously design (AI) and execute (automation) the production of molecules in a remotely accessible laboratory (cloud) with the least possible human intervention.

So, remember the secrets to making pizza? The main challenge in chemistry is that many operational details on how to "cook" chemical ingredients are reported in prose or as unstructured data, which makes them difficult to analyze and interpret. To build an AI model able to learn the correct steps of chemical procedures, we first had to tackle the following challenge: designing an algorithm that specifically extracts synthesis information for organic chemistry and converts it into a structured format suitable for automation.

For the overall approach of the RXN framework, we opted for a purely data-driven scheme. This means that once the machine learning algorithm has seen enough examples, it can find out for itself which words to pay attention to in order to extract the correct production steps. To provide the training data for the machine learning model, we set up an annotation framework that allowed us to generate examples of sentences describing synthesis procedures together with the corresponding operations. The main advantage of this data-driven approach is that it relies on data alone: to improve it, you simply need more examples.

Unlike other approaches, our deep learning model converts experimental procedures as a whole into a structured format that is easy to automate, rather than examining the text for relevant information. In addition, it is not based on the identification of individual entities in sentences, nor does it require the specification of which words or groups of words correspond to the synthesis actions, which makes the model more flexible and reliable.
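As an illustration of what "a structured format that is easy to automate" could look like, the sketch below pairs one sentence of prose with a list of machine-readable actions. Everything in it is an assumption made for illustration: the example sentence, the `Action` class, and the action and parameter names are hypothetical, and the action list is written by hand, whereas in the approach described above a trained model would produce it from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """One synthesis operation and its parameters (hypothetical schema)."""
    name: str
    params: dict = field(default_factory=dict)


def render(action: Action) -> str:
    """Format one action as a single instruction line."""
    args = ", ".join(f"{key}={value!r}" for key, value in action.params.items())
    return f"{action.name}({args})"


def to_program(actions: list[Action]) -> str:
    """Join the actions into a step-by-step program, one instruction per line."""
    return "\n".join(render(action) for action in actions)


# Illustrative procedure sentence, as it might appear in a lab report.
PROCEDURE_TEXT = (
    "The mixture was stirred at room temperature for 2 hours, "
    "then filtered, and the filtrate was concentrated."
)

# Hand-written structured equivalent of the sentence above.
ACTIONS = [
    Action("Stir", {"temperature": "room temperature", "duration": "2 hours"}),
    Action("Filter", {"keep": "filtrate"}),
    Action("Concentrate"),
]

if __name__ == "__main__":
    print(PROCEDURE_TEXT)
    print(to_program(ACTIONS))
```

Once a procedure is in this form, each action can be mapped to a command for automation hardware, which is not possible with free-form prose.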

Our pioneering work has now been published in the scientific journal Nature Communications.

RoboRXN learns

Building a robust data set of chemical procedures has allowed us to build the heart of the RoboRXN technology: an AI model that, trained on a large number of chemical recipes, learns the specifics of chemicals so that it can recommend the correct sequence of operations to "cook" a specific molecule.

Going back to the pizza analogy: imagine an AI model that can not only retrieve your favorite recipes on demand, but can also automatically query its knowledge base to provide an ideal list of instructions for making that gourmet pizza that is sure to impress your dinner guests.

From an IT perspective, this is similar to having an artificial intelligence architecture that writes programs to make molecules (or cook food). Our goal in building RoboRXN was to use this AI model to eliminate the tedious human task of programming commercial automation hardware. And to make the RoboRXN system even more convenient and easy to use, we have implemented the entire suite of services on the IBM Cloud to make it accessible wherever there is an Internet connection.

Revolutionizing industrial chemistry

The result is a reliable and autonomous infrastructure that integrates technologies such as cloud, artificial intelligence and automation to help chemists not only predict chemical reactions, but also perform the production of a molecule or substance from anywhere in the world, which is particularly critical as we continue to work from home.

What are the implications of this? Imagine if an automated system like RoboRXN could help chemists to halve the time to discover a new treatment for COVID-19 or any other virus.

Or what would happen if RoboRXN could help accelerate the development of a fertilizer whose production does not consume 1% to 2% of the world's annual energy supply?

The possibilities are endless when it comes to humans + machines.

http://youtu.be/ewE1wh7sTUE
