AI in Medicine: Life Sciences and Drug Discovery

Artificial intelligence offers the promise of better health, faster drug discovery and testing, to create improved medical outcomes for patients. We talk with a world expert on using AI in life sciences to discover and develop drugs faster and less expensively.


Feb 08, 2019


Dr. Alex Zhavoronkov is the founder and CEO of Insilico Medicine, a leader in next-generation artificial intelligence for drug discovery, biomarker development, and aging research. Prior to Insilico, he worked in senior roles at ATI Technologies, NeuroG Neuroinformatics, the Biogerontology Research Foundation and YLabs.AI. Since 2012 he has published over 130 peer-reviewed research papers and 2 books. For six years in a row, he has organized the annual Aging Research for Drug Discovery and Artificial Intelligence for Healthcare forums at Basel Life/EMBO in Basel. Alex is an adjunct professor at the Buck Institute for Research on Aging.


Michael Krigsman: Artificial intelligence in drug discovery is a relatively new field. It's a very important field. Today, we're speaking with one of the most prominent voices in AI and drug discovery.

I'm Michael Krigsman. I'm an industry analyst. Thank you so much for watching. Before we begin, please subscribe on YouTube and subscribe to our newsletter. You can do that right now.

Alex Zhavoronkov, he is the CEO of Insilico. Tell us briefly about Insilico Medicine and tell us the things that you're working on.

Alex Zhavoronkov: We are focused primarily on applying next-gen AI techniques to drug discovery, biomarker development, and also aging research. We focus specifically on two machine learning techniques: generative adversarial networks and reinforcement learning. Those are the techniques in which we are most expert in our field.

We use those techniques for two purposes. One is identifying biological targets and constructing biomarkers from multiple data types. The other is generating new molecules, new molecular structures with a specific set of properties. We were one of the first companies, possibly the first one, to generate new molecules using this new technique called generative adversarial networks--it's kind of AI imagination--and validate those molecules experimentally.

What is the Drug Development Pipeline?

Michael Krigsman: Give us some context. What is the drug development pipeline? Why is it so hard? Let's talk about that. Then we can shift to how AI makes that better, makes it easier.

Alex Zhavoronkov: Drug discovery and drug development is a very lengthy process. It's also one of those processes where you've got more failures than successes. Actually, much more failures than successes.

It takes more than $2.6 billion to develop a drug and bring it to the market to address a specific disease. That's after the molecule has been tested in animals. Also, there is a 92% failure rate after the molecule has been tested in animals. When it goes into humans, it fails 92% of the time. So, the process is not only lengthy, but also risky.
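To put the 92% figure in perspective, here is a minimal arithmetic sketch. It assumes, purely for illustration, that each candidate entering human trials fails independently with the same probability, which is a simplification and not something stated in the interview. The function name is hypothetical.

```python
def expected_candidates_per_approval(failure_rate: float) -> float:
    """Expected number of clinical candidates needed for one success.

    Assumes independent candidates with an identical failure rate, so the
    count follows a geometric distribution with mean 1 / (1 - failure_rate).
    """
    success_rate = 1.0 - failure_rate
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("failure_rate must be in the range [0, 1)")
    return 1.0 / success_rate


# With the 92% failure rate quoted above, this comes to roughly 12.5 candidates.
candidates_needed = expected_candidates_per_approval(0.92)
```

Under that simplifying assumption, the quoted failure rate means about a dozen molecules enter human trials for every one that succeeds.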

Usually, the time it takes to discover and develop a molecule is around a decade. People who initiate the process are not always there when the molecule launches. The process is comprised of several steps.

The first one is hypothesis generation. You come up with a hypothesis, a theory of a certain disease and identify relevant targets. You theorize about what kind of proteins are implicated in a disease condition and what proteins are causal.

Afterward, you go and develop either an antibody or a small molecule for this protein target. If you are developing a small molecule, you usually start with screening large libraries of compounds that might hit this particular target and do all kinds of experiments to see how well those small molecules bind to this target.

Afterward, you select several hits. You identify what kind of molecules fit best for this protein target and start doing all kinds of experiments on those molecules to see if they work very well in the biological system, in the disease-relevant assay, in a mouse, in a dog, or other animals, and then you file for IND with the FDA to get the molecule into clinical trials.

After that process is complete, we get into drug development and start clinical trials. It starts with phase I, which is safety; in phase II, you test for efficacy; and in phase III, you test for both in a larger clinical setting, in a larger population. Then you might want to go for a phase IV or start launching the product.

Michael Krigsman: Mm-hmm.

Drug Discovery and Post-Marketing Research

Alex Zhavoronkov: And then, post-marketing research. That process takes more than ten years, usually, and fails 92% of the time.

With AI, you can really play in pretty much every segment from early-stage drug discovery where AI can assist you with a hypothesis model and, essentially, pulling out the needles from the haystack with a target ID, with small molecule identification, with virtual screening, with generation of novel molecules with specific properties, with planning your clinical trial design with enrolment of the clinical trial. And then, also, for predicting the outcomes of clinical trials.

Michael Krigsman: Where does AI begin to shorten that process, make that process better?

Alex Zhavoronkov: If you go to the very early steps of the pipeline and start working on hypothesis generation and target identification, you usually have multiple paths to pursue. One path is to look at the literature and identify promising areas that had been uncovered by scientists in the past and were published in peer-reviewed literature. Ideally, those targets, those hypotheses, were not implicated in the disease that you are looking at by somebody else.

AI can help you mine massive amounts of literature and also other associated data types to identify signals that a certain target might be implicated in the disease. We, at Insilico, usually start with grants data. We look at biomedical grants; we monitor about $1.7 trillion worth of grant money over the past 25 years. Then we look at how those grants progress into publications, into patents, into clinical trials, and then into products on the market.

We follow an idea from the initial idea and grant money all the way to money on the market. We also look at how money becomes data. So, usually, when the government is supporting a certain study, the data needs to be deposited in a public repository for other people to replicate it and also for the common good.

We try to follow the money into data. If the data is not there, we try to contact the scientist and get the data from the scientist and/or to encourage the scientist to put the data into the public repository.

We start with text databases, but also link this data to omics data. Basically, everything that ends with "omics" is called omics data: transcriptomics, genomics, metabolomics, metagenomics, you name it.

We work primarily with gene expression data, so we look at how the level of expression of certain genes or entire networks change from, let's say, a health state to disease. We deconvolute those changes, those signatures of disease into individual targets, especially causality models, and identify what kind of proteins could be targeted with a small molecule.

Then we go back into the prior art in the text and see if anybody has published anything that strengthens our hypothesis. It doesn't necessarily mean that our hypothesis is wrong if the signal is not there in text because sometimes the humans just couldn't really associate a certain target with a disease using older methods, but it gives us a little bit more confidence to see that somebody already touched on this challenge and on this target before.
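The comparison of gene expression between a healthy state and a disease state described above can be illustrated with a minimal sketch. This is not Insilico's method, which the interview describes as based on deep neural networks; it swaps in a plain log2 fold change ranking as the simplest possible stand-in. The gene names, expression values, and function names are all hypothetical.

```python
import math
from statistics import mean

# Hypothetical expression levels (arbitrary units) for three made-up genes.
healthy = {
    "GENE_A": [10.0, 10.0, 10.0],
    "GENE_B": [8.0, 8.0, 8.0],
    "GENE_C": [4.0, 4.0, 4.0],
}
disease = {
    "GENE_A": [40.0, 40.0, 40.0],
    "GENE_B": [8.0, 8.0, 8.0],
    "GENE_C": [2.0, 2.0, 2.0],
}


def log2_fold_changes(healthy_samples, disease_samples):
    """Return {gene: log2(mean disease level / mean healthy level)}."""
    return {
        gene: math.log2(mean(disease_samples[gene]) / mean(healthy_samples[gene]))
        for gene in healthy_samples
        if gene in disease_samples
    }


def rank_by_change(fold_changes):
    """Order genes by the size of their expression change, largest first."""
    return sorted(fold_changes, key=lambda g: abs(fold_changes[g]), reverse=True)


changes = log2_fold_changes(healthy, disease)
ranking = rank_by_change(changes)
```

Genes at the top of such a ranking are only candidates; the causality modeling and literature cross-check mentioned above are separate steps that this sketch does not attempt.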

Michael Krigsman: Alex, is the key then at this point that the various AI techniques that you're using enable you to discern patterns in the data that those signals, as you said, that otherwise you could not pick out? Is that the key issue here?

Alex Zhavoronkov: Yes, but, really, we are aggregating enormous amounts of data that is just not possible to process using human intelligence. We are also aggregating and grooming those data types together. Sometimes, those data types are completely incompatible and it's impossible to just suture them together using standard tools. You really need to train deep neural networks on several data packs at the same time in order for them to generalize and in order for us to be able to extract relevant features that are present in several data types at the same time.

Some of the data types that we work with are completely incomprehensible to the human mind, to human intelligence. Like, for example, gene expression or movement or cardiovascular activity scanning or ultrasound, for example. We manage to bring those data types together using AI and then identify relevant targets that basically trigger a certain condition.

Core Competency: Biology vs. AI 

Michael Krigsman: At Insilico, is your core competence in biology and medicine or in developing the AI techniques? Is it possible to even split those two?

Alex Zhavoronkov: In our case, we are good at both and we hire competitively, internationally. We actually hire through competitions where we put very challenging tests out in order for people to try and solve them very, very quickly. Those challenges are usually in combination of developing an AI method plus solving a complex biological or chemical problem.

However, when you're looking at really great AI scientists, they are usually not great in biology or great in chemistry. They are good at math. That is why some percentage of our company are just great mathematicians who are developing novel methods for bridging chemistry and biology using deep learning, for example.

Part of the company is specifically focused on applications of already existing techniques like GANs and reinforcement learning to existing problems in chemistry and biology. Those people are usually on the applied side and they know both chemistry and biology. They can talk to the mathematicians and they can do some basic research in AI as well.

Of course, we just have pure play biologists and chemists who are also necessary in order to validate some of the results of our AI. That's why we have such a large, diverse, and international team because you really need to have those three areas covered: the methods, the applications, and the validation.

Michael Krigsman: We have an interesting question from Chris Peterson on Twitter who says this; he says, "Grid-based parallel Fortran programs are still being used for some pharmacokinetic and pharmacodynamic studies. Do you see AI replacing the old school code, enhancing it, or advancing in parallel?"

Alex Zhavoronkov: I think, currently, we need to advance in parallel. Of course, some of the old techniques and some of the very primitive molecular dynamics are still being used by really top experts in drug discovery today. But most of those methods are being significantly accelerated by high-performance computing and AI, including typical software that's been around for a very long time, like Schrödinger, for example. The company has been around since '92.

These guys have made major breakthroughs in multiple areas and kind of managed to advance older algorithms to solve very complex problems. I think that at Insilico, we try to reinvent everything from scratch and we write our own software. But, of course, we know many of our collaborators who would just like to take small pieces of our big salami that we're developing and play around with it today. They might be using some more classical tools that we cannot get around today.

Ideally, you need to have a seamless pipeline, which identifies the targets, generates the molecules, and runs those molecules through a large number of simulations in one seamless pipeline. That's what we are building and that's our holy grail. But, of course, many companies, many groups are trying to do the Lego game and try to use multiple tools with varying outputs to solve the same problem.

Developing AI Tools In-House

Michael Krigsman: Why do you develop your own tools?

Alex Zhavoronkov: Yes, just because many of the methods that we are using are so new that they are incompatible with the older tools. There are many groups that claim to do AI but, essentially, what they are doing is mechanic jobs, taking off-the-shelf software and trying to bridge some gaps in pharma R&D using those tools. We don't do that. We develop everything from scratch, so from target ID to small molecule generation.

Michael Krigsman: Now, we have spoken about using your techniques to uncover potential candidates. The next step is evaluating. First, we have to uncover possibilities, and you do that by aggregating all of this data and then mining that data using the various techniques. Now you've done that. How do you evaluate the candidates that you've uncovered initially?

Alex Zhavoronkov: Usually, when you are left with a list of protein targets for a specific disease and you are trying to prioritize, you try to annotate those proteins with as many scores as possible. You are looking at whether this protein target has ever been implicated in toxicity. How is it connected with everything else? Which tissue does it play in more? How does it interact with other proteins? Is it druggable? Is it druggable with a small molecule or with an antibody? Did anybody else touch it? What is the patent space around the molecule? Has anybody tried taking it into the clinic with a small molecule or an antibody for a specific disease?

There are many, many, many, many scoring functions that you need to consider. At the end, when you basically are left with a very small set of targets, then you also test them in a variety of biological systems to see which one is more relevant for your disease of interest.

I'll give you an example case study. For example, we are very interested in fibrosis. Fibrosis is not a very simple process to describe and there are multiple types of fibrosis. There is IPF, so pulmonary fibrosis. There is smoking-induced fibrosis in the lung. There is aging-induced fibrosis in the lung. We've identified more than 120 types of fibrosis by comparing normal tissue to tissue inflicted by a certain condition that is associated with fibrosis.

We just recently did a case study where we looked at IPF, so pulmonary fibrosis, identified the list of targets for this condition, and our list was 50 targets. We looked at when those targets are more active and more disease-relevant, at what stage of the disease, because I think, if you kind of catch it later or address it later when there are just so many symptoms, you are going to be treating the symptoms, not the cause.

In our case, we've identified a large list of targets that are likely to be very relevant early in the disease progression. Then we looked at what targets are novel, so we looked for novelty, so what targets people did not focus on as much. We don't want to focus on old targets. Then we looked at what targets are druggable, so where we could actually come up with a small molecule from within the library or we can generate a molecule from scratch. Then we looked at what targets could be validated in a specific set of assays for fibrosis.

Michael Krigsman: Where is the impact of the AI techniques that you're using in this?

Alex Zhavoronkov: Usually, it's for scoring. You identify multiple scores for those targets. In our case, the target is annotated with more than 50 scores: whether it has been implicated in a certain condition before, whether it interacts with other proteins in a specific way, whether it is likely to lead to toxicity. Those predictors that basically give you this kind of score and probability that this target is the most relevant one are deep learning models. We developed them using machine learning.
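As a rough illustration of annotating targets with several scores and then prioritizing them, here is a minimal sketch. It assumes the individual scores already exist (in the interview they come from deep learning predictors, which are not reproduced here) and combines them with a simple weighted sum, which is an assumption made for this example and not a description of Insilico's pipeline. The target names, score fields, weights, and values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TargetScores:
    """Hypothetical per-target annotations, each scaled to the range 0..1."""
    name: str
    novelty: float
    druggability: float
    early_stage_relevance: float
    toxicity_risk: float


# Hypothetical weights; toxicity risk is subtracted rather than added.
WEIGHTS = {
    "novelty": 1.0,
    "druggability": 1.0,
    "early_stage_relevance": 1.0,
    "toxicity_risk": 1.0,
}


def priority(target: TargetScores, weights=WEIGHTS) -> float:
    """Combine the annotations into one number; higher means more promising."""
    return (
        weights["novelty"] * target.novelty
        + weights["druggability"] * target.druggability
        + weights["early_stage_relevance"] * target.early_stage_relevance
        - weights["toxicity_risk"] * target.toxicity_risk
    )


candidates = [
    TargetScores("TARGET_X", 0.9, 0.8, 0.7, 0.1),
    TargetScores("TARGET_Y", 0.2, 0.9, 0.6, 0.1),
    TargetScores("TARGET_Z", 0.9, 0.9, 0.9, 0.9),
]

# Sort so that the most promising target comes first.
shortlist = sorted(candidates, key=priority, reverse=True)
shortlist_names = [t.name for t in shortlist]
```

In this toy data, the target with the best raw annotations (TARGET_Z) drops below TARGET_X because of its high toxicity risk, which is the kind of trade-off a multi-score ranking is meant to surface before any experimental validation.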

Academia vs. Industry

Michael Krigsman: We have another interesting question from Twitter. This is from Shreya Amin. She says, "How does this type of research that you've been describing using AI and the process compare between academia and industry?"

Alex Zhavoronkov: Sure. It's a very, very good question. In the industry, in big pharma, people are a little bit less adventurous. They are trying to develop the various techniques to really solve a problem and make incremental changes. It's not for publication purposes.

In academia, people are much more innovative and adventurous. Of course, they try to publish. That's where the innovation comes from primarily.

We, at Insilico, we sit in between academia and industry, so we publish at the rate of about two research papers a month. That is a lot for even some of the academic groups just to also prove the concept and explain where we're going.

Academics, I think, are much more productive nowadays when it comes to developing new methods and showing new directions. However, there is a disconnect: really good computer scientists who are developing novel techniques that might be relevant for drug discovery very often are so far away from biology and chemistry that they put the papers out and the paper is really good from the machine learning perspective, but it's really, really poor from the perspective of real-world applications. Very often, they don't really understand that they overfitted somewhere or that they are getting a completely irrelevant output, or input, until somebody tries it in biology and chemistry.

Very often, and nowadays it's actually more prevalent, a lot of people put papers on arXiv, so in a repository, with a catchy title so it goes viral and gets picked up by the browsers, by Google, or by some news outlets. They get recognition and PR for this work, but then you try to replicate what they did or even just read the paper carefully, and you realize that it's not going to work in the real world. I think those kinds of papers and those kinds of efforts, early efforts, by academic groups specifically, without going through a peer review, also put a lot of skepticism in big pharma. People just don't think that many techniques are relevant, applicable, or transformative for their business.

Building a Team for AI and Biotech

Michael Krigsman: Let's talk about the team construction aspect because one of the things that you've mentioned a couple of times is the importance of both the machine learning capabilities as well as the biology capabilities. These are very specialized skills, and so how do you construct teams that enable both sides to work together and create something that one or the other could not do alone?

Alex Zhavoronkov: That's another very good question. In our case, that's one of the reasons why we are growing so slowly. We've been in business for 5 years now, but we are still 66 people. One of the reasons for this slow, organic growth is because it takes time to really integrate the AI scientists with biologists and chemists. It's very difficult to find people who are good at both at the same time. Usually, you are good at math or you are good in chemistry or you really need to have some good programming skills to be able to do an API and properly combine your technology with somebody else's.

We try to work in teams of three or four on specific therapeutic projects where one person is very good in chemistry or biology, one person is good in AI, and another person is good in just basic IT. It's basically teams of three or four people. On top of them, there is an infrastructure, an organizational infrastructure that helps manage those teams. We also separated the pure play AI team from everybody else, so they could work on the methods without being brought into the applied domain.

Getting this kind of talent who are willing to really contribute to methods development and develop novel algorithms, that is very, very difficult. Getting people who are good in application of already developed methods, that is rather easy. Getting the two to work together, that is very hard. To do this we, again, try to pursue organic growth and work on projects in small teams.

Insilico Business Model

Michael Krigsman: In fact, we have a question from Twitter on this subject of your business model. Chris Peterson is asking great questions. Thanks so much, Chris. He's asking, "Are you contracted to look for specific therapies or are you developing molecules from scratch and hoping to license them for clinical trials through distribution?"

Alex Zhavoronkov: We've been in business for five years and we have explored multiple business models. As an AI company, you have to explore because otherwise it's very, very difficult to scale on one business model and it's also quite risky.

We started as a service company, and we started partnering with pharmaceutical companies, with biotechnology companies and, also, venture funds where we provided a service or provided a system to them. We learned the applications that people are looking for and started developing our own small molecules, discovering our own small molecules and then licensing them.

Our current business model is actually very simple and actually allows us to scale. We work with venture capital firms that really know the business of biotechnology and are pursuing drug development and drug discovery. They guide us into where we need to identify targets and generate small molecules. Then they form teams around those small molecules and targets from them and let them do a little bit more validation and development of those target molecule associations.

What we get is a small upfront payment initially, and then we get milestone payments as the molecules progress through the various steps of validation. Then we get some royalties. Usually, if you consider the biobucks, or the future revenues that might come from the molecule, those deals are very, very substantial, but the initial payment is rather small.

That is why we have another business that is a software licensing business where we license some of our software tools to others to generate some revenue and ensure that we are sustainable, consistent, and also get some feedback on how well the software works; if we need to add more features.

Michael Krigsman: Okay.

Alex Zhavoronkov: Another business model is that we do have some joint ventures. For example, a joint venture with a company called Juvenescence. They are developing the molecules that we provide to them.

Michael Krigsman: Okay, so you have a diverse range of things that you're working on and trying that support your business model efforts, essentially.

Alex Zhavoronkov: Correct. But what we are mostly interested in is not the immediate revenue. In most of those licensing arrangements and engagements, we get some data back. We pretty much became one of the largest data factories in the world, getting data back from preclinical experiments.

Michael Krigsman: That's interesting. We have another question from Twitter. This is from @TrovatoChristian. He is a biomedical engineer and he is a Ph.D. student in computational biology in the Department of Computer Science at Oxford. By the way, I find it very interesting that computational biology falls under the Department of Computer Science rather than the Department of Biology. His question is, "Are there any examples of drugs developed by AI only?"

Alex Zhavoronkov: At this point in time, there is no such example. You always have a human in between. I hope that in the very near future, we'll be able to show that a pipeline where no human was involved from target identification to small molecule generation might be able to churn out some of those molecules, some of those promising molecules. But at this point in time, the experiment is king. So, unless you can validate your techniques experimentally, it won't really go forward. I have never seen an example of a molecule, even in mice at this point in time, that is completely generated using AI.

Michael Krigsman: What's the obstacle preventing using AI to go from beginning to end?

Alex Zhavoronkov: Well, because of the failure rates in pharma, in general. There are very, very few success stories to train on. Those success stories are very, very diverse. In some areas, it's easy to validate whether your algorithm is producing some meaningful output. But, in many cases, you really need to go and validate at every step of the way. That is why, when you are building this salami that is allowing you to go end-to-end, you need to ensure that you validate every slice of the salami and validate it internally, but also validate it with external partners. That's what we are trying to do as well.

Michael Krigsman: Eventually, that data may be there, but it sounds like it's just far too early at this stage.

Alex Zhavoronkov: At this stage, nobody tried to virtualize drug discovery completely using AI and do it seamlessly without human intervention. In many areas, it's actually not possible just because biology is so diverse and medicine is so diverse that it's very, very difficult to have a solution that would fit all. That's why people are going primarily after cancer just because it's a little bit easier to validate and the specific types of cancer, like for example solid tumors where you can do a xenograft and see if the tumor shrinks in a mouse if you feed it, if you give it a specific molecule. There needs to be validation at every step of the way and, at this point in time, those end-to-end pipelines will work only in certain therapeutic modalities.

Michael Krigsman: Let me ask you another question from Twitter. This is from Shreya Amin again, a great question, an interesting one. She says this; she says, "Using existing AI techniques, which areas from the perspective of types of drugs, diseases, conditions, and so forth are closest to breakthroughs or have made the most progress and what's most difficult?"

Alex Zhavoronkov: I'll give you an example that I am very, very familiar with. We've got some JAK inhibitors, so Janus kinase inhibitors that are developed completely using generative adversarial networks and reinforcement learning. I think those are kind of the most promising techniques for de novo molecular design - period.

We're currently in mice with those, so went all the way from enzymatic assays to mice, and showed that we can now achieve selectivity, specificity with those molecules, and those molecules have many other properties. Those are pretty common techniques nowadays, both the GAN that we used and the reinforcement learning technique that we used. It's not something super new, so we actually switched our R&D in a slightly different direction.

Michael Krigsman: Where is all of this going over the next--I don't know--three, four years, two to four years? Let's not go out ten years. Over the next few years, where is this going to be?

Alex Zhavoronkov: I think that companies like ours are going to put much more emphasis on their internal R&D instead of collaborating with big pharma because collaborating with big pharma is usually a path to nowhere because it's either death by pilot or they just ingest this expertise internally and catch up. But, at the same time, they are so bureaucratic that it's very difficult to change and, at the same time, at the CEO level, big pharma companies are more focused on increasing sales or buying other companies to increase sales or to get late-stage clinical assets, so phase two, phase three assets. The internal R&D is actually not being viewed as a huge priority and, regardless of what they think, it's fact. Usually, it's kind of the 15% to 20% on the income statement that needs to be there because, otherwise, the investors are not going to invest in the company. But the productivity of this internal R&D is usually very low.

I think that smaller biotechnology companies that embrace AI and embrace virtualization of drug discovery are going to be very successful. There are several cases that I admire in the industry, like for example Nimbus Therapeutics. These guys have managed to virtualize the entire drug discovery and development process and get some phase two assets to market and license them.

As AI improves and starts solving more problems in the pharmaceutical R&D pipeline, from hypothesis generation, target ID, and small molecule generation to prediction of the various properties of the molecule in clinical trials and better stratification techniques, I think that people who really understand the process and can virtualize it will be the winners. So far, I know several companies that are doing this; some companies are working with us. Some are in the stealth mode. I think they are going to be winners going forward.

When you talk about drug discovery in two to three years, it's actually a very, very short time. In many other areas of human development, if you ask me to plan five years ahead, I won't be able to because things are changing very quickly. In pharma, that's not the case. We really need to do the experiments and get things right.

Research on Longevity and Smoking

Michael Krigsman: Do you want to just very briefly tell us about the last research you did on either longevity or smoking? I know we're out of time, but just very briefly.

Alex Zhavoronkov: [Laughter] Sure. We just published a very fun paper showing that smoking accelerates aging. One of the areas that we are focusing on is age prediction using multiple data types, so from pictures, blood tests, transcriptomic data, proteomic data, microbiomic data. We use this data to predict the person's age reasonably accurately and we then look at what kind of interventions or behavioral modifications, what kind of lifestyles contribute to that person looking younger or older.

We did this exercise in Canada. We worked with the University of Lethbridge and the government of Alberta to process a large data set of smokers and nonsmokers of varying ages looking only at anonymized blood tests, just very, very few parameters from a recent blood test. First of all, we built a predictor of the smoking status, so now I can, with reasonable confidence, say whether you're smoking or not by looking at a blood test but, also, we showed that people who smoke, they look older to the deep neural net trained on their blood tests than nonsmokers.
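The idea of smokers "looking older" to an age predictor can be shown with a minimal sketch. The study described above used a deep neural network trained on blood tests; this example swaps in a one-variable least-squares fit on a single made-up blood marker and, as a further assumption for illustration, fits it on nonsmokers only. All values are synthetic and chosen for clarity; none come from the study.

```python
from statistics import mean

# Synthetic (chronological_age, blood_marker) pairs; not real data.
nonsmokers = [(30, 5.0), (40, 6.0), (50, 7.0), (60, 8.0)]
smokers = [(40, 6.5), (50, 7.5)]


def fit_age_from_marker(samples):
    """Ordinary least squares for age = slope * marker + intercept."""
    ages = [age for age, _ in samples]
    markers = [marker for _, marker in samples]
    marker_mean, age_mean = mean(markers), mean(ages)
    covariance = sum((m - marker_mean) * (a - age_mean) for a, m in samples)
    variance = sum((m - marker_mean) ** 2 for m in markers)
    slope = covariance / variance
    return slope, age_mean - slope * marker_mean


def mean_age_gap(samples, slope, intercept):
    """Average of (predicted age - chronological age) over a group."""
    return mean(slope * marker + intercept - age for age, marker in samples)


slope, intercept = fit_age_from_marker(nonsmokers)
gap_nonsmokers = mean_age_gap(nonsmokers, slope, intercept)
gap_smokers = mean_age_gap(smokers, slope, intercept)
```

A positive gap for a group means the predictor sees its members as older than they are. In this synthetic data the smokers' gap is positive by construction, so the sketch shows only how such a comparison is computed, not that the effect exists.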

Once we published, it actually went rather viral and we got very positive feedback. For example, my daughter is considering quitting smoking just because she doesn't want to look old. People don't really care about their health, but they really care about how they look. If you don't want to look old, just quit smoking.

Michael Krigsman: [Laughter] Okay. Great advice. Alex, thank you so much for taking time. Everybody, please subscribe on YouTube. Check out for lots of videos and subscribe to our newsletter. Have a great day, everybody. Take care. Bye-bye.

Published Date: Feb 08, 2019

Author: Michael Krigsman

Episode ID: 580