DeepMind's AlphaFold 3 Model Set to Cause Another Earthquake in the Biological World

Shi Chao Sat, May 11 2024 06:58 AM EST

AI news has flooded my feed once again. Google's DeepMind, the company whose AlphaGo made headlines by defeating Go champion Ke Jie, has unveiled its latest-generation model, AlphaFold 3, in Nature.

AlphaFold, a name that sounds like a foldable phone model, is the company's AI system specialized in predicting protein structures.

The new version can predict the structure of almost every kind of molecule found in living organisms.

This means that biomedical research has gained a true "God's-eye view": the workings of any biological molecule can be pulled out of the black box and viewed in see-through mode.

Many media outlets and netizens have started cheering. In the 21st century, it seems that the era of biology is truly upon us.

To understand how impressive the newly released AlphaFold 3 is, we first need to know the impact DeepMind and its AlphaFold have had on the molecular biology community.

We all learned during our nine years of compulsory education that proteins are the most abundant organic compounds in living organisms. To unravel how biological molecules work, we must know the specific structure of each protein.

Before AlphaFold, there were mainly two methods for determining protein structures. One involved shining X-rays on protein crystals, essentially taking pictures first and then analyzing them to work out the protein's structure. The other used nuclear magnetic resonance (NMR) spectroscopy to capture the overall shape and then infer the structure.

These traditional methods were slow, limited in scope, and costly. A single X-ray image could cost tens of thousands of dollars, about the price of a small SUV.

This is why protein research has been expensive and has demanded deep experience. Only seasoned experts, the protein wizards, could quickly guess a protein's shape with reasonable accuracy and cut down on the amount of imaging needed.

So people wondered: can AI take over this kind of work, which depends on experience and pattern-finding?

DeepMind stepped in to tackle this. To overcome the issues with traditional methods, the first generation of AlphaFold took a different approach:

No more trial and error!

Since proteins are made up of amino acids, the original AlphaFold started by gathering known protein structures from various public sources. It then compiled the distances between each pair of amino acids, along with the angles of the bonds linking them, into a map. A neural network digested this data and made its own predictions.

When the first generation of AlphaFold was released in 2018, it amazed the scientific community, outperforming many seasoned experts to win the 13th Critical Assessment of Structure Prediction (CASP) competition.
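To make the idea of a pairwise distance map concrete, here is a minimal sketch in Python. It computes the map from known 3D positions, which is the kind of data such a model would learn from. The coordinates are made up for illustration, and this is not DeepMind's code.

```python
import math

def distance_map(coords):
    """Return an N x N matrix of Euclidean distances between residue positions."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

# Hypothetical 3D positions (in angstroms) for four amino acid residues
toy_chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]

dmap = distance_map(toy_chain)
for row in dmap:
    print(" ".join(f"{d:5.2f}" for d in row))
```

Each row lists how far one residue is from every other residue. A prediction model works in the opposite direction: given only the amino acid sequence, it tries to guess this map.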

AI truly is remarkable.

However, the initial AlphaFold had a limitation – it relied more on local data features for training and struggled to capture relationships between distant elements.

It's like a writer who can only write short stories but struggles with crafting a novel.

The challenge lies in the fact that many protein molecules exhibit long-range dependencies, which posed difficulties for the first version of AlphaFold.

Fortunately, AlphaFold 2.0 arrived in 2020, built on the Transformer architecture that ChatGPT later made famous.

The Transformer's attention mechanism neatly solved the problem of long-range relationships between amino acids. How significant was the progress?

In the 2018 protein structure prediction competition, version 1.0 scored below 60 out of 100 for accuracy. In the 2020 competition, version 2.0 achieved an astonishing 92.4. It can now cover up to 98% of known human proteins, and most importantly, it is completely open-source.

It can be said that version 2.0 has essentially solved the problem of predicting single-chain proteins.
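For readers curious why attention helps with long-range relationships, here is a minimal Python sketch of scaled dot-product self-attention. It is a bare-bones illustration with made-up two-number features per residue and no learned weights. AlphaFold 2's actual network is far more elaborate.

```python
import math

def softmax(xs):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = inputs."""
    dim = len(vectors[0])
    weights = []
    for q in vectors:
        # Score this position against every position, near or far
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in vectors]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * v[d] for w, v in zip(row, vectors)) for d in range(dim)]
        for row in weights
    ]
    return weights, outputs

# Hypothetical features for five residues; the first and last are alike
residues = [(1.0, 0.0), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
weights, outputs = self_attention(residues)
print([round(w, 3) for w in weights[0]])
```

The first residue gives the last residue the same weight it gives itself, even though they sit at opposite ends of the chain. The score depends on how similar two positions are, not on how far apart they are in the sequence.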

In 2021, AlphaFold-Multimer, built on version 2.0, added support for multi-chain proteins and achieved a breakthrough in accuracy, with prediction accuracy for protein interactions exceeding 70%.

As a result, many companies now use these models, which even assisted in the development of some foreign COVID-19 vaccines.

But in DeepMind's eyes, victory in protein structure prediction is just the beginning of unleashing AI's potential. The complex molecular structures inside organisms are not limited to proteins; there are also nucleic acids, small-molecule ligands, and more.

It's like spending ten years mastering key-cutting techniques only to find out that everyone is using fingerprint or password locks, and very few are still using traditional keys!

Therefore, with AlphaFold 3, they have introduced a more powerful, all-encompassing model that can predict not only proteins but also DNA, RNA, and various small molecules, and reveal how they interact with each other.

So how is it done? The answer is, they used Diffusion.

Yes, the famous diffusion model, which you must have heard of during the AI art craze.

Its principle is to blur the original image with noise step by step until only a mosaic remains, let the AI learn how each step degraded the image, and then run the process in reverse to generate images from mosaics.
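The noising half of that process can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the four "pixel" values, the step count, and the noise level `beta` are all made up, and the learning and reversal steps are left out. In AlphaFold 3 the same idea is applied to atom coordinates rather than pixels.

```python
import math
import random

def add_noise_step(values, beta, rng):
    """One forward step: shrink the signal slightly and mix in Gaussian noise."""
    keep = math.sqrt(1.0 - beta)
    spread = math.sqrt(beta)
    return [keep * v + spread * rng.gauss(0.0, 1.0) for v in values]

def forward_diffusion(values, steps, beta, rng):
    """Apply the noising step repeatedly and return every intermediate state."""
    states = [list(values)]
    for _ in range(steps):
        states.append(add_noise_step(states[-1], beta, rng))
    return states

rng = random.Random(0)  # fixed seed so the run is repeatable
clean = [1.0, -1.0, 1.0, -1.0]  # hypothetical "image" of four pixel values
states = forward_diffusion(clean, steps=50, beta=0.1, rng=rng)

# Fraction of the original signal that survives all 50 steps
signal_left = math.sqrt(1.0 - 0.1) ** 50
print(f"signal remaining after 50 steps: {signal_left:.4f}")
```

After enough steps almost nothing of the original is left. A diffusion model is trained to undo one such step at a time, so that starting from pure noise it can work its way back to a clean result.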

However, just as AI art struggles with drawing fingers and Sora's chair video glitches, the Diffusion-based AlphaFold 3 can also make prediction errors, especially on structures that are similar and hard to tell apart, like the chiral molecules you might have learned about in high school organic chemistry.

So for these error-prone cases, DeepMind used a technique called cross-distillation: the Transformer-based 2nd generation makes predictions first, and those predictions are then added to the training data for AlphaFold 3. In other words, the 2nd generation acts as a teacher guiding the 3rd generation, which reduces prediction errors.

How good are the results? Just take a look at the official image.

AlphaFold 3's prediction for 7BBV, an enzyme found in a soil fungus, shows the enzyme protein (blue), ions (yellow spheres), and monosaccharides (yellow) almost perfectly overlapping the actual structure (grey).

AlphaFold 3 accurately predicts the structure of the interaction between the spike protein of a common cold virus (blue), antibodies (aquamarine), and monosaccharides (yellow), closely matching the real structure (grey).

AlphaFold 3 predicts a protein complex in which proteins (blue) bind to DNA (pink), with the predicted model closely matching the experimentally determined molecular structure (grey).

Beyond producing polished images, AlphaFold 3 excels in accuracy at the atomic level. It comprehensively outperforms other tools at modeling protein-ligand and protein-nucleic acid interactions, and its modeling of antigen-antibody interactions is equally outstanding.

AlphaFold 3 is also easier to operate.

With ChatGPT, you have to come up with good questions and prompts, but with AlphaFold 3, you just input a list of molecules, and it predicts how they will combine.

Imagine: a phenomenon that used to take significant time, effort, and resources to observe can now be reproduced by entering parameters on a website and clicking a button. Within minutes, it generates highly detailed and accurate models of large biomolecules.

Even the biochemical processes inside cells, the functioning of DNA, and the reactions of drugs and hormones can all be understood in a very short amount of time.

These cutting-edge results and the enthusiasm of so many people seem to suggest that this release is not just a significant advancement but a revolutionary breakthrough that might overturn traditional research methods across biomedicine.

However, I believe that while optimism is good, science also needs objectivity and rigor.

Amidst the media frenzy and online buzz about "explosive," "revolutionary," and "world-changing" advancements, many experts in the field have also shared their evaluations of AlphaFold 3.

For instance, Professor Yan Ning's team found that version 3.0 failed to predict a glycoprotein, performing even worse than its predecessor.

Many scientists have also criticized version 3.0 because, unlike version 2.0, it is not open-source and comes with usage limits.

Some have even questioned DeepMind's CEO Hassabis, who founded an "AI-focused drug company" that claims to "redefine drug discovery using AI" but, from 2021 to today, has not released any drugs.

Of course, this is a bit of a cheap shot, as protein structure is just a small part of the drug development process and does not have a decisive impact on its progress.

In conclusion, I find AlphaFold's third-generation product quite impressive. However, in the long journey of life sciences, there are still many challenges it needs to overcome.

Nevertheless, progress is always a good thing. Hopefully, DeepMind can do more and do it faster.