DeepMind says its AlphaFold machine-learning software can now rapidly predict the structure of proteins with high accuracy, and could one day help us develop drugs faster.
In its announcement on Monday, heralded as a scientific breakthrough by some, the Google stablemate claims to have solved a 50-year-old problem in biology: building a computer system capable of accurately and quickly modelling the structure of a protein from its string of amino acids.
For the uninitiated, proteins are molecules essential for life as we know it: they transport matter within cells and bodies, they perform chemical reactions as enzymes, they protect you as antibodies, and so on. How each one functions is largely defined by its shape. However, proteins are complex 3D arrangements nanometres in size, so working out their structure through experimental observation is tricky, though not impossible.
An alternative approach is to use software to link the amino acids detected within a protein – each protein is formed from chains of these acids – to the protein’s intricate structure. Computer programs like AlphaFold estimate the structure of a protein from its amino acids. Knowing a protein’s shape helps us understand how it works, which in turn helps us do things like develop drugs and create new or copycat proteins.
The Critical Assessment of Protein Structure Prediction competition, known as CASP for short, was set up in 1994 to compare code for predicting protein folding, and is held every two years. This time around, DeepMind’s AlphaFold reached the highest score ever recorded, 87 GDT. That is just shy of 90 GDT, the level considered to be as good as results obtained from physical observations of proteins. Crucially, the AI-based software can accurately predict the structure of individual proteins in a matter of minutes or hours, rather than the weeks, months, or years needed by experiments. That could vastly speed up drug development work.
“We have been stuck on this one problem – how do proteins fold up – for nearly 50 years,” said John Moult, co-founder and chair of CASP, and a professor at the University of Maryland in the US. “To see DeepMind produce a solution for this, having worked personally on this problem for so long and after so many stops and starts, wondering if we’d ever get there, is a very special moment.”
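GDT, short for Global Distance Test, is scored on a scale of 0 to 100. Roughly speaking, it reflects the percentage of a protein’s amino acids that a prediction places within a set distance of their experimentally determined positions. AlphaFold’s score therefore means it can predict the structure of a protein to about 87 per cent accuracy by this measure: the guesstimated positions of the amino acids may be off by about 1.6 Angstroms, or 0.16nm, roughly the width of an atom.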
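For the curious, here is a simplified sketch of how a GDT-style score is calculated, using the common GDT_TS variant: take the percentage of residues lying within 1, 2, 4 and 8 Angstroms of their experimentally determined positions, then average those four percentages. It assumes the predicted and reference structures have already been superimposed, with one C-alpha atom coordinate per residue in the same order. The official CASP scoring also searches over many superpositions to maximise each percentage, a step this sketch skips. The function name gdt_ts and the toy coordinates are our own illustrative choices.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]  # x, y, z in Angstroms


def gdt_ts(predicted: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Simplified GDT_TS score on a 0-100 scale.

    Assumes both structures are already superimposed and list one
    C-alpha coordinate per residue, in matching order. The real GDT
    also searches over superpositions; this sketch does not.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Distance between each predicted residue and its true position.
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    percentages = []
    for cutoff in (1.0, 2.0, 4.0, 8.0):
        within = sum(1 for d in distances if d <= cutoff)
        percentages.append(100.0 * within / len(distances))
    # GDT_TS is the mean of the four percentages.
    return sum(percentages) / len(percentages)


# Toy example: four residues, predicted positions off by 0.5, 1.5, 3 and 10 Angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]
print(gdt_ts(predicted, reference))  # prints 56.25
```

Here 25, 50, 75 and 75 per cent of residues fall within the four cutoffs, giving an average of 56.25. A perfect prediction scores 100.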
Back in 2018, when an earlier version of AlphaFold topped the CASP rankings, DeepMind’s CEO Demis Hassabis told El Reg: “We have to get much more accurate in order for it to be useful for biologists.” The 2020 entry uses a different architecture, one based on an “attention-based neural network system.”
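DeepMind hasn’t yet published the details, but for readers wondering what “attention” means in a neural network, below is a minimal pure-Python sketch of generic scaled dot-product attention, the textbook mechanism behind most attention-based models. It is not AlphaFold’s architecture; the function names and toy inputs are our own illustrative assumptions. The idea is that each residue, represented as a vector, is updated as a weighted average of every residue’s vector, with the weights set by how well their “query” and “key” vectors match.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row (vector) per residue


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: turns raw scores into weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Generic attention: each output row is a weighted average of the value rows.

    The weight given to row j when computing output row i comes from the
    dot product of query i with key j, divided by sqrt(key dimension).
    """
    scale = math.sqrt(len(keys[0]))
    width = len(values[0])
    outputs: Matrix = []
    for q in queries:
        scores = [sum(qc * kc for qc, kc in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Blend the value vectors column by column using the attention weights.
        outputs.append([sum(w * v[c] for w, v in zip(weights, values)) for c in range(width)])
    return outputs


# Two keys that match the query equally well get equal weight,
# so the output is simply the average of the two value rows.
print(scaled_dot_product_attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[0.0, 2.0], [4.0, 6.0]]))
# prints [[2.0, 4.0]]
```

In a real network the queries, keys and values would be learned projections of per-residue features; here they are passed in directly to keep the sketch short.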
“A folded protein can be thought of as a ‘spatial graph’, where residues are the nodes and edges connect the residues in close proximity,” the AlphaFold team explained this week. “This graph is important for understanding the physical interactions within proteins, as well as their evolutionary history.”
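To make the “spatial graph” idea concrete, here is a minimal sketch that builds one from a list of 3D residue coordinates: each residue becomes a node, and an edge joins any two residues within a cutoff distance of each other. The 8 Angstrom default cutoff and the name contact_graph are illustrative assumptions, not details DeepMind has disclosed.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]  # x, y, z in Angstroms


def contact_graph(coords: Sequence[Coord], cutoff: float = 8.0) -> Dict[int, List[int]]:
    """Build an adjacency list: residue i links to every residue j within `cutoff` Angstroms."""
    graph: Dict[int, List[int]] = {i: [] for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                # Proximity is symmetric, so record the edge in both directions.
                graph[i].append(j)
                graph[j].append(i)
    return graph


# Three residues close together, and a fourth far away with no neighbours.
residues = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(contact_graph(residues))  # prints {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: []}
```

The all-pairs loop is quadratic in the number of residues, which is fine for a sketch; a plain adjacency list keeps the node and edge structure from the quote easy to see.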
AlphaFold was trained on approximately 170,000 protein structures, and learned the spatial representation of all the constituent amino acids in each one in order to predict structures presented in the CASP competition. “It uses approximately 128 TPUv3 cores (roughly equivalent to ~100-200 GPUs) run over a few weeks, which is a relatively modest amount of compute in the context of most large state-of-the-art models used in machine learning today,” the team said.
“Being able to investigate the shape of proteins quickly and accurately has the potential to revolutionise life sciences,” said Andriy Kryshtafovych, a project scientist at the University of California, Davis, who helped organize the CASP competition.
“Now that the problem has been largely solved for single proteins, the way is open for development of new methods for determining the shape of protein complexes – collections of proteins that work together to form much of the machinery of life, and for other applications.”
Although AlphaFold’s predictions come close to the accuracy of experimental methods, the software won’t replace lab experiments altogether. Predicting the structure of individual proteins doesn’t describe how they might interact with one another or with other molecules like DNA or RNA.
“AlphaFold is one of our most significant advances to date but, as with all scientific research, there are still many questions to answer,” the DeepMind team concluded. “Not every structure we predict will be perfect…In collaboration with others, there’s also much to learn about how best to use these scientific discoveries in the development of new medicines, ways to manage the environment, and more.”
The London-based AI lab is expected to reveal more details in an upcoming peer-reviewed paper. A spokesperson was not available for further comment. ®