Two simulations of a billion atoms each, new insights into the SARS-CoV-2 virus, and a new AI model to speed up drug discovery
Those are among the achievements of finalists for the Gordon Bell Prize, regarded as a Nobel Prize of high-performance computing. The finalists used NVIDIA technologies to advance science with AI, accelerated computing, or both.
A finalist for the special prize for COVID-19 research used AI to link multiple simulations, revealing with new clarity how the virus replicates inside a host.
The study, led by Arvind Ramanathan, a computational biologist at Argonne National Laboratory, shows how to improve the resolution of typical protein structure exploration techniques. This could lead to new ideas about how to stop a virus from spreading.
The team, which included researchers from a dozen organizations in the United States and the United Kingdom, devised a workflow that ran on systems including Perlmutter, an NVIDIA A100-powered supercomputer, and other NVIDIA A100-based machines.
“The ability to undertake multisite data analysis and simulations for integrative biology will be important for making use of huge, difficult-to-transfer experimental data,” according to the paper.
The team built its approach to accelerating molecular dynamics research on the popular NAMD application running on GPUs. The researchers also used NVIDIA NVLink to speed data movement “far beyond what is currently conceivable with a standard HPC network connection, or… PCIe transfers.”
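To make the idea of AI linking simulations concrete, here is a minimal, generic sketch in which a lightweight novelty score over featurized simulation frames picks the states that deserve follow-up runs at higher resolution. The random “frames,” the distance-based score, and the selection rule are illustrative assumptions, not the Argonne team’s actual workflow or code.

```python
# Generic sketch of AI steering simulations: a simple novelty score over
# featurized simulation frames flags which states to refine at higher
# resolution. All data and choices here are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(seed=0)
frames = rng.normal(size=(1000, 64))      # stand-in for featurized MD frames

def novelty_scores(x: np.ndarray) -> np.ndarray:
    """Score each frame by its distance from the mean frame (a toy outlier measure)."""
    center = x.mean(axis=0)
    return np.linalg.norm(x - center, axis=1)

scores = novelty_scores(frames)
selected = np.argsort(scores)[-10:]       # the ten most unusual frames
print("Frames flagged for refined simulation:", selected.tolist())
```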
A Billion Atoms in High Fidelity
Ivan Oleynik, a physics professor at the University of South Florida, was part of a team named a Gordon Bell finalist for creating the first highly accurate simulation of a billion atoms. It broke the record set by last year’s Gordon Bell winner by a factor of 23.
“It’s a thrill to discover occurrences that have never been observed before; it’s a huge accomplishment that we’re proud of,” Oleynik added.
Simulations of carbon atoms at high temperatures and pressures could lead to new energy sources and help scientists better understand the makeup of distant planets. The work is all the more impressive because the simulation is accurate down to the quantum level, faithfully capturing the forces among the atoms.
“We could only attain this level of accuracy by using machine learning techniques on a powerful GPU supercomputer – AI is revolutionizing the way science is done,” Oleynik stated.
The researchers ran their code on Summit, the IBM-built system that is among the world’s most powerful supercomputers, using 4,608 IBM Power AC922 servers and 27,900 NVIDIA GPUs. They showed that their algorithm scales with roughly 100 percent efficiency to simulations of 20 billion atoms or more.
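The “roughly 100 percent efficiency” figure refers to weak scaling, where the problem grows with the machine so the time per step ideally stays flat. A quick sketch of that arithmetic, with invented timings rather than measured Summit results:

```python
# Back-of-the-envelope weak-scaling efficiency. Timings are invented
# placeholders, not measured Summit data.
def weak_scaling_efficiency(t_baseline: float, t_scaled: float) -> float:
    """Baseline runtime divided by runtime at larger scale, with work per node held fixed."""
    return t_baseline / t_scaled

# Example: one node simulating N atoms vs. 4,608 nodes simulating 4,608 * N atoms.
t_one_node = 120.0      # seconds per simulation step block on one node (placeholder)
t_full_machine = 121.8  # seconds for the proportionally larger problem (placeholder)

print(f"Weak-scaling efficiency: {weak_scaling_efficiency(t_one_node, t_full_machine):.1%}")
# -> about 98.5%, i.e. close to ideal scaling
```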
Any researcher who wishes to push the boundaries of materials science can use this code.
A Deadly Droplet Inside
Another finalist for the COVID-19 prize, a team led by Rommie Amaro of the University of California, San Diego, used a billion-atom simulation to depict the Delta variant inside an airborne droplet. It provides the first atomic-level look at aerosols, revealing biological mechanisms that spread COVID and other diseases.
“We show how AI combined with HPC at several levels can result in dramatically better effective performance, opening up new avenues for understanding and interrogating complex biological systems,” Amaro said.
The researchers used NVIDIA GPUs on Summit; on Longhorn, a supercomputer built by Dell Technologies for the Texas Advanced Computing Center; and on commercial systems in Oracle Cloud Infrastructure (OCI).
“HPC and cloud resources can be leveraged to reduce time-to-solution for big scientific projects, as well as connect researchers and dramatically facilitate complicated collaborative interactions,” the researchers stated.
The Drug Discovery Language
Finalists for the COVID-19 prize at Oak Ridge National Laboratory (ORNL) applied natural language processing (NLP) to the problem of screening chemical compounds for new medicines.
In just two hours, they trained a BERT NLP model that can speed up drug discovery, using a dataset of 9.6 billion molecules, the largest dataset applied to this task to date. The previous best effort needed four days to train a model on a dataset of 1.1 billion molecules.
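The team’s code isn’t reproduced here, but a minimal sketch using the Hugging Face Transformers library shows the general shape of pretraining a BERT-style masked language model on SMILES strings. The toy molecules, the general-purpose tokenizer standing in for a chemistry-specific one, and every hyperparameter are placeholders, not the ORNL setup.

```python
# Minimal sketch: pretrain a small BERT-style masked language model on SMILES
# strings. Toy data and placeholder hyperparameters; not the ORNL pipeline.
from datasets import Dataset
from transformers import (BertConfig, BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

smiles = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]  # toy molecules
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")  # stand-in tokenizer

def tokenize(batch):
    return tokenizer(batch["smiles"], truncation=True, max_length=128)

dataset = Dataset.from_dict({"smiles": smiles}).map(
    tokenize, batched=True, remove_columns=["smiles"])

config = BertConfig(vocab_size=tokenizer.vocab_size, hidden_size=256,
                    num_hidden_layers=4, num_attention_heads=4,
                    intermediate_size=512)
model = BertForMaskedLM(config)

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="smiles-bert",
                         per_device_train_batch_size=8, num_train_epochs=1)

Trainer(model=model, args=args, train_dataset=dataset,
        data_collator=collator).train()
```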
On the Summit supercomputer, more than 24,000 NVIDIA GPUs delivered a whopping 603 petaflops. Now that the model is trained, it can run on a single GPU to help researchers hunt for chemical compounds that could stop COVID and other diseases.
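Once trained, such a model can be queried on a single GPU. One possible sketch, assuming a hypothetical “smiles-bert” checkpoint saved from training: embed candidate SMILES strings and rank them by similarity to a known active compound. The checkpoint name, the mean-pooling choice, and the similarity criterion are illustrative, not the team’s published screening pipeline.

```python
# Minimal single-GPU screening sketch: rank candidate molecules by embedding
# similarity to a reference compound. "smiles-bert" is a hypothetical local
# checkpoint; the molecules and the ranking criterion are toy placeholders.
import torch
from transformers import BertModel, BertTokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = BertTokenizerFast.from_pretrained("smiles-bert")
model = BertModel.from_pretrained("smiles-bert").to(device).eval()

reference = "CC(=O)Oc1ccccc1C(=O)O"                       # known active compound (toy)
candidates = ["CCO", "c1ccccc1O", "CC(C)Cc1ccc(cc1)C(C)C(=O)O"]

def embed(batch):
    inputs = tokenizer(batch, return_tensors="pt", padding=True,
                       truncation=True).to(device)
    with torch.no_grad():
        return model(**inputs).last_hidden_state.mean(dim=1)  # mean-pooled embeddings

scores = torch.nn.functional.cosine_similarity(embed(candidates), embed([reference]))
for smi, score in sorted(zip(candidates, scores.tolist()), key=lambda t: -t[1]):
    print(f"{score:.3f}  {smi}")
```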
Jens Glaser, a computational scientist at ORNL, said, “We have collaborators here who wish to apply the model to cancer signaling networks.”
“We’re just scratching the surface of training data sizes,” said Andrew Blanchard, a research scientist who led the effort. “We want to use a trillion molecules soon.”
Using a Full-Stack Solution
The team used NVIDIA software libraries for AI and accelerated computing to complete its work in a surprisingly short amount of time.
“We didn’t need to fully optimize our work for the GPU’s Tensor cores because there’s no need for specialized code; you can just use the standard stack,” Glaser explained.
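As a concrete illustration of what “just use the standard stack” can look like in PyTorch: automatic mixed precision routes eligible matrix multiplies to Tensor Cores with no hand-written kernel code. The toy model and data below are placeholders, not the ORNL code, and the snippet assumes an NVIDIA GPU is present.

```python
# Toy example of mixed-precision training with the standard PyTorch stack.
# Requires an NVIDIA GPU; the model and data are placeholders.
import torch

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()   # loss scaling for FP16 training

x = torch.randn(64, 1024, device="cuda")
target = torch.randn(64, 1024, device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast():        # eligible ops run in FP16 on Tensor Cores
    loss = torch.nn.functional.mse_loss(model(x), target)

scaler.scale(loss).backward()          # scale the loss to avoid FP16 underflow
scaler.step(optimizer)
scaler.update()
```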
“Having a chance to be a part of meaningful research with the potential to impact people’s lives is very satisfying for a scientist,” he said, expressing the sentiments of many of the finalists.