Photo from SonTech Research Team.
Drug manufacturers have traditionally used a time-consuming trial-and-error approach to discover potential medications for combating diseases. However, what if artificial intelligence could forecast the composition of a novel drug molecule in a manner similar to how Google determines your search queries or email programs predict your responses, such as "Understood, thank you"?
The goal of a novel approach involves utilizing a form of artificial intelligence called natural language processing, which is the same technology employed by OpenAI's ChatGPT to generate human-like responses. This approach is applied to the analysis and creation of proteins, which are fundamental components of life and play a crucial role in the development of various drugs. The method capitalizes on the similarity between biological codes and search queries or email texts, as they all consist of sequences of letters.
Proteins consist of numerous small chemical building blocks called amino acids, and scientists employ specific notation to record these sequences. Using a system where each amino acid is symbolized by a letter of the alphabet, proteins are depicted as lengthy combinations resembling sentences.
Natural language algorithms, which rapidly analyze language and anticipate the next part of a conversation, can also be employed in the realm of biological data to generate protein-language models. These models capture the equivalent of the structural rules governing proteins – determining which combinations of amino acids result in particular therapeutic properties. This enables the prediction of letter sequences that have the potential to serve as the foundation for new drug molecules. Consequently, the duration needed for the initial phases of drug discovery might decrease significantly, shifting from years to months.
Ali Madani, the founder of ProFluent Bio, a startup located in Berkeley, California, specializing in language-based protein design, points out that nature offers numerous instances of intricately crafted proteins with diverse functions. He emphasizes that we are gaining insights from nature's own designs as a blueprint for our work.
Protein-based medications are employed in the treatment of various conditions, including heart disease, specific types of cancer, and HIV. Over the past couple of years, companies like Merck & Co., Roche Holding AG's Genentech, along with several startups such as Helixon Ltd. and Ainnocence, have embarked on the exploration of new drugs using natural language processing. Their aspiration is that this approach will not only enhance the efficacy of current drugs and potential candidates but also introduce the possibility of discovering entirely new molecules. These molecules could potentially address diseases like pancreatic cancer or ALS, for which more effective treatments have been elusive.
Sean McClain, the founder and CEO of Absci Corp, a drug discovery company located in Vancouver, Washington, anticipates that technologies like these will begin to tackle aspects of biology that were previously considered challenging or impossible to develop drugs for.
Computational biologists assert that there are significant challenges ahead for the application of natural language processing in drug discovery. They caution that making extensive modifications to existing protein-based drugs might lead to unforeseen adverse reactions. Additionally, thoroughly testing the safety of entirely synthetic molecules within the human body will be necessary.
If the natural-language algorithms prove successful as expected by those who have adopted them, they could significantly enhance the potential of artificial intelligence to revolutionize the field of drug discovery. Previous endeavors to harness AI faced challenges related to technological limitations or data scarcity. Advocates argue that recent advancements in natural language processing, coupled with a substantial reduction in the cost of protein sequencing that has resulted in extensive databases of amino acid sequences, have largely addressed both of these issues.
As the technology is currently in its initial phases, companies are presently concentrating on leveraging protein-language models to enhance familiar molecules. Their primary objective is to enhance the effectiveness of drug candidates. For instance, starting with a naturally occurring monoclonal antibody, these models can suggest adjustments to its amino acid sequence to enhance its therapeutic properties.
In an unpublished research paper made available online in August, scientists at Absci employed this approach to improve the antibody-based cancer medication trastuzumab. They modified it in a way that it binds more strongly to its designated target on the surface of cancer cells. A stronger binding could potentially allow patients to receive the same benefits from a lower dose, potentially leading to shorter treatment durations and fewer side effects.
In a different study that appeared in the March issue of the Proceedings of the National Academy of Sciences, a team of researchers from MIT, Tsinghua University, and the Beijing-based company Helixon utilized protein-language models to modify a potential Covid-19 drug. Originally, this drug was only effective against the alpha, beta, and gamma variants of the virus. However, through this process, they adapted it to be effective against the delta variant as well.
Ainnocence, a startup with operations in both the United States and China, assists its clients in employing these models to alter animal proteins, including antibodies from rabbits, which are frequently used as a foundation in drug discovery. The aim is to adapt these proteins to be compatible with human physiology. This information comes from the company's founder and CEO, Lurong Pan.
However, pharmaceutical companies are already looking beyond simply altering existing proteins and are exploring a concept known as "de novo design." This process involves creating molecules entirely from the ground up.
Genentech conducted an experiment demonstrating the feasibility of designing an antibody that could attach to the same cellular target as pertuzumab, an existing breast cancer drug marketed by Genentech as Perjeta. What makes this significant is that the newly designed antibody has a completely different amino acid sequence. In this experiment, Genentech's protein-language models were provided with only the target information and the desired three-dimensional structure of the antibody. This approach emphasizes the critical role of the protein's shape in determining its function. Richard Bonneau, an executive director at Genentech who joined the company after it acquired his startup, Prescient Design, shared this information.
Absci and Helixon are collaborating with pharmaceutical companies to create medications for cancer and autoimmune diseases through de novo approaches. In January, Absci disclosed a partnership with Merck to pursue three specific drug targets, as mentioned by Mr. McClain. A spokesperson from Merck confirmed the company's involvement in various collaborations aimed at harnessing artificial intelligence in drug development. Helixon, on the other hand, recently established partnerships with two prominent pharmaceutical firms to address diseases that were previously considered difficult to develop treatments for, according to CEO and founder Jian Peng.
Dr. Pan from Ainnocence suggests that the persistent challenges in drug discovery have remained unresolved for quite some time, awaiting a new wave of technology to tackle them. He believes that this approach represents a truly groundbreaking shift in methodology.
In the end, numerous computational biologists anticipate that protein-language models will not only expedite drug development but also offer advantages in various other domains. They believe that the same method could be employed to create improved enzymes for breaking down plastics, purifying wastewater, and addressing environmental concerns like oil spills, among other potential environmental applications.
Dr. Madani from ProFluent Bio emphasizes the pivotal role of proteins in various life processes. Proteins play a crucial role in functions like respiration and vision, contribute to the sustainability of the environment, and are essential for human health and addressing diseases. He suggests that if we can enhance existing proteins or create entirely new ones, it could have far-reaching implications and applications across a wide range of fields.