Proteins within blood plasma serve as biomarkers for cancer and can play an important role in risk assessment, diagnosis, treatment, and drug development.
Researchers studying the set of all proteins in a patient’s plasma, known as the proteome, typically face a choice: either they measure the levels of a small subset of proteins chosen a priori (that is, selected in advance by reasoning rather than from the data), or they interrogate the entire proteome. The first approach lends itself to relatively inexpensive and quick processes but falls short in the information it provides. Moreover, an optimal subset of proteins to interrogate is typically not known in the first place. The second approach provides substantially more comprehensive data but has until now been slow, taking weeks for a single plasma sample, and, as a result, expensive.
A new paper published in Nature Communications by a group of researchers from Brigham and Women’s Hospital, MIT, the Koch Institute, and the life sciences startup Seer describes a new technique for interrogating the proteome. This technique is both relatively comprehensive and, at the same time, fast, taking a few hours. That means it is poised to break the trade-off between comprehensive and fast interrogation of the proteome.
By combining proteomic data made available through this approach with machine learning methods, the researchers’ lung cancer study shows the potential for fast diagnoses through a simple blood draw. This could lead to earlier detection, more effective treatment, and higher survival rates.
“Because the underlying methodology is fast, we can do this quickly and cheaply,” said Farias, a professor of operations management at MIT Sloan and a co-author of the paper. “We can make early detection a reality.”
How it works: a “photograph” of the blood’s proteins
The paper’s authors describe the process of studying proteins in blood plasma as “exceptionally challenging” for multiple reasons. First, the human proteome is estimated to contain more than 20,000 different proteins (the exact number is unknown), and it is unclear a priori which of these proteins serve as useful biomarkers for a given health condition. Second, the vast majority of these proteins are present in extraordinarily small quantities, sometimes as little as one-trillionth of a gram per milliliter, which makes their levels hard to measure against a background of far more abundant proteins. To get a sense of scale, albumin makes up 50% of the mass of protein in plasma, and the next 20 most-abundant proteins account for roughly another 49%, leaving about 1% for everything else.
The new technology can be thought of as having both a hardware and a software component. The ‘hardware’ for the approach (developed by Seer) consists of a set of carefully engineered nanoparticles chosen for their ability to attract a diverse spectrum of proteins. After incubating these nanoparticles in plasma, researchers can analyze the set of proteins that adsorb to the surface of the nanoparticles. This entire procedure can be done in hours and yields information on many thousands of proteins in the plasma sample.
“New biology becomes discovered when you look at the totality of the information,” said Dr. Omid Farokhzad, the CEO of Seer, a professor at Harvard Medical School, and an MIT Sloan MBA graduate. “Our technology can selectively survey the proteome in an unbiased and highly reproducible way across the entire range of the proteome — and do that accurately and precisely.”
In addition, a survey of the proteome provides a better sense of what’s happening in the body than a genetic test, said Farokhzad, who also completed post-doctoral studies with MIT professor Robert Langer, a co-author on the research. Genes indicate the risk of developing a condition such as heart disease or Alzheimer’s, he said, but proteins indicate whether a person in fact has the condition.
This new process of interrogating the proteome produces what Farias described as a “proteograph” — a photograph of the proteome — which indicates which proteins were detected on a given nanoparticle. Given our complex physiology, the proteograph is naturally noisy. That’s where the ‘software’ comes in — in this case, a series of newly developed techniques that extract the signal from the noise. Farias compared the computation process to cleaning up a noisy photograph. “If you look at the individual pixels, you never learn anything. You have to look for global patterns to understand the structure,” he said.
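As a rough illustration of what "which proteins were detected on a given nanoparticle" could look like as data, here is a minimal Python sketch. It is not Seer's data format or the authors' software: the nanoparticle labels, the protein identifiers, and both helper functions are hypothetical and chosen only for the example.

```python
# Hypothetical proteograph: for each nanoparticle, the set of proteins
# detected on its surface. Labels and proteins are illustrative only.
proteograph = {
    "NP1": {"ALB", "APOA1", "CRP"},
    "NP2": {"ALB", "FGA", "TF"},
    "NP3": {"CRP", "TF", "SAA1"},
}


def proteins_detected(profile):
    """Return every protein seen on at least one nanoparticle."""
    return set().union(*profile.values())


def detection_matrix(profile):
    """Return (sorted protein list, {nanoparticle: 0/1 row}).

    Each row marks with 1 the proteins detected on that nanoparticle,
    in the same order as the sorted protein list.
    """
    proteins = sorted(proteins_detected(profile))
    rows = {
        np_id: [int(p in found) for p in proteins]
        for np_id, found in profile.items()
    }
    return proteins, rows


proteins, rows = detection_matrix(proteograph)
print(proteins)
for np_id, row in rows.items():
    print(np_id, row)
```

Viewed this way, no single cell says much on its own. Any pattern would have to come from comparing whole rows across many samples, which is the point of the photograph analogy.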
The study: improved accuracy and less-invasive testing
Researchers tested the technology — Seer calls its platform Proteograph — in a non-small-cell lung cancer study. Sixty-one participants had early-stage lung cancer, while 80 were at risk of developing cancer but were otherwise healthy.
Typically, each of these patients would have to undergo a lung biopsy, an expensive and invasive procedure, in order to find out if they had cancer, Farias said. In their trial, researchers took a blood draw to see if their approach could distinguish the proteome of those who had cancer from that of those who didn’t.
The research group completed the analysis in about two weeks, identifying 1,664 proteins across five different nanoparticles. By looking at these proteins, researchers were able to tell which participants had early-stage lung cancer and which ones did not. The study was also able to identify proteins that distinguish someone with early-stage lung cancer from someone with emphysema or asthma, Farokhzad said.
“The classifier we built showed that data from the proteograph could be used to identify the early-stage [non-small-cell lung cancer] cases with high sensitivity and specificity,” Farias said, describing a sensitivity of 55% with a specificity of 99%. This is comparable with what one might achieve with a bronchoscopy — a substantially more invasive test.
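For readers unfamiliar with the two measures, the short Python sketch below shows how sensitivity and specificity are computed from a classifier's counts of correct and incorrect calls. The function name and the counts are hypothetical: they are round numbers picked to reproduce the quoted percentages and are not the study's data.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: cancer cases correctly flagged     fn: cancer cases missed
    tn: healthy cases correctly cleared    fp: healthy cases wrongly flagged
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # share of true cancer cases detected
    specificity = tn / (tn + fp)  # share of healthy cases correctly cleared
    return sensitivity, specificity


# Hypothetical counts (NOT the study's data): 100 people with cancer,
# 100 without, chosen so the result matches the quoted percentages.
sens, spec = sensitivity_specificity(tp=55, fn=45, tn=99, fp=1)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this illustration, a specificity of 99% means only 1 in 100 healthy people would be wrongly flagged, while a sensitivity of 55% means a little over half of true cancer cases would be caught.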
Up next: the potential for a proteomic cancer database
The true power of this approach rests in its potential to use a single draw of blood to test simultaneously for multiple cancers. Specifically, the approach has the potential to turbocharge the search for new proteomic biomarkers for all types of cancers. For example, the non-small-cell lung cancer study identified a number of novel candidate biomarkers for lung cancer. As the community of researchers using this technology grows, Farias anticipates that the resulting database of proteomic biomarkers will grow in value and applicability.
“With machine learning and faster computation, we can extract the right amount of signal from the data to say useful things,” Farias added. “In this case, it’s early detection. For something like pancreatic cancer, that makes all the difference.”
The approach could also help researchers assess different risk factors for COVID-19 and begin to explain why some people infected with COVID-19 are asymptomatic while others become deathly sick. While current approaches to testing look for the presence of the virus, a proteomic survey would look at the person who is the host for the virus. “If we could understand how the host is responding to the virus at the proteomic level, both before and after the infection, that could guide us in diagnosing the disease, in identifying who will progress, and in targeting new drugs,” Farokhzad said.