Deep learning used to recognize cancerous molecular patterns

An artificial intelligence platform can now analyze genomic data extremely fast. It detects key patterns that could contribute to a reclassification of colorectal cancer and improve drug research and development.

The technique allows classifying genomic data quickly and precisely

An artificial intelligence platform developed at the Max Delbrück Center for Molecular Medicine (MDC) can analyze genomic data extremely fast. It detects key patterns that help classify colorectal cancer and improve drug development. The results suggest that some types of colorectal cancer may need to be reclassified.

A new deep-learning algorithm can quickly analyze different types of genomic data obtained from colorectal carcinomas and thus classify the tumors more accurately. The researchers, who published their findings in the journal Life Science Alliance, report that this could improve diagnosis and related treatment options.

"Most diseases are much more complex than a single gene," says Dr. Altuna Akalin, head of the bioinformatics research group at the Berlin Institute for Medical Systems Biology (BIMSB), part of the Max Delbrück Center for Molecular Medicine (MDC). "To grasp this complexity, we need some kind of machine learning that can really process all the data," he adds.

The challenge lay in analyzing the numerous features present in genetic material, including gene expression, point mutations, and structural changes in which a DNA segment is present in multiple copies (copy number variations, or CNVs). Akalin and his Ph.D. student Jonathan Ronen therefore designed the "Multi-omics Autoencoder Integration" platform, or "MAUI" for short. As a deep learning platform, MAUI can analyze multiple omics data sets at once and identify the most important patterns or characteristics in them: in this case, gene sets that serve as indicators for colorectal cancer.
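The idea of compressing several omics layers into a small set of shared patterns can be sketched in a few lines. This is only an illustration, not the MAUI implementation: it swaps the deep autoencoder for a truncated SVD (the closed-form optimum of a linear autoencoder), and the data, the function names `standardize` and `latent_factors`, and all dimensions are assumptions made up for the example.

```python
import numpy as np

def standardize(block):
    """Center each feature and scale it to unit variance."""
    centered = block - block.mean(axis=0)
    std = centered.std(axis=0)
    std[std == 0] = 1.0  # constant features stay at zero instead of dividing by 0
    return centered / std

def latent_factors(blocks, n_factors):
    """Concatenate omics blocks per sample and compress them to n_factors.

    Linear stand-in for an autoencoder: a truncated SVD of the
    concatenated matrix. Returns (factors, reconstruction).
    """
    x = np.hstack([standardize(b) for b in blocks])
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    factors = u[:, :n_factors] * s[:n_factors]   # "encoder" output per sample
    reconstruction = factors @ vt[:n_factors]    # "decoder" output
    return factors, reconstruction

# Synthetic stand-in data (hypothetical): 50 tumors driven by 3 hidden factors.
# Real mutation data would be binary; continuous values keep the sketch short.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(50, 3))
expression = hidden @ rng.normal(size=(3, 200)) + 0.1 * rng.normal(size=(50, 200))
mutations = hidden @ rng.normal(size=(3, 40)) + 0.1 * rng.normal(size=(50, 40))
copy_number = hidden @ rng.normal(size=(3, 80)) + 0.1 * rng.normal(size=(50, 80))

blocks = [expression, mutations, copy_number]
factors, reconstruction = latent_factors(blocks, n_factors=3)

# Share of the (standardized) data that the 3 factors can reconstruct.
x = np.hstack([standardize(b) for b in blocks])
explained = 1 - np.sum((x - reconstruction) ** 2) / np.sum(x ** 2)
print(factors.shape, float(explained))
```

Each row of `factors` is one tumor described by three numbers instead of 320 features. A deep autoencoder replaces the linear encoder and decoder with nonlinear networks, which lets it capture patterns a linear method would miss.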

Re-classifying subtypes?

The MAUI platform identified patterns in the data that corresponded to the four known subtypes of colorectal carcinoma and assigned tumors to these subtypes with high precision. It also made another interesting discovery: it found a pattern suggesting that one tumor subtype (CMS2) may need to be divided into two groups, whose tumors have different mechanisms and survival rates.

The team proposes further research to determine whether the subtype is unique or generally characteristic of tumor spread. In either case, the result shows what the platform is capable of: it can take into account not only the known genes already associated with the disease but also all other data, thus providing deeper insights.

"Using data science methods, insights can also be gained from complex data that are normally difficult to interpret," says Akalin. "You can feed algorithms with all the data available on tumors, and they will find meaningful patterns."

Faster and better

Not only was the program more accurate; it also worked faster than other machine learning algorithms, taking only three minutes to extract 100 patterns. Other programs took anywhere from 20 minutes to eleven hours.

"The program is able to learn a higher number of latent factors in a fraction of the computing time," explains Jonathan Ronen, lead author of the study.

The team was surprised at how fast the system works, especially because the researchers did not use graphics cards, which normally speed up the calculations. This shows how extremely well optimized and efficient the algorithm already is, even though the team continues to work on improving the system.

Improving drug development

To investigate the effect of potential drugs, the team slightly adapted the program: it can now also analyze cell lines that have been removed from tumors or cultivated in the laboratory. On the molecular level, however, cell lines differ from real tumors in many ways. To estimate the extent of these differences, the team applied MAUI to the cell lines currently used to test active ingredients against colorectal cancer and compared them with cells from real tumors. Almost half of the cell lines were more closely related to other cell lines than to real tumors. Only a handful of lines were most similar to the different types of colorectal carcinomas.
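One simple way to make "more closely related to other cell lines than to real tumors" concrete is a nearest-neighbor check on the learned patterns. The source does not say which similarity measure the team used, so the following is a sketch under assumptions: Euclidean distance, a hypothetical helper `nearest_is_cell_line`, and tiny made-up two-dimensional coordinates in place of real latent factors.

```python
import numpy as np

def nearest_is_cell_line(cell_lines, tumors):
    """For each cell line, True if its nearest neighbor is another cell line,
    False if its nearest neighbor is a tumor (Euclidean distance)."""
    # Pairwise distances among cell lines; a line is not its own neighbor.
    d_cc = np.linalg.norm(cell_lines[:, None, :] - cell_lines[None, :, :], axis=2)
    np.fill_diagonal(d_cc, np.inf)
    # Pairwise distances from each cell line to each tumor.
    d_ct = np.linalg.norm(cell_lines[:, None, :] - tumors[None, :, :], axis=2)
    return d_cc.min(axis=1) < d_ct.min(axis=1)

# Made-up latent coordinates (hypothetical, for illustration only).
tumors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
cell_lines = np.array([[0.1, 0.1], [5.0, 5.0], [5.2, 5.0]])

flags = nearest_is_cell_line(cell_lines, tumors)
print(flags.tolist())          # [False, True, True]
print(float(flags.mean()))     # share of cell lines that resemble other cell lines most
```

In this toy data the first cell line sits among the tumors, while the other two cluster together far from any tumor, so they would be poor models for drug tests.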

Although the search for new drugs does not rely solely on cell lines, this finding could help to better exploit the full potential of cell line research. It might also be possible to adapt the approach to other types of drug testing based on genetic information.

A “Google” for tumors

Once the deep learning platform for colorectal cancer has been extensively tested, it could also be used to analyze data from new patients. "You can think of it as a search engine," says Akalin.

Physicians could feed a patient's genetic data into MAUI to find the best match and thus classify the tumor quickly and accurately. The platform could then recommend drugs that have worked well for similar tumors. It could also help to predict whether a particular therapy is beneficial and what the survival rates are.
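The "search engine" idea described above amounts to looking up the most similar known tumors and reading off their labels. A minimal sketch, assuming a k-nearest-neighbor majority vote over latent factors; the function `classify_by_neighbors`, the coordinates, and the labels are hypothetical and do not come from the study.

```python
from collections import Counter

import numpy as np

def classify_by_neighbors(query, reference, labels, k=3):
    """Return the majority subtype among the k reference tumors closest
    to the query, plus the indices of those tumors (nearest first)."""
    distances = np.linalg.norm(reference - query, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0], nearest

# Made-up reference tumors with known subtypes (hypothetical values).
reference = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [4.0, 4.0], [4.2, 3.9]])
labels = ["CMS1", "CMS1", "CMS1", "CMS2", "CMS2"]

new_patient = np.array([3.9, 4.1])
subtype, nearest = classify_by_neighbors(new_patient, reference, labels, k=3)
print(subtype, nearest.tolist())   # CMS2 [3, 4, 2]
```

The returned indices are the "search hits": the treatment history of those matched tumors is what a recommendation would be based on.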

Currently, this is only possible in an academic setting and only after physicians have tried all available clinical protocols. There is still a long way to go to get a test or system approved for clinical use, says Akalin. With the support of the Digital Health Accelerator Program of the Berlin Institute of Health, the team is weighing up the potential for marketing the system. They are also developing MAUI further for application to other types of cancer.

Jonathan Ronen et al. (2019): "Evaluation of colorectal cancer subtypes and cell lines using deep learning", Life Science Alliance, DOI: 10.26508/lsa.201900517