Project R-8830

Title

A universal search engine for mass spectrometry-based sequential elucidation (Research)

Abstract

A typical mass spectrometry-based proteomics experiment for biomarker discovery produces gigabytes of data containing information about amino acid sequences related to potential protein markers. Despite ever-increasing computational power, modern database search algorithms that process such data can convert only a fraction of the fragment spectra into meaningful information. The reason is that database search engines rely on a series of assumptions to restrict the search space and keep computational resources under control. The scientific community agrees that more effort has to be put into the development of a universal database search engine that is robust, easy to use and free of search space assumptions. We propose a conceptual framework for mass spectrometry-based proteomics built on the premise of unlimited computing power. This premise allows for revolutionary thinking in the algorithmic design of the method. By moving away from classical peptide/protein identification and towards the partial mapping of spectral peaks to protein hotspots, the approach enables scalable analysis of high-throughput mass spectrometry data regardless of spectral purity, precursor value, acquisition type, fragmentation mechanism or sample preparation (including details of enzymatic digestion). The approach can operate on high-performance computer clusters and allows for novel modes of data acquisition that radically simplify the generation and analysis of quantitative proteomics data.

Period of project

01 April 2017 - 31 March 2021