The translation potential of short open reading frames (sORFs), and the biological relevance of their translation products, have historically been undervalued, but there is a growing appreciation for the potential importance of peptides generated from these regions. In general, studying these small peptides by LC-MS/MS is complicated by low target abundance, technically challenging peptides and spectra, and large search spaces. However, highly sensitive MS methods such as ion mobility spectrometry (IMS), the growth of ribosome profiling data and annotation tools, and advances in machine learning have improved the ability to identify high-quality peptides from putative non-coding regions. In this study, Peeters et al. describe a workflow for investigating neuropeptides in mice that pairs Bruker’s timsTOF Pro with PASEF technology with searches against a custom proteogenomic database. Database searches were completed with the MSGF+ search tool and then with PEAKS Online; the latter identified an additional 204 peptides from neuropeptide precursors in the mouse brain. In total, the combined methods in the workflow provided MS evidence for 84 sORF-encoded peptides from putative non-coding regions, showing the power of top-down IMS and post-processing machine learning for finding these challenging targets.
How was PEAKS used?
The raw timsTOF data were searched with both MSGF+ and PEAKS Online. A custom proteogenomic database was created by combining the mouse reference proteome from UniProt with an alternative proteome, the cRAP database, and reversed sequences to serve as decoys. The alternative proteome was generated by cross-referencing sORF predictions from two sources: publicly available ribosome profiling datasets analyzed with sORFs.org, and sORF data from the OpenProt proteogenomic resource.
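The database construction described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the `REV_` decoy prefix, and the rule that the first occurrence of a header wins are all assumptions made for illustration. It only shows the general idea of concatenating several FASTA sources (reference proteome, alternative proteome, contaminants) and appending one reversed-sequence decoy per target entry.

```python
from typing import Dict, Iterable


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into a {header: sequence} mapping."""
    records: Dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped, so append to the current record.
            records[header] += line
    return records


def build_target_decoy(
    sources: Iterable[Dict[str, str]], decoy_prefix: str = "REV_"
) -> Dict[str, str]:
    """Merge target sources and add one reversed decoy per target.

    Assumptions (hypothetical, not taken from the paper): the decoy
    prefix is "REV_", and if two sources share a header the first
    occurrence is kept.
    """
    targets: Dict[str, str] = {}
    for source in sources:
        for header, sequence in source.items():
            targets.setdefault(header, sequence)
    # A decoy is simply the target sequence read backwards.
    decoys = {decoy_prefix + h: s[::-1] for h, s in targets.items()}
    return {**targets, **decoys}
```

In this sketch, a call such as `build_target_decoy([reference, alternative, contaminants])` would return a mapping with twice as many entries as there are distinct target headers. Real search engines often generate decoys themselves, so whether decoys are added to the FASTA file in advance depends on the tool and its settings.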
Peeters MKR, Baggerman G, Gabriels R, Pepermans E, Menschaert G, Boonen K. Ion Mobility Coupled to a Time-of-Flight Mass Analyzer Combined With Fragment Intensity Predictions Improves Identification of Classical Bioactive Peptides and Small Open Reading Frame-Encoded Peptides. Front Cell Dev Biol. 2021 Sep 17;9:720570. doi:10.3389/fcell.2021.720570. PMID: 34604223; PMCID: PMC8484717.
Abstract
Bioactive peptides exhibit key roles in a wide variety of complex processes, such as regulation of body weight, learning, aging, and innate immune response. Next to the classical bioactive peptides, emerging from larger precursor proteins by specific proteolytic processing, a new class of peptides originating from small open reading frames (sORFs) have been recognized as important biological regulators. But their intrinsic properties, specific expression pattern and location on presumed non-coding regions have hindered the full characterization of the repertoire of bioactive peptides, despite their predominant role in various pathways. Although the development of peptidomics has offered the opportunity to study these peptides in vivo, it remains challenging to identify the full peptidome as the lack of cleavage enzyme specification and large search space complicates conventional database search approaches. In this study, we introduce a proteogenomics methodology using a new type of mass spectrometry instrument and the implementation of machine learning tools toward improved identification of potential bioactive peptides in the mouse brain. The application of trapped ion mobility spectrometry (tims) coupled to a time-of-flight mass analyzer (TOF) offers improved sensitivity, an enhanced peptide coverage, reduction in chemical noise and the reduced occurrence of chimeric spectra. Subsequent machine learning tools MS2PIP, predicting fragment ion intensities and DeepLC, predicting retention times, improve the database searching based on a large and comprehensive custom database containing both sORFs and alternative ORFs. Finally, the identification of peptides is further enhanced by applying the post-processing semi-supervised learning tool Percolator. 
Applying this workflow, the first peptidomics workflow combined with spectral intensity and retention time predictions, we identified a total of 167 predicted sORF-encoded peptides, of which 48 originating from presumed non-coding locations, next to 401 peptides from known neuropeptide precursors, linked to 66 annotated bioactive neuropeptides from within 22 different families. Additional PEAKS analysis expanded the pool of SEPs on presumed non-coding locations to 84, while an additional 204 peptides completed the list of peptides from neuropeptide precursors. Altogether, this study provides insights into a new robust pipeline that fuses technological advancements from different fields ensuring an improved coverage of the neuropeptidome in the mouse brain.