Publications

Statistical analysis of feature-based molecular networking results from non-targeted metabolomics data

Author(s)
Abzer K. Pakkir Shah
- Axel Walter
- Filip Ottosson
- Francesco Russo
- Marcelo Navarro-Diaz
- Judith Boldt
- Jarmo-Charles J. Kalinski
- Eftychia Eva Kontou
- James Elofson
- Alexandros Polyzois
- and others
Abstract

Feature-based molecular networking (FBMN) is a popular analysis approach for liquid chromatography–tandem mass spectrometry-based non-targeted metabolomics data. While processing liquid chromatography–tandem mass spectrometry data through FBMN is fairly streamlined, downstream data handling and statistical interrogation are often a key bottleneck. Users new to statistical analysis in particular struggle to effectively handle and analyze complex data matrices. Here we provide a comprehensive guide for the statistical analysis of FBMN results, focusing on the downstream analysis of the FBMN output table. We explain the data structure and principles of data cleanup and normalization, as well as uni- and multivariate statistical analysis of FBMN results. We provide explanations and code in two scripting languages (R and Python) as well as the QIIME2 framework for all protocol steps, from data cleanup to statistical analysis. All code is shared in the form of Jupyter Notebooks (https://github.com/Functional-Metabolomics-Lab/FBMN-STATS). Additionally, the protocol is accompanied by a web application with a graphical user interface (https://fbmn-statsguide.gnps2.org/) to lower the barrier of entry for new users and for educational purposes. Finally, we also show users how to integrate their statistical results into the molecular network using the Cytoscape visualization tool. Throughout the protocol, we use a previously published environmental metabolomics dataset for demonstration purposes. Together, the protocol, code and web application provide a complete guide and toolbox for FBMN data integration, cleanup and advanced statistical analysis, enabling new users to uncover molecular insights from their non-targeted metabolomics data. Our protocol is tailored for the seamless analysis of FBMN results from Global Natural Products Social Molecular Networking and can be easily adapted to other mass spectrometry feature detection, annotation and networking tools.

Nature protocols, pp. 1–71, 2024
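
The abstract above mentions normalization of the FBMN output table without naming a method. As a rough illustration only, the sketch below assumes one common choice, scaling each sample by its total intensity. The function name, the table layout and the example values are hypothetical and are not taken from the protocol or its notebooks.

```python
def normalize_by_total_intensity(table):
    """Scale each sample's feature intensities so that they sum to 1.

    `table` maps a sample name to a dict of feature ID -> peak area.
    Samples whose intensities sum to zero are copied unchanged.
    (Hypothetical helper; the layout is an assumption for illustration.)
    """
    normalized = {}
    for sample, features in table.items():
        total = sum(features.values())
        if total == 0:
            normalized[sample] = dict(features)
        else:
            normalized[sample] = {
                feature: area / total for feature, area in features.items()
            }
    return normalized


# Made-up peak areas for two samples and three features.
example = {
    "sample_A": {"feat_1": 200.0, "feat_2": 600.0, "feat_3": 200.0},
    "sample_B": {"feat_1": 50.0, "feat_2": 0.0, "feat_3": 150.0},
}
result = normalize_by_total_intensity(example)
```

After scaling, intensities are comparable across samples as fractions of each sample's total signal. Other normalization schemes exist and may suit a given dataset better.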

Author(s)
S. Welten
- M. Arruda Botelho Herr
- L. Hempel
- D. Hieber
- P. Placzek
- M. Graf
- S. Weber
- L. Neumann
- M. Jugl
- L. Tirpitz
- K. Kindermann
- S. Geisler
- L O. Bonino Silva Santos
- S. Decker
- N. Pfeifer
- O. Kohlbacher
- T. Kirsten
Abstract

The development of platforms for distributed analytics has been driven by a growing need to comply with various governance-related or legal constraints. Among these platforms, the so-called Personal Health Train (PHT) is one representative that has emerged in recent years. However, in projects that require data from sites featuring different PHT infrastructures, institutions face challenges arising from the combination of multiple PHT ecosystems, including data governance, regulatory compliance, or the modification of existing workflows. In these scenarios, the interoperability of the platforms is preferable. In this work, we introduce a conceptual framework for the technical interoperability of the PHT covering five essential requirements: data integration, unified station identifiers, mutual metadata, aligned security protocols, and business logic. We evaluated our concept in a feasibility study that involves two distinct PHT infrastructures: PHT-meDIC and PADME. We analyzed data on leukodystrophy from patients in the University Hospitals of Tübingen and Leipzig, and patients with differential diagnoses at the University Hospital Aachen. The results of our study demonstrate the technical interoperability between these two PHT infrastructures, allowing researchers to perform analyses across the participating institutions. Our method is more space-efficient compared to the multi-homing strategy, and it shows only a minimal time overhead.

Sci Data, 11 (1), pp. 663, 2024

Author(s)
R. Barquera
- O. vez
- K. gele
- P. rez-Ramallo
- D I. ndez-Zaragoza
- A. Szolek
- A B. Rohrlach
- P. Librado
- A. Childebayeva
- R A. Bianco
- B S. Penman
- V. a-Alonzo
- M. Lucas
- J C. Lara-Riegos
- M E. Moo-Mezeta
- J C. Torres-Romero
- P. Roberts
- O. Kohlbacher
- C. Warinner
- J. Krause
Abstract

The ancient city of Chichén Itzá in Yucatán, Mexico, was one of the largest and most influential Maya settlements during the Late and Terminal Classic periods (AD 600–1000) and it remains one of the most intensively studied archaeological sites in Mesoamerica [1–4]. However, many questions about the social and cultural use of its ceremonial spaces, as well as its population’s genetic ties to other Mesoamerican groups, remain unanswered [2]. Here we present genome-wide data obtained from 64 subadult individuals dating to around AD 500–900 that were found in a subterranean mass burial near the Sacred Cenote (sinkhole) in the ceremonial centre of Chichén Itzá. Genetic analyses showed that all analysed individuals were male and several individuals were closely related, including two pairs of monozygotic twins. Twins feature prominently in Mayan and broader Mesoamerican mythology, where they embody qualities of duality among deities and heroes [5], but until now they had not been identified in ancient Mayan mortuary contexts. Genetic comparison to present-day people in the region shows genetic continuity with the ancient inhabitants of Chichén Itzá, except at certain genetic loci related to human immunity, including the human leukocyte antigen complex, suggesting signals of adaptation due to infectious diseases introduced to the region during the colonial period.

Nature, 2024

Author(s)
A. Bayas
- U. Mansmann
- B I. n
- V S. Hoffmann
- A. Berthele
- M. hlau
- M C. Kowarik
- M. Krumbholz
- M. Senel
- V. Steuerwald
- M. Naumann
- J. Hartberger
- M. Kerschensteiner
- E. Oswald
- C. Ruschil
- U. Ziemann
- H. Tumani
- I. Vardakas
- F. Albashiti
- F. Kramer
- I. Soto-Rey
- H. Spengler
- G. Mayer
- H A. Kestler
- O. Kohlbacher
- M. Hagedorn
- M. Boeker
- K. Kuhn
- S. Buchka
- F. Kohlmayer
- J S. Kirschke
- L. Behrens
- H. Zimmermann
- B. Bender
- N. Sollmann
- J. Havla
- B. Hemmer
- A. Berlis
- B. Wiestler
- T. mpfel
- K. Seelos
- J. nschede
- R. Kemmner
- M. Beer
- J. Dietrich
- J. Schaller
Abstract

characteristics and (bio)markers that reliably predict the individual disease prognosis at disease onset are lacking. Cohort studies allow a close follow-up of MS histories and a thorough phenotyping of patients. Therefore, a multicenter cohort study was initiated to implement a wide spectrum of data and (bio)markers in newly diagnosed patients. months. Further objectives are refining the MS-TDS score and providing data to identify new markers reflecting disease course and severity. The project also provides a technical evaluation of the ProVal-MS cohort within the IT-infrastructure of the DIFUTURE consortium (Data Integration for Future Medicine) and assesses the efficacy of the data sharing techniques developed. Clinical cohorts provide the infrastructure to discover and to validate relevant disease-specific findings. A successful validation of the MS-TDS will add a new clinical decision tool to the armamentarium of practicing MS neurologists from which newly diagnosed MS patients may take advantage. Trial registration ProVal-MS has been registered in the German Clinical Trials Register, `Deutsches Register Klinischer Studien` (DRKS)-ID: DRKS00014034, date of registration: 21 December 2018; https://drks.de/search/en/trial/DRKS00014034.

Neurol Res Pract, 6 (1), pp. 15, 2024

Author(s)
J. Pfeuffer
- C. Bielow
- S. Wein
- K. Jeong
- E. Netz
- A. Walter
- O. Alka
- L. Nilse
- P D. Colaianni
- D. McCloskey
- J. Kim
- G. Rosenberger
- L. Bichmann
- M. Walzer
- J. Veit
- B. Boudaud
- M. Bernt
- N. Patikas
- M. Pilz
- M P. Startek
- S. Kutuzova
- L. Heumos
- J. Charkow
- J C. Sing
- A. Feroz
- A. Siraj
- H. Weisser
- T M H. Dijkstra
- Y. Perez-Riverol
- H. st
- O. Kohlbacher
- T. Sachsenberg
Nat Methods, 2024

Author(s)
K. Jeong
- P T. Kaulich
- W. Jung
- J. Kim
- A. Tholey
- O. Kohlbacher
Abstract

Top-down proteomics (TDP) directly analyzes intact proteins and thus provides more comprehensive qualitative and quantitative proteoform-level information than conventional bottom-up proteomics (BUP) that relies on digested peptides and protein inference. While significant advancements have been made in TDP in sample preparation, separation, instrumentation, and data analysis, reliable and reproducible data analysis still remains one of the major bottlenecks in TDP. A key step for robust data analysis is the establishment of an objective estimation of proteoform-level false discovery rate (FDR) in proteoform identification. The most widely used FDR estimation scheme is based on the target-decoy approach (TDA), which has primarily been established for BUP. We present evidence that the TDA-based FDR estimation may not work at the proteoform-level due to an overlooked factor, namely the erroneous deconvolution of precursor masses, which leads to incorrect FDR estimation. We argue that the conventional TDA-based FDR in proteoform identification is in fact protein-level FDR rather than proteoform-level FDR unless precursor deconvolution error rate is taken into account. To address this issue, we propose a formula to correct for proteoform-level FDR bias by combining TDA-based FDR and precursor deconvolution error rate.

Proteomics, 24 (3-4), pp. e2300068, 2024
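
For readers unfamiliar with the target-decoy approach named in the abstract above, the sketch below shows the conventional estimate: the number of accepted decoy matches divided by the number of accepted target matches at a score threshold. It is not the correction formula the authors propose, which the abstract does not state. The function name, the input layout and the scores are made up for illustration.

```python
def target_decoy_fdr(matches, threshold):
    """Estimate the FDR among matches scoring at or above `threshold`.

    `matches` is a list of (score, is_decoy) pairs. The estimate is the
    number of accepted decoys divided by the number of accepted targets.
    (Hypothetical helper illustrating the conventional TDA estimate.)
    """
    accepted = [is_decoy for score, is_decoy in matches if score >= threshold]
    decoys = sum(accepted)
    targets = len(accepted) - decoys
    if targets == 0:
        # No accepted targets: there is nothing to report an error rate for.
        return 0.0
    return decoys / targets


# Made-up identification scores; True marks a decoy match.
matches = [
    (9.1, False), (8.7, False), (8.2, True), (7.9, False),
    (7.5, False), (6.8, True), (6.1, False),
]
fdr_strict = target_decoy_fdr(matches, threshold=7.0)  # 1 decoy / 4 targets
fdr_loose = target_decoy_fdr(matches, threshold=6.0)   # 2 decoys / 5 targets
```

Lowering the threshold accepts more matches and, in this toy data, raises the estimated FDR from 0.25 to 0.4.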

Author(s)
A L. Illert
- A. Stenzinger
- M. Bitzer
- P. Horak
- V I. Gaidzik
- Y. ller
- J. Beha
- Ö. ner
- F. Schmitt
- S. mann
- S. Ossowski
- C P. Schaaf
- M. Hallek
- T H. mmendorf
- P. Albers
- T. Fehm
- P. Brossart
- H. Glimm
- D. Schadendorf
- A. Bleckmann
- C H. Brandts
- I. Esposito
- E. Mack
- C. Peters
- C. Bokemeyer
- S. hling
- T. Kindler
- H. l
- V. Heinemann
- H. hner
- R. Bargou
- V. Ellenrieder
- P. Hillemanns
- F. Lordick
- A. Hochhaus
- M W. Beckmann
- T. Pukrop
- M. Trepel
- L. Sundmacher
- S. Wesselmann
- G. Nettekoven
- F. Kohlhuber
- O. Heinze
- J. Budczies
- M. Werner
- K. Nikolaou
- A J. Beer
- G. Tabatabai
- W. Weichert
- U. Keilholz
- M. Boerries
- O. Kohlbacher
- J. Duyster
- R. Thimme
- T. Seufferlein
- P. Schirmacher
- N P. Malek
Nat Med, 2023

Author(s)
E E. Kontou
- A. Walter
- O. Alka
- J. Pfeuffer
- T. Sachsenberg
- O S. Mohite
- M. Nuhamunada
- O. Kohlbacher
- T. Weber
Abstract

Metabolomics experiments generate highly complex datasets, which are time- and work-intensive, and sometimes error-prone, to inspect manually. Therefore, new methods for automated, fast, reproducible, and accurate data processing and dereplication are required. Here, we present UmetaFlow, a computational workflow for untargeted metabolomics that combines algorithms for data pre-processing, spectral matching, molecular formula and structural predictions, and an integration to the GNPS workflows Feature-Based Molecular Networking and Ion Identity Molecular Networking for downstream analysis. UmetaFlow is implemented as a Snakemake workflow, making it easy to use, scalable, and reproducible. For more interactive computing, visualization, as well as development, the workflow is also implemented in Jupyter notebooks using the Python programming language and a set of Python bindings to the OpenMS algorithms (pyOpenMS). Finally, UmetaFlow is also offered as a web-based Graphical User Interface for parameter optimization and processing of smaller-sized datasets. UmetaFlow was validated with in-house LC–MS/MS datasets of actinomycetes producing known secondary metabolites, as well as commercial standards, and it detected all expected features and accurately annotated 76% of the molecular formulas and 65% of the structures. As a more generic validation, the publicly available MTBLS733 and MTBLS736 datasets were used for benchmarking, and UmetaFlow detected more than 90% of all ground truth features and performed exceptionally well in quantification and discriminating marker selection. We anticipate that UmetaFlow will provide a useful platform for the interpretation of large metabolomics datasets.

J Cheminform, 15 (1), pp. 52, 2023

Author(s)
Mirjam Figaschewski
- Bilge Sürü
- Thorsten Tiede
- Oliver Kohlbacher
Abstract

Background: Personalized oncology represents a shift in cancer treatment from conventional methods to target-specific therapies where the decisions are made based on the patient-specific tumor profile. Selection of the optimal therapy relies on a complex interdisciplinary analysis and interpretation of these variants by experts in molecular tumor boards. With up to hundreds of somatic variants identified in a tumor, this process requires visual analytics tools to guide and accelerate the annotation process. Results: The Personal Cancer Network Explorer (PeCaX) is a visual analytics tool supporting the efficient annotation, navigation, and interpretation of somatic genomic variants through functional annotation, drug target annotation, and visual interpretation within the context of biological networks. Starting with somatic variants in a VCF file, PeCaX enables users to explore these variants through a web-based graphical user interface. The most prominent feature of PeCaX is the combination of clinical variant annotation and gene-drug networks with an interactive visualization. This reduces the time and effort the user needs to invest to get to a treatment suggestion and helps to generate new hypotheses. PeCaX is provided as a platform-independent containerized software package for local or institution-wide deployment. PeCaX is available for download at https://github.com/KohlbacherLab/PeCaX-docker. Keywords: Clinical decision support; Gene drug interaction networks; Personalized oncology; Precision medicine.

BMC Bioinformatics, 24 (1), pp. 88, 2023

Author(s)
L. Mühlenbruch
- T. Abou-Kors
- M L. Dubbelaar
- L. Bichmann
- O. Kohlbacher
- M. Bens
- J. Thomas
- J. Ezić
- J M. Kraus
- H A. Kestler
- A. Witzleben
- J. Mytilineos
- D. Fürst
- D. Engelhardt
- J. Doescher
- J. Greve
- P J. Schuler
- M N. Theodoraki
- C. Brunner
- T K. Hoffmann
- H G. Rammensee
- J S. Walz
- S. Laban
Abstract

The immune peptidome of OPSCC has not previously been studied. Cancer-antigen specific vaccination may improve clinical outcome and efficacy of immune checkpoint inhibitors such as PD1/PD-L1 antibodies. 40) using immunoaffinity purification. The cohort included 22 HPV-positive (primarily HPV-16) and 18 HPV-negative samples. A benign reference dataset comprised of the HLA ligandomes of benign haematological and tissue datasets was used to identify tumour-associated antigens. MS analysis led to the identification of naturally HLA-presented peptides in OPSCC tumour tissue. In total, 22,769 peptides from 9485 source proteins were detected on HLA class I. For HLA class II, 15,203 peptides from 4634 source proteins were discovered. By comparative profiling against the benign HLA ligandomic datasets, 29 OPSCC-associated HLA class I ligands covering 11 different HLA allotypes and nine HLA class II ligands were selected to create a peptide warehouse. Tumour-associated peptides are HLA-presented on the cell surfaces of OPSCCs. The established warehouse of OPSCC-associated peptides can be used for downstream immunogenicity testing and peptide-based immunotherapy in (semi)personalised strategies.

Br J Cancer, pp. 1–11, 2023