Abstract:
The life sciences have a wide range of technologies available for gathering and analysing data to gain qualitative and quantitative insights into biological systems. Combined in novel ways, these technologies can enable the development of new clinical applications, for example in personalised cancer immunotherapy. Among them, mass spectrometry is a complex but versatile analysis method for proteomics and beyond, and a prolific source of experimental data in these fields. Improved instrumentation, technological advances, and high-throughput experiments have led to substantial growth in data volume, which calls for automation in analysis and quality control. Combining mass spectrometry with other technologies is essential for advances in clinical applications such as the development of personalised cancer immunotherapies, which is inherently multidisciplinary. Here we describe how the integration of data generation and analysis tools can have synergistic effects, unlock novel analysis designs, and create a more comprehensive picture of the underlying biology. First, we describe the development of standard file formats for proteomics and their role in data integration. Many experimental proteomics techniques share core aspects that are reflected in their data and in the analysis steps necessary for successful interpretation. The formats cover mass spectrometry-based proteomics identification and quantification data (mzIdentML, mzTab), with limited support for small molecules (mzQuantML, mzTab), and quality control (QC) data from both acquisition and analysis (qcML). In the second part of this work, we describe the integration of analysis tool frameworks (OpenMS, FRED2) and their accompanying analysis tools into workflow orchestration solutions (KNIME).
Compatible tools (with standard-format input and output) allow us to create complete data analysis workflows with data from multiple domains (here genomics, transcriptomics, proteomics, and HLA peptidomics). We explore the benefits of automated analysis, with a closer look at the utility of add-on workflows that run quality control in parallel to a main analysis. The final part of the thesis applies a combined workflow to the research and development of personalised immunotherapies against cancer. Applied to a melanoma dataset with previously published neoepitopes, our workflow detects neoepitopes with sensitivity comparable to the published results. We also explore how cancer type can influence the prospect of detecting actionable immunotherapy targets, using an example dataset from hepatocellular carcinoma patients.