Potential Thesis Topics

Master Thesis topics

Common Workflow Language: Mining user behavior for feature development

The Common Workflow Language (CWL) is a standard for describing command line applications and the workflows built from them, typically using (Docker) software containers. CWL is popular in bioinformatics and is gaining traction in other fields such as astrophysics. For flexibility, users may use JavaScript in their CWL tool descriptions to handle complicated command line interfaces or to generate dynamic configuration files. For maximal language simplicity and ease of parsing and implementation, it would be ideal if CWL allowed no JavaScript at all. The maintainers of and contributors to the CWL standard therefore try to monitor how users employ the JavaScript feature and then create new language constructs that fulfill those needs. However, the current approach is manual, infrequent, and far from comprehensive.

Quantitative analysis phase
Using online archives of CWL workflows, and by searching GitHub and GitLab: can we characterize, in an automated fashion, how users use the JavaScript expression feature of CWL? Techniques may include abstract syntax trees, which would benefit from domain-specific enhancement.
Design phase
Propose new features for the discovered JS motifs and evaluate the burden of a new language feature versus the utility to users.
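
A first pass at the quantitative phase could be sketched as follows, in Python. This is a minimal sketch under stated assumptions, not the project's method: it finds `$(...)` and `${...}` spans by bracket matching rather than by building a real abstract syntax tree, it ignores brackets inside string literals and backslash-escaped `\$(`, and the helper names (`extract_expressions`, `motif`, `count_motifs`) are made up for illustration. A fuller analysis would hand each extracted body to a JavaScript parser.

```python
import re
from collections import Counter


def extract_expressions(text):
    """Return (kind, body) pairs for each $(...) or ${...} span in text.

    Assumption: brackets inside string literals are balanced; a real
    analysis would use a JavaScript parser instead of bracket counting.
    """
    closers = {"(": ")", "{": "}"}
    found = []
    i = 0
    while i < len(text) - 1:
        if text[i] == "$" and text[i + 1] in closers:
            opener = text[i + 1]
            closer = closers[opener]
            depth = 0
            for j in range(i + 1, len(text)):
                if text[j] == opener:
                    depth += 1
                elif text[j] == closer:
                    depth -= 1
                    if depth == 0:
                        kind = "expression" if opener == "(" else "function_body"
                        found.append((kind, text[i + 2:j].strip()))
                        i = j  # resume scanning after the closing bracket
                        break
        i += 1
    return found


def motif(body):
    """Collapse concrete input names so similar expressions group together."""
    return re.sub(r"\binputs\.[A-Za-z_]\w*", "inputs.<ID>", body)


def count_motifs(documents):
    """Tally normalized expression motifs across many CWL document texts."""
    counts = Counter()
    for text in documents:
        for kind, body in extract_expressions(text):
            counts[(kind, motif(body))] += 1
    return counts
```

Calling `count_motifs` on the text of every collected CWL file gives a ranked list of recurring patterns, which could serve as a starting point for the design phase.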

Notes:
All work must be done openly and under the Apache 2.0 license. There is a possibility of the results of the project being used for many years to come!

Prerequisites: Comfort with the Linux/Unix command line

May qualify for summer funding via Google's Summer of Code

Contact Michael Crusoe for further details

 

Common Workflow Language: distributed execution with data streaming

Command line scientific analysis tools often support streaming data into or out of the tool. (At the command line, the Unix pipe “|” or named pipes implement this.) Streaming speeds up the analysis by avoiding slow disk or storage I/O.
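
The pipe mechanism described above can be reproduced from Python with the standard `subprocess` module. The following is a minimal sketch, assuming two arbitrary commands; `stream_pipeline` is a hypothetical helper name and the small Python one-liners merely stand in for real analysis tools.

```python
import subprocess
import sys


def stream_pipeline(producer_cmd, consumer_cmd):
    """Run `producer | consumer` without writing an intermediate file."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        consumer_cmd, stdin=producer.stdout, stdout=subprocess.PIPE
    )
    # Close our copy of the pipe so the producer sees a broken pipe
    # if the consumer exits early.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output


if __name__ == "__main__":
    # Stand-in tools: one emits 1000 lines, the other counts them.
    emit = [sys.executable, "-c", "print('\\n'.join(str(i) for i in range(1000)))"]
    count = [sys.executable, "-c", "import sys; print(sum(1 for _ in sys.stdin))"]
    print(stream_pipeline(emit, count).decode().strip())
```

The data flows directly from the first process to the second through an operating system pipe, so nothing is staged on disk.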

While the CWL standard supports this approach, no CWL-aware workflow system makes use of this optimization.

You would add this feature (automatic streaming of data into and out of scientific computing tools) to one of the CWL workflow engines, such as Toil (which is Python based).

The first iteration would stream into and out of object stores (Amazon S3, Google Cloud Storage, etc.). More advanced implementations may feature direct streaming between the tools, but this requires refactoring the job scheduling engine.
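
The first iteration might look roughly like the sketch below, which copies a readable byte stream through a tool and into a writable byte stream in fixed-size chunks. It makes several assumptions: `stream_through_tool` is a hypothetical helper, not part of Toil or any CWL engine, and `io.BytesIO` objects stand in for object-store download and upload streams, which a real implementation would obtain from the storage provider's client library.

```python
import io
import subprocess
import sys
import threading

CHUNK = 64 * 1024  # bytes per read; an arbitrary illustrative value


def stream_through_tool(cmd, source, sink, chunk_size=CHUNK):
    """Feed `source` to the tool's stdin and copy its stdout to `sink`."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def feed():
        # Runs in a thread so that writing input and reading output
        # proceed concurrently and the pipes cannot deadlock.
        try:
            for chunk in iter(lambda: source.read(chunk_size), b""):
                proc.stdin.write(chunk)
            proc.stdin.close()
        except BrokenPipeError:
            pass  # the tool exited before consuming all of its input

    writer = threading.Thread(target=feed)
    writer.start()
    for chunk in iter(lambda: proc.stdout.read(chunk_size), b""):
        sink.write(chunk)
    writer.join()
    return proc.wait()


if __name__ == "__main__":
    # Stand-in tool that upper-cases its input.
    tool = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())",
    ]
    result = io.BytesIO()
    code = stream_through_tool(tool, io.BytesIO(b"acgt\n" * 4), result)
    print(code, result.getvalue())
```

Only one chunk of input and one chunk of output are held by the helper at a time, so the local disk is never used as a staging area.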

Notes:
All work must be done openly and under the Apache 2.0 license. If successful, you will have contributed a major feature to a popular workflow engine!

Prerequisites: Python

May qualify for summer funding via Google's Summer of Code

 

Contact Michael Crusoe for further details

Cross-architecture Single instruction, multiple data (SIMD) analysis

SIMD intrinsics for instruction sets such as SSE, SSE2, SSE3, SSSE3, SSE4.1, AVX, AVX2, and AVX-512 are popular with C/C++ programmers for speeding up analysis code in many research domains, including bioinformatics. Alas, they are architecture specific, and implementing fallbacks and multi-versioning is tedious. For example, SSE is not available on the Raspberry Pi, which is popular in education and hobbyist settings.

The SIMD Everywhere header-only C/C++ library reduces this burden by using a variety of methods: 1) GCC or Clang extensions, 2) OpenMP, 3) Cilk Plus, 4) pure source implementations, and 5) cross-architecture SIMD (e.g. implementing SSE2 with Arm NEON intrinsics).

However, the performance of these implementations has not been quantified.

In this project you will benchmark the use of the SIMD Everywhere library and these different backends as used by real scientific computing codebases on a variety of hardware. You will gain experience in SIMD programming and you will improve the SIMD Everywhere Open Source project by adding new SIMD instructions and accelerated implementations.
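
A benchmark of this kind needs a repeatable timing harness. The sketch below is one possible starting point, assuming that each backend has already been compiled into its own executable; the function names are hypothetical, and the harness measures only wall-clock time of whole program runs, not individual SIMD functions.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, repeats=5):
    """Return the median wall-clock time in seconds over several runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(builds, repeats=5):
    """Map each build name to its speedup relative to the first build."""
    medians = {name: time_command(cmd, repeats) for name, cmd in builds.items()}
    baseline = medians[next(iter(builds))]
    return {name: baseline / seconds for name, seconds in medians.items()}


if __name__ == "__main__":
    # Placeholder commands; real use would point at binaries built
    # with different backends.
    builds = {
        "baseline": [sys.executable, "-c", "pass"],
        "candidate": [sys.executable, "-c", "pass"],
    }
    print(compare(builds, repeats=3))
```

Using the median rather than the mean reduces the influence of occasional slow runs caused by other activity on the machine.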

Notes:
All work must be done openly and under the MIT license. If successful, your work will benefit many scientific and research applications!

Prerequisites: C or C++ experience

May qualify for summer funding via Google's Summer of Code.

 

Contact Michael Crusoe for further details

Bioinformatic pipelines on Arm64: performance analysis on AWS Graviton and Apple Silicon

Intel's dominance of the scientific computing market may be weakened by the growing popularity of Arm64 systems such as AWS Graviton and Apple Silicon. While there are initial reports of cost and energy savings for industrial applications, this has not been analyzed for bioinformatic pipelines.

 

In this project you will compare the performance, on a cost, time, and energy basis, of multiple real-world bioinformatic pipelines on Arm64 systems such as those from the AWS cloud and Apple Silicon. In conjunction with the Debian Med project, you will have the opportunity to assist in porting the parts of bioinformatic codebases that prevent them from running on Arm64 processors. You will also gain experience using software container (Docker) build and deployment technologies.
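
The three comparison axes can be derived from a few measurements per run. The following sketch shows the arithmetic only; the `Run` class is hypothetical, and the numbers in the usage example are placeholders chosen for easy arithmetic, not measurements or real prices.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One pipeline execution on one machine type."""

    label: str
    wall_seconds: float    # measured runtime
    price_per_hour: float  # machine rental price in a currency of choice
    mean_watts: float      # average power draw during the run

    @property
    def cost(self):
        """Rental cost of the run: hours used times hourly price."""
        return self.wall_seconds / 3600 * self.price_per_hour

    @property
    def energy_wh(self):
        """Energy used in watt-hours: average power times hours used."""
        return self.mean_watts * self.wall_seconds / 3600


if __name__ == "__main__":
    # Placeholder values for illustration only.
    runs = [
        Run("machine A", wall_seconds=1800, price_per_hour=2.0, mean_watts=100),
        Run("machine B", wall_seconds=2400, price_per_hour=1.2, mean_watts=60),
    ]
    for run in runs:
        print(run.label, run.wall_seconds, run.cost, run.energy_wh)
```

A slower machine can still win on cost or energy, which is why the three quantities are reported separately rather than combined into a single score.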

 

Notes:

All work must be done openly and under an approved Free/Open Source Software license. If successful, your work will benefit many scientific and research applications!

May qualify for summer funding via Google's Summer of Code.

Contact Michael Crusoe for further details

Deep learning for decoy generation in protein identification from mass spectrometry

Prerequisites: Knowledge of bioinformatics and of TensorFlow

 

Notes:

All work must be done openly and under an approved Free/Open Source Software license. If successful, your work will benefit many scientific and research applications!

Contact Tjeerd Dijkstra for further details

Antimicrobial resistant bacteria; predicting antimicrobial resistance; machine learning; sequence bioinformatics

Prerequisites: Some R or Python; some knowledge of machine learning and sequence bioinformatics would be beneficial

 

Notes:

All work must be done openly and under an approved Free/Open Source Software license. If successful, your work will benefit many scientific and research applications!

Contact Tjeerd Dijkstra and Thomas Hamm for further details

Mass spectrometry, Nextflow, nf-core, MSstatsTMT, OpenMS

Prerequisites:

 

Notes:

All work must be done openly and under an approved Free/Open Source Software license. If successful, your work will benefit many scientific and research applications!

Contact Timo Saschenberg for further details

Mass spectrometry, C++, HDF5, mzMLb, OpenMS

Prerequisites:

 

Notes:

All work must be done openly and under an approved Free/Open Source Software license. If successful, your work will benefit many scientific and research applications!

Contact Timo Saschenberg for further details

Mass spectrometry, C++, OpenMS

Prerequisites:

 

Notes:

All work must be done openly and under an approved Free/Open Source Software license. If successful, your work will benefit many scientific and research applications!

Contact Timo Saschenberg for further details

Mass spectrometry, C++, peptide identification, OpenMS

Prerequisites:

 

Notes:

All work must be done openly and under an approved Free/Open Source Software license. If successful, your work will benefit many scientific and research applications!

Contact Timo Saschenberg for further details

Bachelor Thesis topics

Integrated processing workflow for mass spectrometry data

Prerequisites:

 

Notes:

All work must be done openly and under an approved Free/Open Source Software license. If successful, your work will benefit many scientific and research applications!

Contact Timo Saschenberg for further details

Efficient processing of mass spectrometry raw data

Prerequisites:

 

Notes:

All work must be done openly and under an approved Free/Open Source Software license. If successful, your work will benefit many scientific and research applications!

Contact Timo Saschenberg for further details

Improved feature linking in large-scale mass spectrometry experiments

Prerequisites:

 

Notes:

All work must be done openly and under an approved Free/Open Source Software license. If successful, your work will benefit many scientific and research applications!

Contact Timo Saschenberg for further details

Quality control for quantification in large-scale mass spectrometry experiments

Prerequisites:

 

Notes:

All work must be done openly and under an approved Free/Open Source Software license. If successful, your work will benefit many scientific and research applications!

Contact Timo Saschenberg for further details
