Challenge #1

"A molecule in a haystack: Find drug targets in cancer omics data"


Twenty-first-century biology has developed large-scale methods for genomics, epigenomics and proteomics, and these generate vast amounts of data. This includes DNA sequences, gene mutations, epigenetic modifications, gene expression, post-transcriptional regulation, protein levels, drug-protein interactions, clinical parameters and much more. But while we generate lots of data, we lack methods to store, manage and analyze it efficiently.

We need better solutions to robustly link the knowledge from molecular omics data to clinically relevant covariates. In this challenge, we try to identify new drug targets by integrating public data sets from cancer-related omics experiments.


Research projects such as TCGA and ENCODE produce huge amounts of omics data for all kinds of biological samples and diseases. These data sets are very heterogeneous and span all the different levels of cellular activity. They are generated in different experiments and measure different quantities on different scales. Currently, data integration is tedious and requires a lot of manual work and expert knowledge.
Before we can do anything with the data, we need to get an overview, see the connections and understand the relevant biological questions. For that, we have to clean, structure and integrate everything and enrich it with prior knowledge. This enables the first steps in data analysis, such as the identification of relevant genes and disease-specific regulation. To facilitate this, we will develop new ways to store data and generate NoSQL database models. Options are Elasticsearch, MongoDB, Cassandra or Neo4j.
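To make the "different scales" problem concrete, here is a minimal Python sketch of one possible cleaning step: putting every measurement into a common record shape and standardizing values within each data type. The `Measurement` record, the `standardize` helper, and all sample, gene and value entries are hypothetical placeholders for illustration, not real TCGA or ENCODE content, and z-scoring is only one of several normalization choices.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Measurement:
    """One value for one gene in one sample (hypothetical record shape)."""
    sample_id: str
    gene: str
    data_type: str  # e.g. "expression" or "methylation"
    value: float


def standardize(measurements):
    """Return new records with values z-scored within each data type."""
    values_by_type = defaultdict(list)
    for m in measurements:
        values_by_type[m.data_type].append(m.value)

    # Population mean and standard deviation per data type.
    stats = {t: (mean(v), pstdev(v)) for t, v in values_by_type.items()}

    scaled = []
    for m in measurements:
        mu, sigma = stats[m.data_type]
        z = 0.0 if sigma == 0 else (m.value - mu) / sigma
        scaled.append(Measurement(m.sample_id, m.gene, m.data_type, z))
    return scaled


# Toy placeholder values on two very different scales.
raw = [
    Measurement("S1", "GENE_A", "expression", 10.0),
    Measurement("S2", "GENE_A", "expression", 30.0),
    Measurement("S1", "GENE_A", "methylation", 0.2),
    Measurement("S2", "GENE_A", "methylation", 0.8),
]
scaled = standardize(raw)
```

After this step, expression and methylation values for the same gene are on a comparable scale, which is a precondition for loading them side by side into any of the database options named above.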

Actual Challenge

1. Develop a data model that can represent all the different data types
2. Get data sets for a couple of samples from TCGA/ENCODE and drug targeting data
3. Find genes that are relevant in specific cancer types
4. Identify molecules targeting these genes
5. Develop an API that allows flexible access to the data sets, e.g. for predictive analytics and machine learning
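As a rough illustration of how the five steps could fit together, the following Python sketch uses a tiny in-memory property graph as a stand-in for a real graph database. The `OmicsGraph` class, the node labels (`Sample`, `Gene`, `Drug`), the relationship names (`MEASURED`, `TARGETS`), the `score` property and all identifiers are assumptions made for this example. The data is placeholder data, and averaging a score per gene is a deliberately naive notion of "relevant".

```python
from collections import defaultdict


class OmicsGraph:
    """Minimal in-memory property graph (hypothetical data model)."""

    def __init__(self):
        self.nodes = {}                 # node_id -> {"label": ..., **props}
        self.edges = defaultdict(list)  # (source, rel_type) -> [(target, props)]

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, **props}

    def add_edge(self, source, rel_type, target, **props):
        self.edges[(source, rel_type)].append((target, props))

    def neighbors(self, source, rel_type):
        return self.edges.get((source, rel_type), [])


def relevant_genes(graph, cancer_type, min_abs_score):
    """Genes whose mean score across samples of one cancer type is large."""
    scores = defaultdict(list)
    for node_id, node in graph.nodes.items():
        if node["label"] == "Sample" and node.get("cancer_type") == cancer_type:
            for gene, props in graph.neighbors(node_id, "MEASURED"):
                scores[gene].append(props["score"])
    return sorted(
        gene for gene, s in scores.items()
        if abs(sum(s) / len(s)) >= min_abs_score
    )


def drugs_for(graph, genes):
    """Map each given gene to the drugs that target it."""
    wanted = set(genes)
    hits = defaultdict(list)
    for node_id, node in graph.nodes.items():
        if node["label"] == "Drug":
            for gene, _ in graph.neighbors(node_id, "TARGETS"):
                if gene in wanted:
                    hits[gene].append(node_id)
    return dict(hits)


# Placeholder data only; none of these identifiers are real.
g = OmicsGraph()
for gene in ("GENE_A", "GENE_B"):
    g.add_node(gene, "Gene")
g.add_node("S1", "Sample", cancer_type="TYPE_1")
g.add_node("S2", "Sample", cancer_type="TYPE_1")
g.add_node("S3", "Sample", cancer_type="TYPE_2")
g.add_edge("S1", "MEASURED", "GENE_A", score=2.0)
g.add_edge("S2", "MEASURED", "GENE_A", score=3.0)
g.add_edge("S1", "MEASURED", "GENE_B", score=0.1)
g.add_edge("S2", "MEASURED", "GENE_B", score=-0.1)
g.add_edge("S3", "MEASURED", "GENE_B", score=4.0)
g.add_node("DRUG_X", "Drug")
g.add_node("DRUG_Y", "Drug")
g.add_edge("DRUG_X", "TARGETS", "GENE_A")
g.add_edge("DRUG_Y", "TARGETS", "GENE_B")

genes = relevant_genes(g, "TYPE_1", min_abs_score=1.0)
targets = drugs_for(g, genes)
```

The two query functions play the role of the API in step 5. In a real solution they would likely become queries against the chosen database rather than Python loops over a dictionary.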

Can you find a solution to this challenge?
Challenge owner: Martin Preusse (Neo4j)

