Main Page


Welcome to the Wings Drugome Workflow Wiki

Project Overview

The goal of this work is to extend articles with scientific workflows to 1) represent computations carried out to obtain the published results, essentially capturing explicitly data analysis pipelines, and 2) represent an abstraction of those computations that captures the semantics of the data analysis method in an execution-independent manner. This would make scientific results more reproducible because articles would have not just a textual description of the computational process described in the article but also a workflow that, as a computational artifact, could be analyzed and re-run automatically in other labs that have different software and execution infrastructure.

The initial project objectives are:

Project Publications

This article is the main reference for this work:

A comprehensive description of the workflow publication model with examples:

A shorter, earlier report on workflow publication:

Other material:

Recreating the Drugome Workflow

Our goal is to publish, as a reusable computational workflow, the method for deriving the drug-target network of an organism (i.e., its drugome) published in (Kinnings et al 11) (a preprint is available, see also the project web site). The original work did not use a workflow system; instead, the computational steps were run separately and manually.

The Computational Method Described in the Original Article

The article describes a computational pipeline that accesses data from the Protein Data Bank (PDB) and carries out a systematic analysis of the proteome of Mycobacterium tuberculosis (TB) against all approved drugs. The process uncovers protein receptors in the organism that could be targeted by drugs currently in use for other purposes. The result is a drug-target network (a “drugome”) that includes all known approved drugs. Although the article focuses on a particular organism (TB), the method itself can be used for other pathogens or pathways and has the potential to be a key resource for developing new, more comprehensive treatments for other diseases of interest.

With a workflow, the method could be re-run as new drugs become available. It could also be reused to create drugomes for other organisms. In essence, the paper presents a novel method that takes a comprehensive and systematic approach to drug discovery, moving away from current practice, which is neither.

With the help of the authors of the article, we created a workflow that reflects the steps described in the original article and ran it with the data used in the original experiments.

We used the “methods” section that describes conceptually what computations were carried out, which is usual in computational biology. However, we needed clarifications from the authors in order to reproduce the computations. Moreover, we found that some of the software originally used in the experiments is no longer available in the lab, so some of the steps already needed to be done differently.

The inputs to the workflow are: 1) a list of binding sites of approved drugs that can be associated with protein crystal structures in PDB, 2) a list of proteins of the TB proteome that have solved structures in PDB, and 3) homology models of annotated comparative protein structure models for TB. First, both the binding sites of protein structures and the homology models are compared against the drug binding sites. Next, the overall similarity of the global protein structures is assessed, and only significant pairs are retained. A graph of the resulting interaction network is then generated, which can be visualized in tools such as Cytoscape. Finally, molecular docking is performed to predict the affinity of drug molecules for the proteins.
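The filtering and graph-generation steps described above can be sketched roughly as follows. This is only an illustration and not the code used in the workflow: the function names, the score values, the significance test, and the `predicted_binding` relation label are all hypothetical. The output uses Cytoscape's simple interaction format (SIF), in which each line lists a source node, a relation, and a target node.

```python
def build_network(scored_pairs, is_significant):
    """Keep only the (drug, protein) pairs whose score passes the caller's test.

    scored_pairs: iterable of (drug, protein, score) tuples (hypothetical layout).
    is_significant: callable deciding whether a score counts as significant;
                    the real criterion is defined in the original article.
    """
    edges = []
    for drug, protein, score in scored_pairs:
        if is_significant(score):
            edges.append((drug, protein))
    return edges


def to_sif(edges, relation="predicted_binding"):
    """Serialize edges as tab-separated SIF lines: source, relation, target."""
    return "\n".join(f"{drug}\t{relation}\t{protein}" for drug, protein in edges)


if __name__ == "__main__":
    # Made-up identifiers and scores, for illustration only.
    pairs = [("drugA", "P1", 0.9), ("drugA", "P2", 0.1), ("drugB", "P1", 0.8)]
    network = build_network(pairs, lambda score: score >= 0.5)
    print(to_sif(network))
```

Passing the significance test in as a function keeps the sketch neutral about how significance is actually computed in the original method.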

Sketch of the overall workflow

An initial product of the work is a high-level sketch of the overall workflow in the TB drugome paper. This kind of overview is very useful to someone who wants to reproduce the results, and it would be worth including as supplemental material to a publication.

[Figure: high-level sketch of the overall workflow]

We started working on five core steps of the workflow:

Other steps not included in our initial work:

Schematic of the computational workflow

This sketch gives an overview of all the computational steps implemented in the final workflow. Five core steps are included in the workflow, each of which corresponds to a subsection of the methods section in the article. Some intermediate components that were not needed in the original work were added (URL checker and Docking checker, shown in purple). Components shown in grey were left out of the scope of the initial work.

[Figure: schematic of the computational workflow]

Implementing the Drugome Workflow in WINGS

We use the Wings workflow system to represent both abstract and executable workflows.

Workflow Components

We created workflow components using existing open source software packages:

Workflow Implementations

Initial Implementation of Workflows

We started by creating Wings workflows that expose the codes and data for every step of the method. The overall initial executable workflow is shown here:

[Figure: overall initial executable workflow]

Initial Abstract Reusable Workflows

Based on the initial implementation of the workflows, we created abstract workflows. The abstract steps of these workflows make them independent of the code implementations, which makes them more reusable by groups that use different implementations of the steps and more resilient to code changes over time. The system automatically specializes the abstract steps in the workflow into executable codes. The overall abstract workflow is shown here:
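The idea of specializing abstract steps into executable codes can be illustrated with a toy lookup. This is not Wings code and does not reflect how Wings performs specialization internally; the step names, tool names, and the "first available implementation" rule are all hypothetical.

```python
def specialize(abstract_steps, implementations, available):
    """Bind each abstract step to the first listed implementation that is available.

    abstract_steps: ordered list of abstract step names.
    implementations: dict mapping a step name to candidate tool names, in preference order.
    available: set of tool names installed at the site running the workflow.
    """
    bound = {}
    for step in abstract_steps:
        candidates = [tool for tool in implementations.get(step, []) if tool in available]
        if not candidates:
            raise LookupError(f"no available implementation for step {step!r}")
        bound[step] = candidates[0]
    return bound


if __name__ == "__main__":
    # Hypothetical step and tool names, for illustration only.
    catalog = {
        "CompareBindingSites": ["site_tool_v1", "site_tool_v2"],
        "Dock": ["dock_tool"],
    }
    print(specialize(["CompareBindingSites", "Dock"], catalog, {"site_tool_v2", "dock_tool"}))
```

Two labs with different installed tools could run the same abstract workflow and obtain different bindings, which is the reuse property the abstract workflows aim for.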

[Figure: overall abstract workflow]

Final workflow

After refining the initial version of the workflow, a final version was released:

[Figures: final version of the workflow]

Mapping Workflows to the Open Provenance Model: The OPMW Profile

The abstract workflow and the executed workflow are both mapped to the Open Provenance Model (OPM), maintaining links between them.

We created OPMW, a profile that extends OPM and PROV to accommodate the publication of abstract workflows and the provenance of their executions.

We mapped terms in Wings to OPMW. The design decisions and some examples are discussed in the Wings to OPM and PROV mapping rationale.
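As a rough illustration of what maintaining links between an abstract workflow and its execution means, the sketch below stores statements as plain (subject, predicate, object) tuples and reports executed steps that have no link back to a template step. The `ex:` class and property names and all identifiers are illustrative placeholders, not the actual OPMW vocabulary; see the mapping rationale for the real terms.

```python
# Hypothetical statements: one template step, two executed steps, one link.
TRIPLES = [
    ("tpl:CompareSites", "rdf:type", "ex:TemplateStep"),
    ("run1:CompareSites", "rdf:type", "ex:ExecutionStep"),
    ("run1:CompareSites", "ex:correspondsTo", "tpl:CompareSites"),
    ("run1:Dock", "rdf:type", "ex:ExecutionStep"),
]


def unlinked_execution_steps(triples):
    """Return the executed steps that lack a link to any template step."""
    executed = {s for s, p, o in triples if p == "rdf:type" and o == "ex:ExecutionStep"}
    linked = {s for s, p, o in triples if p == "ex:correspondsTo"}
    return sorted(executed - linked)


if __name__ == "__main__":
    print(unlinked_execution_steps(TRIPLES))
```

A check of this kind would flag provenance records whose connection to the published abstract workflow has been lost.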

Publishing Workflows as Linked Data

We publish workflows as Linked Data. A version of the workflow as Linked Data can be accessed here. We also developed a simple application to browse the published workflows, which allows you to navigate through the templates and explore their metadata.

Publishing Input Data and Workflow Results

We also published the input data and output data (workflow results) with permanent URLs using Figshare. See the pointers to the individual datasets above where the input data and output data are described.

Processing Data more Efficiently through Parallel Computations

We created a version of the workflow that executes computations in parallel so that it runs faster.
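A minimal sketch of the general idea, assuming that the individual comparisons are independent of one another. It does not show how the parallel workflow is actually implemented; the `compare` function and its toy scoring are hypothetical stand-ins for a real comparison tool.

```python
from concurrent.futures import ThreadPoolExecutor


def compare(pair):
    """Placeholder for one independent comparison (toy score: shared characters)."""
    drug_site, protein = pair
    return (drug_site, protein, len(set(drug_site) & set(protein)))


def run_parallel(pairs, max_workers=4):
    """Run all comparisons concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(compare, pairs))


if __name__ == "__main__":
    print(run_parallel([("abc", "bcd"), ("xy", "yz")]))
```

Threads suit steps that mostly wait on external programs or I/O; for CPU-bound Python code, `ProcessPoolExecutor` from the same module would be the more usual choice.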

Augmenting the original article

As a result of this work, the method of the original article can be fully documented and reproduced by augmenting it with explicit and citable information about the data, workflow, software, and figures.

Input data

The original paper included supplementary data, but that is not the raw data in the format used by the codes. The authors provided the following datasets that were used in the original work:

A bundle with all the input datasets is available at the following permanent URL: http://dx.doi.org/10.6084/m9.figshare.776910. If you reuse this dataset, please cite it by its DOI.

Output data

A bundle with all the output datasets resulting from the workflow is available at the following permanent URL: http://dx.doi.org/10.6084/m9.figshare.776891. If you reuse this dataset, please cite it by its DOI.

Executed Workflow

A complete execution of the workflow and its related materials can be accessed here (link to the permanent URL).

The workflows are available to run through the Wings Drugome workflow portal (password required, please contact us).

Figures

Different visualizations of the results obtained can be accessed here. This is an example:

[Figure: Viz1.jpg]

Workflow

A diagram of the workflow is:

[Figure: diagram of the workflow]

A run of the workflow can be browsed here.

A diagram of a general version of the workflow is:

[Figure: diagram of a general version of the workflow]

Getting started with the TB-Drugome

The TB-Drugome workflow is difficult to use without adequate knowledge of bioinformatics and computer science. This section provides initial pointers and references for anyone trying to reuse the TB-Drugome workflow.

Initial pointers

Detailed Timeline of Reproducibility Work

We documented the effort to reproduce the workflow. See a Detailed Timeline of our progress.

Summary and Future Work

Summary of work to date

Future Work

Interesting areas for future work include:

Project Members

Contributors

Sponsors

This project is sponsored by Elsevier Labs, the National Science Foundation with award number IIS-0948429, the Air Force Office of Scientific Research with award number FA9550-11-1-0104, and by internal funds from the University of Southern California's Information Sciences Institute and from the University of California, San Diego.
