WINGS is a semantic workflow system that assists scientists with the design of computational experiments. A unique feature of WINGS is that its workflow representations incorporate semantic constraints about datasets and workflow components, and are used to create and validate workflows and to generate metadata for new data products. WINGS submits workflows to execution frameworks such as Pegasus and OODT to run workflows at large scale on distributed resources.

New Version of Wings Released (4.0 Development Version)

More details can be found at http://www.wings-workflows.org/node/17664

New version of Wings Released (Source Code)

Source code can be found at:
https://github.com/IKCAP/wings

Semantic Workflows for Cancer Clinical Omics

In collaboration with the Knight Cancer Institute at the Oregon Health and Science University, we are using Wings workflows to annotate patient sequence variants obtained through clinical DNA sequencing.

Initial integration of OODT and Wings

We achieved an initial integration of the Wings workflow system with Apache OODT (http://oodt.apache.org), so Wings workflows can be executed on the OODT platform, which manages data and computations at extreme scale.

To see details, visit the following page:
https://cwiki.apache.org/confluence/display/OODT/Integrating+OODT+with+W...

Wings source code released on GitHub

We've got all the source code for the Wings system on GitHub now.

You can view the repository at:
https://github.com/IKCAP/wings

Wings Standalone Versions Released

For quick installation, we now also provide a Standalone Wings Bundle with all other software packages that it depends on. You can download the Standalone versions of Wings (with in-built Apache, MySQL, PHP, Tomcat and pre-installed Wings) from here:

Download Wings Standalone Bundle

Exporting Wings provenance using the emerging W3C PROV standard

We are now exporting provenance records for Wings workflows as Linked Open Data using the W3C PROV provenance standard.

OPMW (Open Provenance Model for Workflows)

OPMW is an ontology for describing workflows based on the Open Provenance Model. OPMW allows the publication of workflow execution traces as well as the more abstract reusable workflows that were originally used.

Workflows as Linked Data


The goal of this work is to extend articles with scientific workflows to 1) represent computations carried out to obtain the published results, essentially capturing explicitly data analysis pipelines, and 2) represent an abstraction of those computations that captures the semantics of the data analysis method in an execution-independent manner. This would make scientific results more reproducible because articles would have not just a textual description of the computational process described in the article but also a workflow that, as a computational artifact, could be analyzed and re-run automatically.
In recent years, a variety of systems have been developed that export the workflows used to analyze data and make them part of published articles. The workflows that are published in current approaches are dependent on the specific codes used for execution, the specific workflow system used, and the specific workflow catalogs where they are published.


In this work, we take a new approach that addresses these shortcomings and makes workflows more reusable through: 1) the use of abstract workflows to complement executable workflows to make them reusable when the execution environment is different, 2) the publication of both abstract and executable workflows using standards such as the Open Provenance Model that can be imported by other workflow systems, and 3) the publication of workflows as Linked Data, which results in open, web-accessible workflow repositories. Our initial focus is a complex workflow that we re-created from an influential drug discovery publication that describes the generation of ‘drugomes’.

The TB Drugome Workflow

Our initial focus is on a reusable computational workflow for the method to derive the drug-target network of an organism (i.e., its drugome) published in (Kinnings et al 11) (a preprint is available; see also the project web site).
The article describes a computational pipeline that accesses data from the Protein Data Bank (PDB) and carries out a systematic analysis of the proteome of Mycobacterium tuberculosis (TB) against all approved drugs. The process uncovers protein receptors in the organism that could be targeted by drugs currently in use for other purposes. The result is a drug-target network (a “drugome”) that includes all known approved drugs. Although the article focuses on a particular organism (TB), the method itself can be used for other pathogens or pathways and has the potential to be a key resource for developing new, more comprehensive treatments for other diseases of interest.
With the help of the authors of the article, we have created the executable workflow that reflects the steps described in the original article and have run it with the data used in the original experiments.
The final executable workflow can be seen here:
Drugome Executable Workflow

To export the workflows we developed OPMW as an extension of OPM that can represent abstract workflows.

OPM is a widely used, domain-independent provenance model that resulted from the Provenance Challenge Series and years of workflow provenance exchange and standardization in the scientific workflow community.

There are several reasons to use OPM. First, OPM has already been used successfully in many scientific workflow systems, which makes our published workflows more reusable. Second, the core definitions in OPM are domain independent and extensible to accommodate other purposes, in our case workflow representations. In addition, OPM can be considered the basis of the emerging W3C Provenance Interchange Language (PROV), which is currently being developed by the W3C Provenance Working Group as a standard for representing and publishing provenance on the Web.

OPM offers several core concepts and relationships to represent provenance. It models three kinds of entities: artifacts (immutable pieces of state, such as datasets), processes (actions or series of actions performed on artifacts), and agents (controllers of processes). Their relationships are modeled in a provenance graph with five causal edges: used (a process used some artifact), wasControlledBy (an agent controlled some process), wasGeneratedBy (a process generated an artifact), wasDerivedFrom (an artifact was derived from another artifact) and wasTriggeredBy (a process was triggered by another process). OPM also introduces the concept of roles to describe the function that artifacts, processes or agents played when interacting with each other, and the notion of accounts and provenance graphs to group sets of OPM assertions into different subgraphs. An account represents a particular view on the provenance of an artifact based on what was executed. We mapped Wings ontologies to the OPM core model, extending OPM core concepts and relationships according to our needs in a new profile called OPMW.
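
As a rough illustration of these concepts, the following Python sketch stores a small provenance graph with the three node kinds and five causal edges named above. The class, the node names (input.csv, filterStep, alice) and the account labels are all hypothetical; this is not the OPM reference implementation, only a minimal model of the edge typing it describes.

```python
from dataclasses import dataclass, field

# The five causal edges, each mapped to the (source kind, target kind) it
# connects. Edges point from the effect to its cause.
EDGE_KINDS = {
    "used": ("process", "artifact"),
    "wasGeneratedBy": ("artifact", "process"),
    "wasControlledBy": ("process", "agent"),
    "wasDerivedFrom": ("artifact", "artifact"),
    "wasTriggeredBy": ("process", "process"),
}


@dataclass
class ProvenanceGraph:
    nodes: dict = field(default_factory=dict)  # node id -> kind
    edges: list = field(default_factory=list)  # (edge, source, target, role, account)

    def add_node(self, node_id, kind):
        if kind not in ("artifact", "process", "agent"):
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[node_id] = kind

    def add_edge(self, edge, source, target, role=None, account="default"):
        # Reject edges whose endpoints have the wrong kinds for this edge type.
        expected = EDGE_KINDS[edge]
        actual = (self.nodes[source], self.nodes[target])
        if actual != expected:
            raise TypeError(f"{edge} connects {expected}, got {actual}")
        self.edges.append((edge, source, target, role, account))

    def account_view(self, account):
        # An account groups a set of assertions into one subgraph.
        return [e for e in self.edges if e[4] == account]


# Hypothetical example: one process that filters a dataset.
graph = ProvenanceGraph()
graph.add_node("input.csv", "artifact")
graph.add_node("output.csv", "artifact")
graph.add_node("filterStep", "process")
graph.add_node("alice", "agent")

graph.add_edge("used", "filterStep", "input.csv", role="rawData", account="run1")
graph.add_edge("wasGeneratedBy", "output.csv", "filterStep", role="filtered", account="run1")
graph.add_edge("wasControlledBy", "filterStep", "alice", account="run1")
graph.add_edge("wasDerivedFrom", "output.csv", "input.csv", account="summary")
```

Here the "run1" account holds the detailed execution view, while the "summary" account records only that the output was derived from the input.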

We use two OPM ontologies for our mapping. OPMV is a lightweight RDF vocabulary implementation of the OPM model that only has a subset of the concepts in OPM but it facilitates modeling and query formulation. OPMO covers the full functionality of the OPM model, and we use it for mapping to OPM concepts that are not in OPMV, such as Account or OPM Graph.

Figure 1: OPMW extension

Figure 1 shows a high-level diagram of the mappings to OPM of an abstract workflow on the left and a specific execution on the right. The workflow shown here has one step (executionNode1), which runs the workflow component (specComp1) that has one input (execInput1) and one output (executionOutput1). For some of the concepts there is a straightforward mapping: datasets are a subtype of Artifacts, while workflow steps, also called nodes, map to OPM Processes. Notice that each node has a link to the component that is run in that step; for example, the workflow in Figure 1 has two nodes that run the same component SMAPV2. There is no OPM term that can be mapped to components, so we used our own terms (represented with the ac prefix in Figure 1).

In the figure, the terms taken from OPMO and OPMV are indicated using their namespaces. The new terms that we defined in our extension profile use the OPMW prefix. The ontology can be browsed here.
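
To make the mapping more concrete, here is a small Python sketch that writes the execution side of the example as subject-predicate-object triples and walks them to recover the lineage of an output. The resource names come from the description of Figure 1; the ex: prefix, the property names ac:runsComponent and opmw:correspondsToTemplateNode, and the resources ex:abstractNode1 and ex:account1 are placeholders invented for illustration and are not claimed to match the published vocabulary.

```python
# Triples use prefixed names written as plain strings.
triples = [
    # Straightforward mappings: steps are Processes, datasets are Artifacts.
    ("ex:executionNode1", "rdf:type", "opmv:Process"),
    ("ex:execInput1", "rdf:type", "opmv:Artifact"),
    ("ex:executionOutput1", "rdf:type", "opmv:Artifact"),
    ("ex:executionNode1", "opmv:used", "ex:execInput1"),
    ("ex:executionOutput1", "opmv:wasGeneratedBy", "ex:executionNode1"),
    # Concepts outside OPMV, such as Account, come from OPMO.
    ("ex:account1", "rdf:type", "opmo:Account"),
    # Hypothetical property: link from a node to the component it runs.
    ("ex:executionNode1", "ac:runsComponent", "ex:specComp1"),
    # Hypothetical property: link from the execution to the abstract workflow.
    ("ex:executionNode1", "opmw:correspondsToTemplateNode", "ex:abstractNode1"),
]


def objects(subject, predicate):
    """Return every object of the triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def lineage(artifact):
    """Artifacts that the given artifact depends on, via its generating processes."""
    found = []
    for process in objects(artifact, "opmv:wasGeneratedBy"):
        for source in objects(process, "opmv:used"):
            found.append(source)
            found.extend(lineage(source))
    return found


inputs_of_output = lineage("ex:executionOutput1")
component = objects("ex:executionNode1", "ac:runsComponent")
```

Because the triples reuse standard OPMV predicates for the core relationships, a generic OPM consumer could follow the used and wasGeneratedBy links without knowing anything about the extension terms.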

WINGS workflow system

WINGS is a workflow system that assists scientists with the design of computational experiments. A computational experiment specifies how selected datasets are to be processed by a series of software components in a particular configuration. Earth scientists use computational experiments to estimate seismic hazard through simulations of earthquake forecasts. Biologists use computational experiments for analysis of gene expression microarray data or molecular interaction networks and pathways. Social scientists analyze large social networks to discover structural regularities based on mining relations among individuals.

We use workflows to represent computational experiments. Workflows represent application components and their dependencies in terms of dataflow among them. Workflow systems have been developed to assist users with some aspect of the process, for example to assemble workflows out of large component libraries, to optimize execution performance, and for workflow sharing. None of these systems provides comprehensive support for workflow design and exploration. To learn more about the state of the art in workflow systems, please visit http://www.isi.edu/nsf-workflows06.
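
The idea of a workflow as components linked by dataflow can be sketched in a few lines of Python. The component and dataset names below are hypothetical, and this is only an illustration of dataflow dependencies, not how WINGS represents workflows internally.

```python
from graphlib import TopologicalSorter

# Each component maps to (input datasets, output datasets). Hypothetical names.
components = {
    "align": ({"reads"}, {"alignment"}),
    "call_variants": ({"alignment", "reference"}, {"variants"}),
    "annotate": ({"variants"}, {"report"}),
}


def dataflow_dependencies(components):
    """Derive component-to-component dependencies from shared datasets."""
    producer = {
        dataset: name
        for name, (_, outputs) in components.items()
        for dataset in outputs
    }
    return {
        name: {producer[d] for d in inputs if d in producer}
        for name, (inputs, _) in components.items()
    }


def workflow_inputs(components):
    """Datasets consumed by some component but produced by none."""
    produced = set().union(*(outs for _, outs in components.values()))
    consumed = set().union(*(ins for ins, _ in components.values()))
    return consumed - produced


deps = dataflow_dependencies(components)
# A valid execution order runs every component after the ones it depends on.
order = list(TopologicalSorter(deps).static_order())
external = workflow_inputs(components)
```

In this sketch the dependencies are never declared directly; they follow from which component produces the dataset that another consumes.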

Jena semantic framework

Jena is a Java framework for building Semantic Web applications. It provides a programmatic environment for RDF, RDFS, OWL and SPARQL, and includes a rule-based inference engine.
Jena is open source and grew out of work with the HP Labs Semantic Web Program.

AllegroGraph database

AllegroGraph is a modern, high-performance, persistent graph database. AllegroGraph uses efficient memory utilization in combination with disk-based storage, enabling it to scale to billions of quads while maintaining superior performance. AllegroGraph supports SPARQL, RDFS++, and Prolog reasoning from numerous client applications.

Pubby

Pubby can be used to add Linked Data interfaces to SPARQL endpoints. Much Semantic Web data lives inside triple stores and can be accessed only by sending SPARQL queries to a SPARQL endpoint. It is hard to connect information in these stores with other external data sources.
Pubby makes it easy to turn a SPARQL endpoint into a Linked Data server. It is implemented as a Java web application.
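
One way to picture what such a Linked Data front end does is that a request for a resource URI is turned into a SPARQL query against the endpoint. The Python sketch below builds the request URL for a DESCRIBE query using the standard SPARQL protocol `query` parameter. The endpoint and resource URIs are hypothetical, and the sketch shows the general idea rather than Pubby's actual implementation or configuration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def describe_url(endpoint, resource_uri):
    """Build a SPARQL protocol GET URL that asks the endpoint to describe a resource."""
    query = f"DESCRIBE <{resource_uri}>"
    return endpoint + "?" + urlencode({"query": query})


# Hypothetical endpoint and resource.
url = describe_url(
    "http://example.org/sparql",
    "http://example.org/resource/workflow1",
)

# Decoding the URL again recovers the query text that the endpoint would receive.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

A client that dereferences the resource URI never needs to write SPARQL itself; the server issues the query on its behalf and returns the result.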

Project Members

Contributors

Sponsors

This project is sponsored by Elsevier Labs, the National Science Foundation with award number CCF-0725332, the Air Force Office of Scientific Research with award number FA9550-11-1-0104, and by internal funds from the University of Southern California's Information Sciences Institute and from the University of California, San Diego.