This page summarizes the mapping rationale followed to convert the WINGS template and result files to OPM and PROV.
WINGS stores the template separately from the results. However, understanding the execution results requires the template, because the results contain the bindings of the variables defined in the template.
The mapping to OPM and PROV aims to:
- Separate the abstract workflow or normal workflow template (elaborated WINGS template) from the execution results.
- Produce "cool URIs" for the instances of the template and the results. This allows us to:
- Have dereferenceable URIs in the system.
- Be able to publish and access them as Linked Data
- Share Wings components, abstract workflows, execution results, etc.
- Link to the original untransformed WINGS template file.
- OPM is close to the PROV standard, and it is interoperable with a growing number of systems.
- We will be using the OPMO-OPMV mapped ontology by Luc Moreau, Simon Miles, Paolo Missier, Paul Groth, Joe Futrelle, Li Ding, Daniel Garijo, Jeff Pan, Jun Zhao, Mike Jewell: .
OPM proposes to model workflows as a DAG (Directed Acyclic Graph), where the three main node types are Artifacts (immutable pieces of state), Processes (actions or series of actions that are performed on artifacts and that result in the creation of new artifacts) and Agents (catalysts of the processes, entities that control them).
These three node types relate to each other through 5 different relationships: used (a process used an artifact), wasControlledBy (a process was controlled by an agent), wasGeneratedBy (an artifact was generated by a process), wasTriggeredBy (a process was triggered by another process) and wasDerivedFrom (an artifact was derived from another artifact). Since the last 2 relationships don't appear in WINGS, and wasDerivedFrom can't be inferred according to the specification, we will not use them for our modelling.
- We will be extending a lightweight notion of the core of the ontology, mainly OPMV + OPMGraphs and Accounts. Roles will not be used (there is no relevant information in the domain).
- Why OPM?
- It is a standard provenance model developed for use in scientific domains.
- It is the closest model to PROV (W3C provenance standard).
- It has been used in many other applications and domains.
- We have experience handling and extending the core.
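The OPM core retained for this mapping (three node types and the three relationships that are kept) can be illustrated with a minimal sketch. The node names below are hypothetical and the representation as Python dictionaries and tuples is only an illustration, not how WINGS or OPMW stores the graph.

```python
# Relationships kept in this mapping: relation -> (source node kind, target node kind).
ALLOWED_EDGES = {
    "used": ("Process", "Artifact"),
    "wasGeneratedBy": ("Artifact", "Process"),
    "wasControlledBy": ("Process", "Agent"),
}


def check_edge(nodes, source, relation, target):
    """Return True if the edge uses a retained relationship between the right node kinds."""
    expected = ALLOWED_EDGES.get(relation)
    if expected is None:
        # wasTriggeredBy and wasDerivedFrom are not used in this mapping.
        return False
    return (nodes.get(source), nodes.get(target)) == expected


# Hypothetical one-step workflow; all names are illustrative only.
nodes = {
    "input1": "Artifact",
    "step1": "Process",
    "output1": "Artifact",
    "user1": "Agent",
}
edges = [
    ("step1", "used", "input1"),
    ("output1", "wasGeneratedBy", "step1"),
    ("step1", "wasControlledBy", "user1"),
]
all_valid = all(check_edge(nodes, s, r, t) for s, r, t in edges)
```

Edges point from the later element to the earlier one (an output points back to the process that generated it), which is what keeps the graph acyclic.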
OPMW specification is published here: http://www.opmw.org/model/OPMW/
Update 27-Aug-2013: The PROV standard was released in 2013. Therefore we have updated OPMW with a mapping to PROV. All the resources published according to OPMW are also published according to PROV concepts.
WINGS workflow templates hold the main structure of the workflow. They have descriptions of the inputs and outputs of each node and they provide a unique identifier for each component in each template. They also link to the components and their classes. Although node names are unique within a template, different templates may use the same nodeName. When publishing the information as linked data, each node should be unique per template.
- Design decisions
- Add the template name to the nodes. This way we will have a unique node per template, but we will still be able to say that the nodes from different templates are using the same component.
- Since the template is not controlled by anyone, we don't add such information. The template was created by a user, but that user is asserted as "contributor" to the template.
- Neither WorkflowTemplateArtifacts nor WorkflowTemplateProcesses extend OPM. The template itself can be seen as a plan for the execution, but is not considered provenance itself. Therefore separate classes are created.
See the full OPMW specification for further details.
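The node-naming decision above can be sketched as follows. This is only an illustration: the "_" separator, the upper-casing (inferred from the example URIs elsewhere on this page) and the function name `template_node_uri` are assumptions, not the documented OPMW scheme.

```python
from urllib.parse import quote

BASE = "http://www.opmw.org/export/resource/"


def template_node_uri(template_name, node_name, type_name="WorkflowTemplateProcess"):
    """Build a per-template node URI by prefixing the node name with the template name."""
    # Assumed scheme: <TEMPLATE>_<NODE>, upper-cased and percent-encoded.
    local_name = f"{template_name}_{node_name}".upper()
    return f"{BASE}{type_name}/{quote(local_name, safe='')}"


# Two hypothetical templates that both contain a node called "node1".
uri_a = template_node_uri("TemplateA", "node1")
uri_b = template_node_uri("TemplateB", "node1")
```

The two nodes get distinct URIs even though they share a nodeName, while each can still point to the same component resource.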
- Design decisions:
- Each execution results file gives a unique ID to the artifacts used within that file. To make an artifact that is used in different executions keep the same URI, we have decided to build its URI from the MD5 encoding of the artifact location. (The local name of the artifact may not be enough if we are using different data catalogs within the same machine. The location guarantees that the resource is unique.) Collisions in the encoding are possible, but very unlikely given the number of resources managed.
- Parameters don't have a location URI, so the solution is to generate a unique ID with their value. Example: pvalue with value 0.00001 would generate the URI: <http://www.opmw.org/export/resource/WorkflowExecutionArtifact/PVALUE0.00001>
- Extending OPM classes: WorkflowExecutionArtifacts are a subtype of opm:Artifacts. The nodes (WorkflowExecutionProcesses) extend opm:Processes; each one has a specific component from the data catalog (the one used for the execution) and a link to the process template (so we know which abstract process it instantiates). WorkflowExecutionArtifacts are created from the artifacts used and generated by the nodes, and they are linked to their location, their WorkflowTemplateArtifact and their concrete type in the data catalog. All agents, artifacts and processes of the same execution file are linked to the Account of that execution, using the OPMO relationship "account". The account is bound to the template using the relationship "correspondsToTemplate".
- Extending OPM Properties: just two extensions:
- correspondsToTemplateArtifact, correspondsToTemplateProcess and correspondsToTemplate bind WorkflowExecutionArtifacts, WorkflowExecutionProcesses and WorkflowExecutionAccounts to their corresponding WorkflowTemplateArtifact, WorkflowTemplateProcess and WorkflowTemplate.
- hasExecutableComponent binds each WorkflowExecutionProcess to its specific component, an instance of the component catalog.
- Additional metadata:
- The controller agent is the user who ran the workflow. Since it is not a resource in Wings (but a literal), we have assigned it a URI with its name as identifier. As an example: "Daniel"^^xsd:string would become <http://www.opmw.org/export/resource/Agent/DANIEL>.
- In the execution result file we can find the starting time and the ending time of the workflow execution. OPM has no mechanism to attach both times to an account. Instead, you can add the starting and ending time to each process execution. An idea would be to link the starting time of the first process to the starting time of the workflow execution, and the ending time of the last process to the ending time of the workflow execution. However, there is no way to know which process was the first one or the last one. These 2 images show counterexamples:
StartTime: Querying the processes that use artifacts that haven't been generated by any other process is not enough. In the figure, we see that according to this assertion P1 and P2 have the same starting time, which would be wrong.
EndTime: Querying the nodes that generate artifacts that are not used anymore is not enough. In the figure, we see that according to this assertion P1 and P2 have the same ending time, which would be wrong.
Therefore, the design decision taken is to create 2 new properties (overallStartTime and overallEndTime) to link an account to a dateTime literal. If the execution times of each process are included in the future, we could add the relationship to each node too.
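The URI decisions listed above for artifacts, parameters and agents can be sketched as below. The function names are hypothetical, and using the hexadecimal MD5 digest as the local name is an assumption: the text only states that the location is encoded in MD5. The parameter and agent cases reproduce the two example URIs given above.

```python
import hashlib

BASE = "http://www.opmw.org/export/resource/"


def artifact_uri(location):
    """URI derived from the artifact location, so it is stable across executions."""
    # Assumption: the hex digest of the UTF-8 encoded location is the local name.
    digest = hashlib.md5(location.encode("utf-8")).hexdigest()
    return f"{BASE}WorkflowExecutionArtifact/{digest}"


def parameter_uri(name, value):
    """URI for a parameter, built from its upper-cased name and its value."""
    # The value is passed as a string so that 0.00001 is not rendered as 1e-05.
    return f"{BASE}WorkflowExecutionArtifact/{name.upper()}{value}"


def agent_uri(name):
    """URI for the controller agent, built from the upper-cased user name."""
    return f"{BASE}Agent/{name.upper()}"


# Hypothetical file location; the same location always yields the same URI.
same_1 = artifact_uri("/data/catalog1/input.csv")
same_2 = artifact_uri("/data/catalog1/input.csv")
other = artifact_uri("/data/catalog2/input.csv")
```

Two files with the same local name in different catalogs get different URIs because the full location, not the file name, is hashed.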
OPMW describes the traces of the execution of a workflow along with the abstract workflow (template) used for its design. The trace is described by extending opmv:Artifact with opmw:WorkflowExecutionArtifact; opmv:Process with opmw:WorkflowExecutionProcess; and reusing OPM relationships to link them (opmv:used, opmv:wasControlledBy and opmv:wasGeneratedBy). All the assertions from an execution are grouped in an opmw:WorkflowExecutionAccount, a subclass of opmo:Account that represents the view of the system on the execution.
Templates are defined with new terms in OPMW, in a similar way to the traces. In this case, the reuse of OPM is not appropriate since we are describing the plan of the workflow (which may be executed in the future or not), not the execution. Templates have opmw:WorkflowTemplateArtifacts (which can be either opmw:DataVariables or opmw:ParameterVariables) and opmw:WorkflowTemplateProcesses, which represent an abstraction of the method that is being executed.
The opmw:WorkflowTemplateArtifacts are connected to opmw:WorkflowTemplateProcesses by opmw:uses and opmw:isGeneratedBy properties. These properties define which type of opmw:WorkflowTemplateArtifact is used by each opmw:WorkflowTemplateProcess and the type of the expected result. The next figure shows a brief example.
The figure shows a high-level process-view diagram of the OPM and OPMW representation of an abstract workflow on the left and a workflow execution on the right. The example workflow shown here has one step (executionNode1), which runs the workflow component (specComp1) that has one input (execInput1) and one output (executionOutput1).
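The one-step example can also be written down as triples. The sketch below uses plain Python tuples with prefixed names instead of an RDF library; the execution-side names come from the example, while the template-side names, the account name and the `ex:` prefix are hypothetical.

```python
# Template side (plan): described with OPMW terms only.
template = [
    ("ex:templateNode1", "rdf:type", "opmw:WorkflowTemplateProcess"),
    ("ex:templateInput1", "rdf:type", "opmw:DataVariable"),
    ("ex:templateNode1", "opmw:uses", "ex:templateInput1"),
    ("ex:templateOutput1", "opmw:isGeneratedBy", "ex:templateNode1"),
]

# Execution side (trace): OPM relationships plus the OPMW links to the template.
execution = [
    ("ex:executionNode1", "rdf:type", "opmw:WorkflowExecutionProcess"),
    ("ex:executionNode1", "opmv:used", "ex:execInput1"),
    ("ex:executionOutput1", "opmv:wasGeneratedBy", "ex:executionNode1"),
    ("ex:executionNode1", "opmw:hasExecutableComponent", "ex:specComp1"),
    ("ex:executionNode1", "opmw:correspondsToTemplateProcess", "ex:templateNode1"),
    ("ex:execInput1", "opmw:correspondsToTemplateArtifact", "ex:templateInput1"),
    ("ex:executionOutput1", "opmw:correspondsToTemplateArtifact", "ex:templateOutput1"),
    ("ex:executionNode1", "opmo:account", "ex:account1"),
    ("ex:account1", "opmw:correspondsToTemplate", "ex:template1"),
]


def objects(triples, subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def mirrors_template(execution, template):
    """Check that every opmv:used edge has a matching opmw:uses edge in the template."""
    for process, predicate, artifact in execution:
        if predicate != "opmv:used":
            continue
        t_process = objects(execution, process, "opmw:correspondsToTemplateProcess")
        t_artifact = objects(execution, artifact, "opmw:correspondsToTemplateArtifact")
        if not t_process or not t_artifact:
            return False
        if t_artifact[0] not in objects(template, t_process[0], "opmw:uses"):
            return False
    return True
```

Calling `mirrors_template(execution, template)` on the example returns `True`, since the single `opmv:used` edge maps onto the single `opmw:uses` edge of the template.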
URI naming convention
URL base for the ontology repository: http://www.opmw.org/ontology/
URI of the OPM profile: http://www.opmw.org/ontology/
URI followed by the resources: http://www.opmw.org/export/resource/<Type>/<ResourceName>
- <Type> refers to the type class of the resource.
- <ResourceName> refers to the resource identifier.
- /export/ is the name of the dataset I have chosen to publish the results of the workflows. Therefore, if we want to export other datasets we will be able to do so.
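As a rough illustration of the convention above, the following sketch builds and splits resource URIs. The function names are hypothetical, and treating the dataset name as a parameter is an assumption based on the remark that other datasets could be exported the same way.

```python
PREFIX = "http://www.opmw.org/"


def resource_uri(type_name, resource_name, dataset="export"):
    """Compose <PREFIX><dataset>/resource/<Type>/<ResourceName>."""
    return f"{PREFIX}{dataset}/resource/{type_name}/{resource_name}"


def split_resource_uri(uri):
    """Split a resource URI into (dataset, type, resource name)."""
    if not uri.startswith(PREFIX):
        raise ValueError(f"not an OPMW resource URI: {uri}")
    # Unpacking raises ValueError if the path has fewer than four segments.
    dataset, marker, type_name, resource_name = uri[len(PREFIX):].split("/", 3)
    if marker != "resource":
        raise ValueError(f"missing /resource/ segment: {uri}")
    return dataset, type_name, resource_name


agent = resource_uri("Agent", "DANIEL")
parts = split_resource_uri(agent)
```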
Installation where we have the current workflow repository.
- Instance set up in: http://wind.isi.edu:8890/sparql
- Example queries are available here: http://opmw.org/node/6
- An application consuming the Linked Data exposed is available here: http://mayor2.dia.fi.upm.es/oeg-upm/files/dgarijo/DemoWfLinkedData/
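A query against the endpoint listed above could be prepared roughly as follows. This sketch only builds the request URL and sends nothing, since the availability of the endpoint is not guaranteed. The query itself is an illustrative assumption (it is not taken from the example queries page), and the `opmw:` namespace is the ontology URI given in the naming convention.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://wind.isi.edu:8890/sparql"

# Illustrative query: execution accounts and the templates they correspond to.
QUERY = """
PREFIX opmw: <http://www.opmw.org/ontology/>
SELECT ?account ?template WHERE {
  ?account a opmw:WorkflowExecutionAccount ;
           opmw:correspondsToTemplate ?template .
} LIMIT 10
"""


def build_request_url(endpoint, query):
    """Encode the query in the 'query' parameter, as in a SPARQL protocol GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)
# Round trip: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

The resulting URL can be passed to any HTTP client; the result format depends on the `Accept` header or on endpoint-specific parameters.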
Allegro endpoint information (deprecated)
Allegro was used as an early installation for tests of the repository. It is no longer maintained.
- Instance set up in: http://ec2-184-72-160-64.compute-1.amazonaws.com:10035
- Catalog ID: java-catalog
- Repository ID: WINGSTemplatesAndResults
- User: Available upon request.
- Password: Available upon request.