16th Workshop on Workflows in Support of Large-Scale Science
November 15, 2021
Held in conjunction with SC21: The International Conference for High Performance Computing, Networking, Storage and Analysis
Rafael Ferreira da Silva, Oak Ridge National Laboratory, USA
Rosa Filgueira, Heriot-Watt University, Edinburgh, UK
Please submit any inquiries to chairs@works-workshop.org
Scientific workflows have been almost universally used across scientific domains and have underpinned some of the
most significant discoveries of the past several decades. Workflow management systems (WMSs) provide abstraction and
automation which enable a broad range of researchers to easily define sophisticated computational processes and to
then execute them efficiently on parallel and distributed computing systems. As workflows have been adopted by a
number of scientific communities, they are becoming more complex and require more sophisticated workflow management
capabilities. A workflow can now analyze terabyte-scale data sets, be composed of one million individual tasks,
require coordination between heterogeneous tasks, manage tasks that execute for milliseconds to hours, and
process data streams, files, and data placed in object stores. The computations can be single-core workloads,
loosely coupled computations, or tightly coupled computations, all within a single workflow, and can run on
dispersed computing platforms.
This workshop focuses on the many facets of scientific workflow management systems, ranging from actual execution to
service management and the coordination and optimization of data, service, and job dependencies. The workshop covers
a broad range of issues in the scientific workflow lifecycle, including: scientific workflow representation and
enactment; workflow scheduling techniques to optimize the execution of the workflow on heterogeneous
infrastructures; workflow enactment engines that need to deal with failures in the application and execution
environment; and a number of computer science problems related to scientific workflows such as semantic
technologies, compiler methods, scheduling and fault detection and tolerance.
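As a purely illustrative sketch (not taken from any particular WMS), a workflow can be modeled as a directed acyclic graph of tasks, and an engine may only release a task once all of its dependencies have finished. The task names below are hypothetical; the example uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to group tasks into "waves" that could run in parallel, which is also the point where a scheduler would choose resources for each ready task.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "fetch_data": set(),
    "preprocess": {"fetch_data"},
    "simulate_a": {"preprocess"},
    "simulate_b": {"preprocess"},
    "aggregate": {"simulate_a", "simulate_b"},
}


def run_workflow(dag):
    """Release tasks in waves; every task in a wave has all dependencies done."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are satisfied
        # A real engine would dispatch these concurrently to compute resources;
        # this sketch only records them and marks them as finished.
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(run_workflow(workflow)):
        print(f"wave {i}: {wave}")
```

Tasks `simulate_a` and `simulate_b` land in the same wave because they share only `preprocess` as a dependency, which is the kind of parallelism a WMS exploits on distributed platforms.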
Keynote
FAIR Computational Workflows
Carole Goble, Dept of Computer Science, The University of Manchester, UK / ELIXIR-UK
The FAIR principles (Findable, Accessible, Interoperable, Reusable) [1] have laid a foundation for sharing and publishing digital assets, starting with data and now extending to all digital objects including software [2]. The use of computational workflows has accelerated in the past few years, driven by the need for repetitive and scalable data processing, access to and exchange of processing know-how, and the desire for more reproducible (or at least transparent) and quality-assured processing methods [3]. The COVID-19 pandemic has highlighted the value of workflows [4]. Over 290 workflow systems are currently available, although a much smaller number are widely adopted [5]. As workflows are first-class, publishable research objects, it seems natural to apply FAIR principles to them [6]. The FAIR data principles themselves originate from a desire to support automated data processing by emphasizing machine accessibility of data and metadata. As workflows have a dual role as software and explicit method description, their FAIR properties draw from both data and software principles for descriptive metadata, software metrics, and versioning. However, workflows create unique challenges such as representing a complex lifecycle from specification to execution via a workflow system, through to the data created at the completion of the workflow. As workflows are chiefly concerned with the processing and creation of data, they have an important role to play in ensuring and supporting data FAIRification.
The work on defining and improving the FAIRness of workflows has already started. A whole ecosystem of tools, guidelines and best practices is under development to reduce the time needed to adapt, reuse and extend existing scientific workflows. For example, a fundamental tenet of FAIR is the universal availability of machine-processable metadata. The European EOSC-Life Cluster has developed a metadata framework for FAIR workflows based on schema.org, RO-Crate [7] and the Common Workflow Language (CWL) [8], and uses the GA4GH TRS API as a standardised communication protocol to support Accessibility. It has developed and runs the WorkflowHub registry, which uses both the framework and the protocol to support workflow Findability. EOSC-Life has made great efforts to onboard community workflow platforms such as Galaxy, Snakemake, Nextflow and CWL to carry and use FAIR metadata for discovery and reuse. As FAIR software needs to be usable and not just reusable, EOSC-Life has also developed services for workflow testing (LifeMonitor), execution and benchmarking, among others.
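To make "machine-processable metadata" concrete, here is a minimal sketch, under stated assumptions, of an `ro-crate-metadata.json` graph for a single CWL workflow. It follows the general shape of RO-Crate 1.1 and types the workflow with the schema.org/Bioschemas `ComputationalWorkflow` term used by the Workflow RO-Crate profile. The file name, workflow name and license are hypothetical placeholders, and a real crate registered in WorkflowHub would carry more entities and properties (authors, `datePublished`, inputs, test data); treat the RO-Crate specification, not this sketch, as authoritative.

```python
import json


def workflow_crate_metadata(workflow_file, name, license_url):
    """Build a minimal RO-Crate-style JSON-LD graph describing one CWL workflow."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {   # Metadata descriptor: states that this file describes the crate root.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {   # Root data entity; mainEntity points at the workflow file.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "license": {"@id": license_url},
                "mainEntity": {"@id": workflow_file},
                "hasPart": [{"@id": workflow_file}],
            },
            {   # The workflow itself, typed so registries can recognise it.
                "@id": workflow_file,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": name,
                "programmingLanguage": {"@id": "#cwl"},
            },
            {   # The workflow language, so consumers know which engine can run it.
                "@id": "#cwl",
                "@type": "ComputerLanguage",
                "name": "Common Workflow Language",
                "url": {"@id": "https://www.commonwl.org/"},
            },
        ],
    }


if __name__ == "__main__":
    crate = workflow_crate_metadata(
        "workflow.cwl",                          # hypothetical file name
        "Example analysis workflow",             # hypothetical workflow name
        "https://spdx.org/licenses/Apache-2.0",  # hypothetical license choice
    )
    print(json.dumps(crate, indent=2))
```

Because every entity is addressed by `@id`, a registry can follow `mainEntity` from the crate root to the workflow and then to its language without any system-specific parsing.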
The Interoperability principle is the hardest to unpack for both data and software. For workflows, interoperability follows two threads: (i) supporting workflow system interoperability through workflow descriptions independent of the underlying system (e.g. CWL and WDL) and (ii) workflow component composability. Workflows are ideally composed of modular building blocks, and both these blocks and the workflows themselves are expected to be reused, refactored, recycled and remixed. Thus, FAIR applies "all the way down": at the specification and execution level, and for the whole workflow and each of its components. Composability also relates to reuse: adapting [2], a workflow or its components “can be understood, modified, built upon or incorporated into other workflows”. Reuse challenges also include being able to capture and then move workflow components, dependencies, and application environments in such a way as not to affect the resulting execution of the workflow. Interoperability and Reusability place important obligations on software developers to ensure that tools and datasets are workflow-ready, with clean programmatic I/O interfaces, no usage restrictions, and use of community data standards, and that they are simple to install and designed for portability. Workflow developers can be both data-FAIR, by using and making identifiers, licensing data outputs, tracking data provenance and so on, and workflow-FAIR, by managing versions, providing test data, and sharing libraries of composable and reusable workflow “blocks” [9]. Communities are working on reviewing, validating and certifying canonical workflows.
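The link between clean I/O interfaces and composability can be illustrated with a small sketch. The `Block` class and `compose` function below are hypothetical and are not the API of BioExcel Building Blocks or any workflow system; they only assume that each block declares named, typed input and output ports, so that two blocks can be checked for compatibility before they are chained, and the composite is itself a block that can be composed again.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Block:
    """Hypothetical reusable workflow step with declared, typed I/O ports."""
    name: str
    inputs: Dict[str, str]   # port name -> data type label
    outputs: Dict[str, str]  # port name -> data type label
    run: Callable[[dict], dict]


def compose(first: Block, second: Block) -> Block:
    """Chain two blocks after checking that `first` provides every input of `second`."""
    for port, dtype in second.inputs.items():
        if first.outputs.get(port) != dtype:
            raise TypeError(
                f"{second.name} needs {port!r} ({dtype}); {first.name} does not provide it"
            )

    def run(data: dict) -> dict:
        produced = first.run(data)
        # Pass only the declared inputs downstream, keeping the interface explicit.
        return second.run({port: produced[port] for port in second.inputs})

    # The composite is again a Block, so composition works "all the way down".
    return Block(f"{first.name}+{second.name}", first.inputs, second.outputs, run)


# Two toy blocks: strip ambiguous 'N' bases from the ends of reads, then count reads.
trim = Block("trim", {"reads": "sequences"}, {"trimmed": "sequences"},
             lambda d: {"trimmed": [r.strip("N") for r in d["reads"]]})
count = Block("count", {"trimmed": "sequences"}, {"n_reads": "integer"},
              lambda d: {"n_reads": len(d["trimmed"])})

if __name__ == "__main__":
    pipeline = compose(trim, count)
    print(pipeline.name, pipeline.run({"reads": ["NNACGT", "ACGTNN"]}))
```

Composing the blocks in the wrong order, `compose(count, trim)`, fails before anything runs, because `count` does not produce the `reads` port that `trim` declares.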
While there are emerging tools for addressing different aspects of FAIR workflows, many challenges remain for describing, annotating, and exposing scientific workflows so that they can be found, understood and reused by other scientists. Further work is required to understand use cases for reuse and enable reuse in the same or different environments. The FAIR principles for workflows need to be community-agreed before metrics can be considered to determine whether a workflow is FAIR, whether a workflow repository or registry is FAIR, and whether it is possible to automatically review whether a workflow’s dataflow is FAIR. Community activism, perhaps led by the platforms and registries coming together in a community group like WorkflowsRI, is needed to define principles, policies and best practices for FAIR workflows and to standardize metadata representation and collection processes. In this talk I will present current work on FAIR principles, practices and services for computational workflows, using developments in the European EOSC-Life Workflow Collaboratory and the Bioexcel Centre of Excellence.
Bio. Carole Goble CBE FREng FBCS is a Professor of Computer Science at the University of Manchester, UK. She leads a team of Researchers, Research Software Engineers and Data Stewards. She has spent 25 years working in e-Science on computational workflows, reproducible science, open sharing, and knowledge and metadata management in a range of disciplines. She has led numerous e-Infrastructure projects and is currently the Head of Node of ELIXIR-UK, the national node of ELIXIR, the European Research Infrastructure for Life Sciences, as well as directing the digital infrastructure for IBISBA, the European Research Infrastructure for Industrial Biotechnology. Both emphasise the use of computational workflows. Carole led the development of Taverna, one of the first open source computational workflow management systems, and myExperiment.org, the first system-agnostic web-based sharing platform for workflows and their related data. She was the scientific lead of the EU WF4Ever project, which pioneered the notion of workflows as preservable and reproducible Research Objects. She currently co-leads developments in the EOSC-Life Cluster Workflow Collaboratory (13 European Research Infrastructures in Biomedical Science led by ELIXIR), including: the WorkflowHub.eu registry for workflows, the RO-Crate community initiative for packaging, exchanging and publishing workflows as Research Objects, and the use of schema.org to mark up workflows. The tools of the Collaboratory are used by other projects ranging from natural history collection digitisation to climate change modelling, and are part of the EU COVID data portal. Carole serves on the Advisory Board of the Common Workflow Language community and is a member of the WorkflowsRI community. In EOSC-Life she is leading developments on FAIR principles and practice applied to workflows. Carole is also a co-founder of the UK’s Software Sustainability Institute and cares about quality research software and reproducibility.
She is an author of the Nature paper proposing the seminal FAIR Principles for Scientific Data, contributes to the RDA FAIR4Research Software initiative, and actively nudges policy (OECD, G7, EU) to recognise software as a first class product of research.
- [1] M.D. Wilkinson, M. Dumontier, et al., The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data 3 (2016), DOI: 10.1038/sdata.2016.18
- [2] D.S. Katz, M. Gruenpeter, T. Honeyman, Taking a fresh look at FAIR for research software, Patterns 2(2) (2021), DOI: 10.1016/j.patter.2021.100222
- [3] T. Reiter, P.T. Brooks, L. Irber, S.E.K. Joslin, C.M. Reid, C. Scott, C.T. Brown, N.T. Pierce-Ward, Streamlining data-intensive biology with workflow systems, GigaScience 10(1) (2021), giaa140, DOI: 10.1093/gigascience/giaa140
- [4] W. Maier, S. Bray, et al., Freely accessible ready to use global infrastructure for SARS-CoV-2 monitoring, bioRxiv (2021), DOI: 10.1101/2021.03.25.437046
- [5] L. Wratten, A. Wilm, J. Göke, Reproducible, scalable, and shareable analysis pipelines with bioinformatics workflow managers, Nature Methods (2021), DOI: 10.1038/s41592-021-01254-9
- [6] C. Goble, S. Cohen-Boulakia, S. Soiland-Reyes, D. Garijo, Y. Gil, M.R. Crusoe, K. Peters, D. Schober, FAIR Computational Workflows, Data Intelligence 2(1-2) (2020), 108-121, DOI: 10.1162/dint_a_00033
- [7] S. Soiland-Reyes, P. Sefton, et al., Packaging research artefacts with RO-Crate, arXiv:2108.06503v1
- [8] M. Crusoe, et al., Methods Included: Standardizing Computational Reuse and Portability with the Common Workflow Language, CACM (2021), DOI: 10.1145/3486897
- [9] P. Andrio, A. Hospital, J. Conejero, et al., BioExcel Building Blocks, a software library for interoperable biomolecular simulation workflows, Scientific Data 6, 169 (2019), DOI: 10.1038/s41597-019-0177-4
Workshop Program
Time | Event |
---|---|
9:00-9:10 AM CST / 7:00-7:10 AM PST / 15:00-15:10 GMT | Welcome: Rosa Filgueira, Rafael Ferreira da Silva |
9:10-10:00 AM CST / 7:10-8:00 AM PST / 15:10-16:00 GMT | Keynote: FAIR Computational Workflows. Prof. Carole Goble |
10:00-10:30 AM CST / 8:00-8:30 AM PST / 16:00-16:30 GMT | Break |
10:30-10:55 AM CST / 8:30-8:55 AM PST / 16:30-16:55 GMT | A Recommender System for Scientific Datasets and Analysis Pipelines. Mandana Mazaheri, Gregory Kiar, Tristan Glatard |
10:55-11:20 AM CST / 8:55-9:20 AM PST / 16:55-17:20 GMT | Intelligent Resource Provisioning for Scientific Workflows and HPC. Benjamin T. Shealy, F. Alex Feltus, Melissa C. Smith |
11:20-11:45 AM CST / 9:20-9:45 AM PST / 17:20-17:45 GMT | Not All Tasks Are Created Equal: Adaptive Resource Allocation for Heterogeneous Tasks in Dynamic Workflows. Thanh Son Phung, Douglas Thain, Kyle Chard, Logan Ward |
11:45-11:55 AM CST / 9:45-9:55 AM PST / 17:45-17:55 GMT | Q&A Session 1 |
11:55 AM-12:05 PM CST / 9:55-10:05 AM PST / 17:55-18:05 GMT | Learning Fundamental Workflow Concepts with EduWRENCH. Henri Casanova, Ryan Tanaka, William Koch, Rafael Ferreira da Silva |
12:05-12:15 PM CST / 10:05-10:15 AM PST / 18:05-18:15 GMT | VisDict: Enhancing the Communication between Workflow Providers and User Communities via a Visual Dictionary. Sandra Gesing, Rafael Ferreira da Silva, Ewa Deelman, Michael Hildreth, Mary Ann McDowell, Natalie Meyers, Ian Taylor, Douglas Thain |
12:15-12:25 PM CST / 10:15-10:25 AM PST / 18:15-18:25 GMT | Coordinating Dynamic Ensemble Calculations with libEnsemble. Stephen Hudson, John-Luke Navarro, Jeffrey Larson, Stefan Wild |
12:25-12:30 PM CST / 10:25-10:30 AM PST / 18:25-18:30 GMT | Q&A Session 2 |
12:30-2:00 PM CST / 10:30 AM-noon PST / 18:30-20:00 GMT | Break |
2:00-2:25 PM CST / noon-12:25 PM PST / 20:00-20:25 GMT | Dynamic Heterogeneous Task Specification and Execution for In Situ Workflows. Orcun Yildiz, Dmitriy Morozov, Bogdan Nicolae, Tom Peterka |
2:25-2:50 PM CST / 12:25-12:50 PM PST / 20:25-20:50 GMT | An Adaptive Elasticity Policy For Staging Based In-Situ Processing. Zhe Wang, Matthieu Dorier, Pradeep Subedi, Philip E. Davis, Manish Parashar |
2:50-3:00 PM CST / 12:50-1:00 PM PST / 20:50-21:00 GMT | Q&A Session 3 |
3:00-3:30 PM CST / 1:00-1:30 PM PST / 21:00-21:30 GMT | Break |
3:30-3:55 PM CST / 1:30-1:55 PM PST / 21:30-21:55 GMT | The benefits of prefetching for large-scale cloud-based neuroimaging analysis workflows. Valerie Hayot-Sasson, Tristan Glatard, Ariel Rokem |
3:55-4:20 PM CST / 1:55-2:20 PM PST / 21:55-22:20 GMT | ExaWorks: Workflows for Exascale. Aymen Al-Saadi, Dong Ahn, Yadu Babuji, Kyle Chard, James Corbett, Mihael Hategan, Stephen Herbein, Shantenu Jha, Daniel Laney, Andre Merzky, Todd Munson, Michael Salim, Mikhail Titov, Matteo Turilli, Thomas Uram, Justin Wozniak |
4:20-4:25 PM CST / 2:20-2:25 PM PST / 22:20-22:25 GMT | Q&A Session 4 |
4:25-4:35 PM CST / 2:25-2:35 PM PST / 22:25-22:35 GMT | A Lightweight GPU Monitoring Extension for Pegasus Kickstart. Georgios Papadimitriou, Ewa Deelman |
4:35-5:00 PM CST / 2:35-3:00 PM PST / 22:35-23:00 GMT | A Performance Characterization of Scientific Machine Learning Workflows. Patrycja Krawczuk, George Papadimitriou, Ryan Tanaka, Tu Mai Anh Do, Srujana Subramanya, Shubham Nagarkar, Aditi Jain, Kelsie Lam, Anirban Mandal, Loïc Pottier, Ewa Deelman |
5:00-5:25 PM CST / 3:00-3:25 PM PST / 23:00-23:25 GMT | Science Capsule: Towards Sharing and Reproducibility of Scientific Workflows. Devarshi Ghoshal, Ludovico Bianchi, Abdelilah Essiari, Drew Paine, Sarah S. Poon, Michael Beach, Alpha T. N'Diaye, Patrick Huck, Lavanya Ramakrishnan |
5:25-5:30 PM CST / 3:25-3:30 PM PST / 23:25-23:30 GMT | Q&A Session 5 |
Important Dates
- Full paper and abstract deadline: August 27, 2021 (final extension; originally August 15, 2021)
- Paper acceptance notification: September 24, 2021
- E-copyright registration completed by authors: October 24, 2021
- Camera-ready deadline: October 4, 2021
- Consent and release form: October 24, 2021
- Video presentations: October 24, 2021
- Workshop: November 15, 2021

All deadlines are Anywhere on Earth (AoE).
Paper Submission
WORKS21 welcomes original submissions in a range of areas, including but not limited to:
- Big Data analytics workflows, AI workflows
- Data-driven workflow processing, stream-based workflows
- Workflow composition, tools, orchestrators, and languages
- Workflow execution in distributed environments (including HPC, clouds, and grids)
- FAIR computational workflows
- Dynamic data dependent workflow systems solutions
- Exascale computing with workflows
- In Situ Data Analytics Workflows
- Human-in-the-loop workflows
- Workflow fault-tolerance and recovery techniques
- Workflow user environments, including portals
- Workflow applications and their requirements
- Adaptive workflows
- Workflow optimizations (including scheduling and energy efficiency)
- Performance analysis of workflows
- Workflow provenance
Papers should present original research and should provide sufficient background material to make them accessible to the broader community.
- Talks: full papers (up to 8 pages) describing a research contribution in the topics listed above.
- Lightning talks: abstracts (up to 1 page) describing a novel tool and/or scientific workflow.

Submission of a full paper may result in a talk; submission of an abstract may result in a lightning talk. Presenters of full papers will be given a 20-minute time slot (plus 5 minutes for questions) to summarize and provide an update on their research. Presenters of abstracts will be given a 10-minute time slot (including 3 minutes for questions) to present a novel tool or scientific workflow.
Submission Guidelines:
- Full papers: Submissions are limited to 8 pages. The 8-page limit includes figures, tables, appendices, and references.
- Abstracts: Submissions are limited to 1 page (excluding references). The 1-page limit includes the description of a novel tool or scientific workflow and a link to a repository in which the tool's source code is stored. This repository must provide all the instructions necessary to execute the tool, so that reviewers can test it. Abstracts will be compiled into a single paper and published as part of the workshop proceedings.
Papers and abstracts should be submitted in the IEEE format (see
https://www.ieee.org/conferences/publishing/templates.html).
WORKS papers this year will be published in cooperation with TCHPC and will be available from the IEEE digital repository.
Organization
Program Committee Chairs
Publicity Chair
Steering Committee
Program Committee
King's College London
Barcelona Supercomputing Center
Université Paris-Dauphine
University of Tennessee
University of Hawaii at Manoa
University of Chicago
University of Southern California
Common Workflow Language
Seqera Labs
University of Innsbruck
University of Chicago
Concordia University
Rutgers University
University of Illinois at Urbana-Champaign
AGH UST
RENCI
UFRJ
University of Naples Parthenope
Lawrence Livermore National Laboratory
University of Southern California
University of Klagenfurt
Lawrence Berkeley National Laboratory
Southern Illinois University
University of Manchester
IBM Research
CNRS, INRIA
University of Calabria
University of Notre Dame
Oak Ridge National Laboratory
Argonne National Laboratory
New Jersey Institute of Technology