16th Workshop on Workflows in Support of Large-Scale Science

November 15, 2021

Held in conjunction with SC21: The International Conference for High Performance Computing, Networking, Storage and Analysis

Co-chaired by:
Rafael Ferreira da Silva, Oak Ridge National Laboratory, USA
Rosa Filgueira, Heriot-Watt University, Edinburgh, UK

Please submit any inquiries to chairs@works-workshop.org

Scientific workflows have been almost universally used across scientific domains and have underpinned some of the most significant discoveries of the past several decades. Workflow management systems (WMSs) provide abstraction and automation that enable a broad range of researchers to easily define sophisticated computational processes and then execute them efficiently on parallel and distributed computing systems. As workflows are adopted by more scientific communities, they are becoming more complex and require more sophisticated workflow management capabilities. A workflow can now analyze terabyte-scale data sets, be composed of one million individual tasks, require coordination between heterogeneous tasks, manage tasks that execute for milliseconds to hours, and process data streams, files, and data placed in object stores. The computations can be single-core workloads, loosely coupled computations, or tightly coupled computations, all within a single workflow, and can run on dispersed computing platforms.

This workshop focuses on the many facets of scientific workflow management systems, ranging from actual execution to service management and the coordination and optimization of data, service, and job dependencies. The workshop covers a broad range of issues in the scientific workflow lifecycle, including: scientific workflow representation and enactment; workflow scheduling techniques to optimize workflow execution on heterogeneous infrastructures; workflow enactment engines that must deal with failures in the application and execution environment; and a number of computer science problems related to scientific workflows, such as semantic technologies, compiler methods, scheduling, and fault detection and tolerance.

Keynote

FAIR Computational Workflows

Carole Goble, Dept of Computer Science, The University of Manchester, UK / ELIXIR-UK

The FAIR principles (Findable, Accessible, Interoperable, Reusable) [1] have laid a foundation for sharing and publishing digital assets, starting with data and now extending to all digital objects, including software [2]. The use of computational workflows has accelerated in the past few years, driven by the need for repetitive and scalable data processing, access to and exchange of processing know-how, and the desire for more reproducible (or at least transparent) and quality-assured processing methods [3]. The COVID-19 pandemic has highlighted the value of workflows [4]. Over 290 workflow systems are currently available, although a much smaller number are widely adopted [5]. As workflows are first-class, publishable research objects, it seems natural to apply the FAIR principles to them [6]. The FAIR data principles themselves originate from a desire to support automated data processing by emphasizing machine accessibility of data and metadata. As workflows have a dual role as software and as explicit method descriptions, their FAIR properties draw on both data and software principles for descriptive metadata, software metrics, and versioning. However, workflows create unique challenges, such as representing a complex lifecycle from specification, to execution via a workflow system, through to the data created at the completion of the workflow. As workflows are chiefly concerned with the processing and creation of data, they have an important role to play in ensuring and supporting data FAIRification.

The work on defining and improving the FAIRness of workflows has already started. A whole ecosystem of tools, guidelines, and best practices is under development to reduce the time needed to adapt, reuse, and extend existing scientific workflows. For example, a fundamental tenet of FAIR is the universal availability of machine-processable metadata. The European EOSC-Life Cluster has developed a metadata framework for FAIR workflows based on schema.org, RO-Crate [7], and the Common Workflow Language (CWL) [8], and uses the GA4GH TRS API as a standardised communication protocol to support Accessibility. It has developed and runs the WorkflowHub registry, which uses both the framework and the protocol to support workflow Findability. EOSC-Life has made great efforts to on-board community workflow platforms such as Galaxy, Snakemake, Nextflow, and CWL so that they carry and use FAIR metadata for discovery and reuse. As FAIR software needs to be usable and not just reusable, EOSC-Life has also developed services for, e.g., workflow testing (LifeMonitor), execution, and benchmarking.

The Interoperability principle is the hardest to unpack for both data and software. For workflows, interoperability follows two threads: (i) supporting workflow system interoperability through workflow descriptions that are independent of the underlying system (e.g., CWL and WDL), and (ii) workflow component composability. Workflows are ideally composed of modular building blocks, and both these blocks and the workflows themselves are expected to be reused, refactored, recycled, and remixed. Thus, FAIR applies "all the way down": at the specification and execution levels, and for the whole workflow and each of its components. Composability also relates to reuse in the sense of adaptation [2]: a workflow or its components "can be understood, modified, built upon or incorporated into other workflows". Reuse challenges also include being able to capture and then move workflow components, dependencies, and application environments without affecting the resulting execution of the workflow. Interoperability and Reusability place important obligations on software developers to ensure that tools and datasets are workflow-ready: they should have clean programmatic I/O interfaces and no usage restrictions, use community data standards, and be simple to install and designed for portability. Workflow developers can be both data-FAIR, by using and making identifiers, licensing data outputs, tracking data provenance, and so on, and workflow-FAIR, by managing versions, providing test data, and sharing libraries of composable and reusable workflow "blocks" [9]. Communities are working on reviewing, validating, and certifying canonical workflows.

While there are emerging tools for addressing different aspects of FAIR workflows, many challenges remain in describing, annotating, and exposing scientific workflows so that they can be found, understood, and reused by other scientists. Further work is required to understand use cases for reuse and to enable reuse in the same or different environments. The FAIR principles for workflows need to be agreed on by the community before metrics can be considered to determine whether a workflow is FAIR, whether a workflow repository or registry is FAIR, and whether a workflow's dataflow can be automatically assessed as FAIR. Community activism, perhaps led by the platforms and registries coming together in a community group like WorkflowsRI, is needed to define principles, policies, and best practices for FAIR workflows and to standardize metadata representation and collection processes. In this talk, I will present current work on FAIR principles, practices, and services for computational workflows, drawing on developments in the European EOSC-Life Workflow Collaboratory and the BioExcel Centre of Excellence.

Bio. Carole Goble CBE FREng FBCS is a Professor of Computer Science at the University of Manchester, UK, where she leads a team of Researchers, Research Software Engineers and Data Stewards. She has spent 25 years working in e-Science on computational workflows, reproducible science, open sharing, and knowledge and metadata management in a range of disciplines. She has led numerous e-Infrastructure projects and is currently the Head of Node of ELIXIR-UK, the national node of ELIXIR, the European Research Infrastructure for Life Sciences, as well as directing the digital infrastructure for IBISBA, the European Research Infrastructure for Industrial Biotechnology. Both emphasise the use of computational workflows. Carole led the development of Taverna, one of the first open-source computational workflow management systems, and of myExperiment.org, the first system-agnostic web-based sharing platform for workflows and their related data. She was the scientific lead of the EU WF4Ever project, which pioneered the notion of workflows as preservable and reproducible Research Objects. She currently co-leads developments in the EOSC-Life Cluster Workflow Collaboratory (13 European Research Infrastructures in Biomedical Science led by ELIXIR), including the WorkflowHub.eu registry for workflows, the RO-Crate community initiative for packaging, exchanging, and publishing workflows as Research Objects, and the use of schema.org to mark up workflows. The tools of the Collaboratory are used by other projects ranging from natural history collection digitisation to climate change modelling, and are part of the EU COVID data portal. Carole serves on the Advisory Board of the Common Workflow Language community and is a member of the WorkflowsRI community. In EOSC-Life, she is leading developments on FAIR principles and practice applied to workflows. Carole is also a co-founder of the UK's Software Sustainability Institute and cares about quality research software and reproducibility.
She is an author of the Nature paper proposing the seminal FAIR Principles for Scientific Data, contributes to the RDA FAIR4Research Software initiative, and actively nudges policy (OECD, G7, EU) to recognise software as a first-class product of research.

[1] M.D. Wilkinson, M. Dumontier, et al., "The FAIR Guiding Principles for scientific data management and stewardship," Scientific Data 3 (2016). DOI: 10.1038/sdata.2016.18
[2] D.S. Katz, M. Gruenpeter, T. Honeyman, "Taking a fresh look at FAIR for research software," Patterns 2(2) (2021). DOI: 10.1016/j.patter.2021.100222
[3] T. Reiter, P.T. Brooks, L. Irber, S.E.K. Joslin, C.M. Reid, C. Scott, C.T. Brown, N.T. Pierce-Ward, "Streamlining data-intensive biology with workflow systems," GigaScience 10(1), giaa140 (2021). DOI: 10.1093/gigascience/giaa140
[4] W. Maier, S. Bray, et al., "Freely accessible ready to use global infrastructure for SARS-CoV-2 monitoring," bioRxiv (2021). DOI: 10.1101/2021.03.25.437046
[5] L. Wratten, A. Wilm, J. Göke, "Reproducible, scalable, and shareable analysis pipelines with bioinformatics workflow managers," Nature Methods (2021). DOI: 10.1038/s41592-021-01254-9
[6] C. Goble, S. Cohen-Boulakia, S. Soiland-Reyes, D. Garijo, Y. Gil, M.R. Crusoe, K. Peters, D. Schober, "FAIR Computational Workflows," Data Intelligence 2(1-2), 108-121 (2020). DOI: 10.1162/dint_a_00033
[7] S. Soiland-Reyes, P. Sefton, et al., "Packaging research artefacts with RO-Crate," arXiv:2108.06503v1 (2021)
[8] M. Crusoe, et al., "Methods Included: Standardizing Computational Reuse and Portability with the Common Workflow Language," Communications of the ACM (2021). DOI: 10.1145/3486897
[9] P. Andrio, A. Hospital, J. Conejero, et al., "BioExcel Building Blocks, a software library for interoperable biomolecular simulation workflows," Scientific Data 6, 169 (2019). DOI: 10.1038/s41597-019-0177-4

Workshop Program

Time	Event
9:00-9:10 AM CST 7:00-7:10 AM PST 15:00-15:10 GMT	Welcome Rosa Filgueira, Rafael Ferreira da Silva
9:10-10:00 AM CST 7:10-8:00 AM PST 15:10-16:00 GMT	Keynote: FAIR Computational Workflows Prof. Carole Goble
10:00-10:30 AM CST 8:00-8:30 AM PST 16:00-16:30 GMT	Break
10:30-10:55 AM CST 8:30-8:55 AM PST 16:30-16:55 GMT	A Recommender System for Scientific Datasets and Analysis Pipelines Mandana Mazaheri, Gregory Kiar, Tristan Glatard
10:55-11:20 AM CST 8:55-9:20 AM PST 16:55-17:20 GMT	Intelligent Resource Provisioning for Scientific Workflows and HPC Benjamin T. Shealy, F. Alex Feltus, Melissa C. Smith
11:20-11:45 AM CST 9:20-9:45 AM PST 17:20-17:45 GMT	Not All Tasks Are Created Equal: Adaptive Resource Allocation for Heterogeneous Tasks in Dynamic Workflows Thanh Son Phung, Douglas Thain, Kyle Chard, Logan Ward
11:45-11:55 AM CST 9:45-9:55 AM PST 17:45-17:55 GMT	Q&A Session 1
11:55-12:05 PM CST 9:55-10:05 AM PST 17:55-18:05 GMT	Learning Fundamental Workflow Concepts with EduWRENCH Henri Casanova, Ryan Tanaka, William Koch, Rafael Ferreira da Silva
12:05-12:15 PM CST 10:05-10:15 AM PST 18:05-18:15 GMT	VisDict: Enhancing the Communication between Workflow Providers and User Communities via a Visual Dictionary Sandra Gesing, Rafael Ferreira da Silva, Ewa Deelman, Michael Hildreth, Mary Ann McDowell, Natalie Meyers, Ian Taylor, Douglas Thain
12:15-12:25 PM CST 10:15-10:25 AM PST 18:15-18:25 GMT	Coordinating Dynamic Ensemble Calculations with libEnsemble Stephen Hudson, John-Luke Navarro, Jeffrey Larson, Stefan Wild
12:25-12:30 PM CST 10:25-10:30 AM PST 18:25-18:30 GMT	Q&A Session 2
12:30-2:00 PM CST 10:30-noon PST 18:30-20:00 GMT	Break
2:00-2:25 PM CST noon-12:25 PM PST 20:00-20:25 GMT	Dynamic Heterogeneous Task Specification and Execution for In Situ Workflows Orcun Yildiz, Dmitriy Morozov, Bogdan Nicolae, Tom Peterka
2:25-2:50 PM CST 12:25-12:50 PM PST 20:25-20:50 GMT	An Adaptive Elasticity Policy For Staging Based In-Situ Processing Zhe Wang, Matthieu Dorier, Pradeep Subedi, Philip E. Davis, Manish Parashar
2:50-3:00 PM CST 12:50-1:00 PM PST 20:50-21:00 GMT	Q&A Session 3
3:00-3:30 PM CST 1:00-1:30 PM PST 21:00-21:30 GMT	Break
3:30-3:55 PM CST 1:30-1:55 PM PST 21:30-21:55 GMT	The benefits of prefetching for large-scale cloud-based neuroimaging analysis workflows Valerie Hayot-Sasson, Tristan Glatard, Ariel Rokem
3:55-4:20 PM CST 1:55-2:20 PM PST 21:55-22:20 GMT	ExaWorks: Workflows for Exascale Aymen Al-Saadi, Dong Ahn, Yadu Babuji, Kyle Chard, James Corbett, Mihael Hategan, Stephen Herbein, Shantenu Jha, Daniel Laney, Andre Merzky, Todd Munson, Michael Salim, Mikhail Titov, Matteo Turilli, Thomas Uram, Justin Wozniak
4:20-4:25 PM CST 2:20-2:25 PM PST 22:20-22:25 GMT	Q&A Session 4
4:25-4:35 PM CST 2:25-2:35 PM PST 22:25-22:35 GMT	A Lightweight GPU Monitoring Extension for Pegasus Kickstart Georgios Papadimitriou, Ewa Deelman
4:35-5:00 PM CST 2:35-3:00 PM PST 22:35-23:00 GMT	A Performance Characterization of Scientific Machine Learning Workflows Patrycja Krawczuk, George Papadimitriou, Ryan Tanaka, Tu Mai Anh Do, Srujana Subramanya, Shubham Nagarkar, Aditi Jain, Kelsie Lam, Anirban Mandal, Loïc Pottier, Ewa Deelman
5:00-5:25 PM CST 3:00-3:25 PM PST 23:00-23:25 GMT	Science Capsule: Towards Sharing and Reproducibility of Scientific Workflows Devarshi Ghoshal, Ludovico Bianchi, Abdelilah Essiari, Drew Paine, Sarah S. Poon, Michael Beach, Alpha T. N'Diaye, Patrick Huck, Lavanya Ramakrishnan
5:25-5:30 PM CST 3:25-3:30 PM PST 23:25-23:30 GMT	Q&A Session 5

Important Dates

Full paper and abstract deadline
August 27, 2021 (final extension; originally August 15, 2021)
Paper acceptance notification
September 24, 2021
E-copyright registration completed by authors
October 24, 2021
Camera-ready deadline
October 4, 2021
Consent and release form
October 24, 2021
Video presentations
October 24, 2021
Workshop
November 15, 2021
All deadlines are Anywhere on Earth (AoE)

Paper Submission

WORKS21 welcomes original submissions in a range of areas, including but not limited to:

Big Data analytics workflows, AI workflows
Data-driven workflow processing, stream-based workflows
Workflow composition, tools, orchestrators, and languages
Workflow execution in distributed environments (including HPC, clouds, and grids)
FAIR computational workflows
Dynamic, data-dependent workflow system solutions
Exascale computing with workflows
In situ data analytics workflows
Human-in-the-loop workflows
Workflow fault-tolerance and recovery techniques
Workflow user environments, including portals
Workflow applications and their requirements
Adaptive workflows
Workflow optimizations (including scheduling and energy efficiency)
Performance analysis of workflows
Workflow provenance

Papers should present original research and should provide sufficient background material to make them accessible to the broader community.

There will be two forms of presentation:

Talks - Full papers (up to 8 pages)
Describing a research contribution in the topics listed above.
Lightning Talks - Abstracts (up to 1 page)
Describing a novel tool and/or scientific workflow.

Submission of a full paper may result in a talk; submission of an abstract may result in a lightning talk. Presenters of full papers will be given a 20-minute time slot (plus 5 minutes for questions) to provide a summary and update of their research work. Presenters of abstracts will be given a 10-minute time slot (including 3 minutes for questions) to present a novel tool or scientific workflow.

Submission Guidelines:

Full papers
Submissions are limited to 8 pages. The 8-page limit includes figures, tables, appendices, and references.
Abstracts
Submissions are limited to 1 page (excluding references). The 1-page limit includes the description of a novel tool/science workflow and a link to a repository in which the source code of the tool is stored. This repository must specify all the instructions necessary to execute the tool so that reviewers can test it. Abstracts will be compiled into a single paper and published as part of the workshop proceedings.

All submitted papers (full papers and abstracts) will undergo a rigorous review process, and each will receive at least three reviews by members of the program committee. Papers will be accepted based on their technical contributions.

Papers and abstracts should be submitted in the IEEE format (see https://www.ieee.org/conferences/publishing/templates.html).
WORKS papers this year will be published in cooperation with TCHPC and will be available from the IEEE digital repository.

Organization

Program Committee Chairs

Rafael Ferreira da Silva

Oak Ridge National Laboratory, USA

Rosa Filgueira

Heriot-Watt University, Edinburgh, UK

General Chair

Ian Taylor

Cardiff University, UK
University of Notre Dame, USA

Publicity Chair

Tainã Coleman

University of Southern California, USA

Steering Committee

David Abramson

University of Queensland, Australia

Malcolm Atkinson

University of Edinburgh, UK

Ewa Deelman

University of Southern California, USA

Michela Taufer

University of Tennessee, USA

Program Committee

King's College London

Barcelona Supercomputing Center

Université Paris-Dauphine

University of Tennessee

University of Hawaii at Manoa

University of Chicago

University of Southern California

Common Workflow Language

Seqera Labs

University of Innsbruck

University of Chicago

Concordia University

Rutgers University

University of Illinois at Urbana-Champaign

AGH UST

RENCI

UFRJ

University of Naples Parthenope

Lawrence Livermore National Laboratory

University of Southern California

University of Klagenfurt

Lawrence Berkeley National Laboratory

Southern Illinois University

University of Manchester

IBM Research

CNRS, INRIA

University of Calabria

University of Notre Dame

Oak Ridge National Laboratory

Argonne National Laboratory

New Jersey Institute of Technology