16th Workshop on Workflows in Support of Large-Scale Science

November 15, 2021

Held in conjunction with SC21: The International Conference for High Performance Computing, Networking, Storage and Analysis
Co-chaired by:
Rafael Ferreira da Silva, Oak Ridge National Laboratory, USA
Rosa Filgueira, Heriot-Watt University, Edinburgh, UK

Please submit any inquiries to chairs@works-workshop.org

Scientific workflows have been almost universally used across scientific domains and have underpinned some of the most significant discoveries of the past several decades. Workflow management systems (WMSs) provide abstraction and automation that enable a broad range of researchers to easily define sophisticated computational processes and then execute them efficiently on parallel and distributed computing systems. As workflows have been adopted by a number of scientific communities, they are becoming more complex and require more sophisticated workflow management capabilities. A workflow can now analyze terabyte-scale data sets, be composed of a million individual tasks, require coordination between heterogeneous tasks, manage tasks that execute for milliseconds to hours, and process data streams, files, and data placed in object stores. The computations can be single-core workloads, loosely coupled computations, or tightly coupled computations, all within a single workflow, and can run on dispersed computing platforms.

This workshop focuses on the many facets of scientific workflow management systems, ranging from actual execution to service management and the coordination and optimization of data, service, and job dependencies. The workshop covers a broad range of issues in the scientific workflow lifecycle, including: scientific workflow representation and enactment; workflow scheduling techniques that optimize workflow execution on heterogeneous infrastructures; workflow enactment engines that must deal with failures in the application and execution environment; and a number of computer science problems related to scientific workflows, such as semantic technologies, compiler methods, scheduling, and fault detection and tolerance.

Keynote

Carole Goble

FAIR Computational Workflows

Carole Goble, Dept of Computer Science, The University of Manchester, UK / ELIXIR-UK

The FAIR principles (Findable, Accessible, Interoperable, Reusable) [1] have laid a foundation for sharing and publishing digital assets, starting with data and now extending to all digital objects, including software [2]. The use of computational workflows has accelerated in the past few years, driven by the need for repetitive and scalable data processing, access to and exchange of processing know-how, and the desire for more reproducible (or at least transparent) and quality-assured processing methods [3]. The COVID-19 pandemic has highlighted the value of workflows [4]. Over 290 workflow systems are currently available, although a much smaller number are widely adopted [5]. As workflows are first-class, publishable research objects, it seems natural to apply the FAIR principles to them [6]. The FAIR data principles themselves originate from a desire to support automated data processing by emphasizing machine accessibility of data and metadata. As workflows have a dual role as software and explicit method description, their FAIR properties draw from both the data and software principles for descriptive metadata, software metrics, and versioning. However, workflows create unique challenges, such as representing a complex lifecycle from specification to execution via a workflow system, through to the data created at the completion of the workflow. As workflows are chiefly concerned with the processing and creation of data, they have an important role to play in ensuring and supporting data FAIRification.

The work on defining and improving the FAIRness of workflows has already started. A whole ecosystem of tools, guidelines, and best practices is under development to reduce the time needed to adapt, reuse, and extend existing scientific workflows. For example, a fundamental tenet of FAIR is the universal availability of machine-processable metadata. The European EOSC-Life Cluster has developed a metadata framework for FAIR workflows based on schema.org, RO-Crate [7], and the Common Workflow Language (CWL) [8], and uses the GA4GH TRS API as a standardised communication protocol to support Accessibility. It has developed and runs the WorkflowHub registry, which uses both the framework and the protocol to support workflow Findability. EOSC-Life has made great efforts to onboard community workflow platforms such as Galaxy, Snakemake, Nextflow, and CWL so that they carry and use FAIR metadata for discovery and reuse. As FAIR software needs to be usable and not just reusable, EOSC-Life has also developed services for, e.g., workflow testing (LifeMonitor), execution, and benchmarking.
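To illustrate the kind of machine-processable method description these frameworks build on, the sketch below is a minimal, hypothetical CWL tool description; the file names, labels, and the line-counting task itself are invented for illustration and are not artefacts from the talk.

```yaml
#!/usr/bin/env cwl-runner
# Minimal, hypothetical CWL tool description: counts lines in a text file.
cwlVersion: v1.2
class: CommandLineTool
label: Count lines in a text file
doc: >
  Descriptive metadata such as label and doc travel with the tool itself,
  supporting discovery and reuse when the descriptor is registered in a
  workflow registry.
baseCommand: wc
arguments: ["-l"]
inputs:
  input_file:
    type: File
    inputBinding:
      position: 1
outputs:
  line_count:
    type: stdout
stdout: count.txt
```

Because the descriptor is a declarative, system-independent document, registries can index its metadata and execution services can run it unchanged, which is the machine-accessibility property the FAIR data principles emphasize.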

The Interoperability principle is the hardest to unpack for both data and software. For workflows, interoperability follows two threads: (i) supporting workflow system interoperability through workflow descriptions independent of the underlying system (e.g. CWL and WDL) and (ii) workflow component composability. Workflows are ideally composed of modular building blocks, and both these blocks and the workflows themselves are expected to be reused, refactored, recycled, and remixed. Thus, FAIR applies "all the way down": at the specification and execution levels, and to the whole workflow and each of its components. Composability also relates to reuse – that is, adapting [2], a workflow or its component "can be understood, modified, built upon or incorporated into other workflows". Reuse challenges also include being able to capture and then move workflow components, dependencies, and application environments in such a way as not to affect the resulting execution of the workflow. Interoperability and Reusability place important obligations on software developers to ensure that tools and datasets are workflow-ready, with clean programmatic I/O interfaces, no usage restrictions, use of community data standards, and simple installation designed for portability. Workflow developers can be both data-FAIR, by using and minting identifiers, licensing data outputs, tracking data provenance, and so on, and workflow-FAIR, by managing versions, providing test data, and sharing libraries of composable and reusable workflow "blocks" [9]. Communities are working on reviewing, validating, and certifying canonical workflows.
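The "building blocks" idea can be sketched in CWL itself: a workflow composed of two reusable tool steps, each held in a separate descriptor. The step file names (to_upper.cwl, count_lines.cwl) are hypothetical placeholders, not components from the projects discussed.

```yaml
# Hypothetical two-step CWL workflow illustrating composability:
# each step references a separately published, reusable tool descriptor.
cwlVersion: v1.2
class: Workflow
inputs:
  text: File
outputs:
  count:
    type: File
    outputSource: count_lines/output
steps:
  to_upper:
    run: to_upper.cwl        # reusable block; could be swapped or remixed
    in:
      input: text
    out: [output]
  count_lines:
    run: count_lines.cwl     # consumes the previous block's output
    in:
      input: to_upper/output
    out: [output]
```

Each `run` reference points at a block that can be versioned, tested, and published on its own, so FAIR metadata can attach both to the whole workflow and to each component "all the way down".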

While there are emerging tools for addressing different aspects of FAIR workflows, many challenges remain for describing, annotating, and exposing scientific workflows so that they can be found, understood, and reused by other scientists. Further work is required to understand use cases for reuse and to enable reuse in the same or different environments. The FAIR principles for workflows need to be community-agreed before metrics can be considered to determine whether a workflow is FAIR, whether a workflow repository or registry is FAIR, and whether it is possible to automatically review whether a workflow's dataflow is FAIR. Community activism, perhaps led by the platforms and registries coming together in a community group like WorkflowsRI, is needed to define principles, policies, and best practices for FAIR workflows and to standardize metadata representation and collection processes. In this talk I will present current work on FAIR principles, practices, and services for computational workflows, using developments in the European EOSC-Life Workflow Collaboratory and the BioExcel Centre of Excellence.

Bio. Carole Goble CBE FREng FBCS is a Professor of Computer Science at the University of Manchester, UK. She leads a team of researchers, research software engineers, and data stewards. She has spent 25 years working in e-Science on computational workflows, reproducible science, open sharing, and knowledge and metadata management in a range of disciplines. She has led numerous e-Infrastructure projects and is currently the Head of Node of ELIXIR-UK, the national node of ELIXIR, the European Research Infrastructure for Life Sciences, as well as directing the digital infrastructure for IBISBA, the European Research Infrastructure for Industrial Biotechnology. Both emphasise the use of computational workflows. Carole led the development of Taverna, one of the first open-source computational workflow management systems, and of myExperiment.org, the first system-agnostic web-based platform for sharing workflows and their related data. She was the scientific lead of the EU WF4Ever project, which pioneered the notion of workflows as preservable and reproducible Research Objects. She currently co-leads developments in the EOSC-Life Cluster Workflow Collaboratory (13 European Research Infrastructures in Biomedical Science, led by ELIXIR), including: the WorkflowHub.eu registry for workflows, the RO-Crate community initiative for packaging, exchanging, and publishing workflows as Research Objects, and the use of schema.org to mark up workflows. The tools of the Collaboratory are used by other projects, from natural history collection digitisation to climate change modelling, and are part of the EU COVID data portal. Carole serves on the Advisory Board of the Common Workflow Language community and is a member of the WorkflowsRI community. In EOSC-Life she is leading developments on FAIR principles and practice applied to workflows. Carole is also a co-founder of the UK's Software Sustainability Institute and cares about quality research software and reproducibility.
She is an author of the Nature paper proposing the seminal FAIR Principles for Scientific Data, contributes to the RDA FAIR4Research Software initiative, and actively nudges policy (OECD, G7, EU) to recognise software as a first class product of research.

  • [1] M.D. Wilkinson, M. Dumontier, et al., The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data 3 (2016). DOI: 10.1038/sdata.2016.18
  • [2] D.S. Katz, M. Gruenpeter, T. Honeyman, Taking a fresh look at FAIR for research software, Patterns 2(2) (2021). DOI: 10.1016/j.patter.2021.100222
  • [3] T. Reiter, P.T. Brooks, L. Irber, S.E.K. Joslin, C.M. Reid, C. Scott, C.T. Brown, N.T. Pierce-Ward, Streamlining data-intensive biology with workflow systems, GigaScience 10(1), giaa140 (2021). DOI: 10.1093/gigascience/giaa140
  • [4] W. Maier, S. Bray, et al., Freely accessible ready to use global infrastructure for SARS-CoV-2 monitoring, bioRxiv (2021). DOI: 10.1101/2021.03.25.437046
  • [5] L. Wratten, A. Wilm, J. Göke, Reproducible, scalable, and shareable analysis pipelines with bioinformatics workflow managers, Nature Methods (2021). DOI: 10.1038/s41592-021-01254-9
  • [6] C. Goble, S. Cohen-Boulakia, S. Soiland-Reyes, D. Garijo, Y. Gil, M.R. Crusoe, K. Peters, D. Schober, FAIR Computational Workflows, Data Intelligence 2(1-2), 108-121 (2020). DOI: 10.1162/dint_a_00033
  • [7] S. Soiland-Reyes, P. Sefton, et al., Packaging research artefacts with RO-Crate, arXiv:2108.06503
  • [8] M. Crusoe, et al., Methods Included: Standardizing Computational Reuse and Portability with the Common Workflow Language, Communications of the ACM (2021). DOI: 10.1145/3486897
  • [9] P. Andrio, A. Hospital, J. Conejero, et al., BioExcel Building Blocks, a software library for interoperable biomolecular simulation workflows, Scientific Data 6, 169 (2019). DOI: 10.1038/s41597-019-0177-4

Workshop Program

Time Event
9:00-9:10 AM CST
7:00-7:10 AM PST
15:00-15:10 GMT
Welcome
Rosa Filgueira, Rafael Ferreira da Silva
9:10-10:00 AM CST
7:10-8:00 AM PST
15:10-16:00 GMT
Keynote: FAIR Computational Workflows
Prof. Carole Goble
10:00-10:30 AM CST
8:00-8:30 AM PST
16:00-16:30 GMT
Break
10:30-10:55 AM CST
8:30-8:55 AM PST
16:30-16:55 GMT
A Recommender System for Scientific Datasets and Analysis Pipelines
Mandana Mazaheri, Gregory Kiar, Tristan Glatard
10:55-11:20 AM CST
8:55-9:20 AM PST
16:55-17:20 GMT
Intelligent Resource Provisioning for Scientific Workflows and HPC
Benjamin T. Shealy, F. Alex Feltus, Melissa C. Smith
11:20-11:45 AM CST
9:20-9:45 AM PST
17:20-17:45 GMT
Not All Tasks Are Created Equal: Adaptive Resource Allocation for Heterogeneous Tasks in Dynamic Workflows
Thanh Son Phung, Douglas Thain, Kyle Chard, Logan Ward
11:45-11:55 AM CST
9:45-9:55 AM PST
17:45-17:55 GMT
Q&A Session 1
11:55-12:05 PM CST
9:55-10:05 AM PST
17:55-18:05 GMT
Learning Fundamental Workflow Concepts with EduWRENCH
Henri Casanova, Ryan Tanaka, William Koch, Rafael Ferreira da Silva
12:05-12:15 PM CST
10:05-10:15 AM PST
18:05-18:15 GMT
VisDict: Enhancing the Communication between Workflow Providers and User Communities via a Visual Dictionary
Sandra Gesing, Rafael Ferreira da Silva, Ewa Deelman, Michael Hildreth, Mary Ann McDowell, Natalie Meyers, Ian Taylor, Douglas Thain
12:15-12:25 PM CST
10:15-10:25 AM PST
18:15-18:25 GMT
Coordinating Dynamic Ensemble Calculations with libEnsemble
Stephen Hudson, John-Luke Navarro, Jeffrey Larson, Stefan Wild
12:25-12:30 PM CST
10:25-10:30 AM PST
18:25-18:30 GMT
Q&A Session 2
12:30-2:00 PM CST
10:30 AM-12:00 PM PST
18:30-20:00 GMT
Break
2:00-2:25 PM CST
12:00-12:25 PM PST
20:00-20:25 GMT
Dynamic Heterogeneous Task Specification and Execution for In Situ Workflows
Orcun Yildiz, Dmitriy Morozov, Bogdan Nicolae, Tom Peterka
2:25-2:50 PM CST
12:25-12:50 PM PST
20:25-20:50 GMT
An Adaptive Elasticity Policy For Staging Based In-Situ Processing
Zhe Wang, Matthieu Dorier, Pradeep Subedi, Philip E. Davis, Manish Parashar
2:50-3:00 PM CST
12:50-1:00 PM PST
20:50-21:00 GMT
Q&A Session 3
3:00-3:30 PM CST
1:00-1:30 PM PST
21:00-21:30 GMT
Break
3:30-3:55 PM CST
1:30-1:55 PM PST
21:30-21:55 GMT
The benefits of prefetching for large-scale cloud-based neuroimaging analysis workflows
Valerie Hayot-Sasson, Tristan Glatard, Ariel Rokem
3:55-4:20 PM CST
1:55-2:20 PM PST
21:55-22:20 GMT
ExaWorks: Workflows for Exascale
Aymen Al-Saadi, Dong Ahn, Yadu Babuji, Kyle Chard, James Corbett, Mihael Hategan, Stephen Herbein, Shantenu Jha, Daniel Laney, Andre Merzky, Todd Munson, Michael Salim, Mikhail Titov, Matteo Turilli, Thomas Uram, Justin Wozniak
4:20-4:25 PM CST
2:20-2:25 PM PST
22:20-22:25 GMT
Q&A Session 4
4:25-4:35 PM CST
2:25-2:35 PM PST
22:25-22:35 GMT
A Lightweight GPU Monitoring Extension for Pegasus Kickstart
Georgios Papadimitriou, Ewa Deelman
4:35-5:00 PM CST
2:35-3:00 PM PST
22:35-23:00 GMT
A Performance Characterization of Scientific Machine Learning Workflows
Patrycja Krawczuk, George Papadimitriou, Ryan Tanaka, Tu Mai Anh Do, Srujana Subramanya, Shubham Nagarkar, Aditi Jain, Kelsie Lam, Anirban Mandal, Loïc Pottier, Ewa Deelman
5:00-5:25 PM CST
3:00-3:25 PM PST
23:00-23:25 GMT
Science Capsule: Towards Sharing and Reproducibility of Scientific Workflows
Devarshi Ghoshal, Ludovico Bianchi, Abdelilah Essiari, Drew Paine, Sarah S. Poon, Michael Beach, Alpha T. N'Diaye, Patrick Huck, Lavanya Ramakrishnan
5:25-5:30 PM CST
3:25-3:30 PM PST
23:25-23:30 GMT
Q&A Session 5

Important Dates

  • Full paper and abstract deadline
    August 27, 2021 (final extension; originally August 15, 2021)
  • Paper acceptance notification
    September 24, 2021
  • E-copyright registration completed by authors
    October 24, 2021
  • Camera-ready deadline
    October 4, 2021
  • Consent and release form
    October 24, 2021
  • Video presentations
    October 24, 2021
  • Workshop
    November 15, 2021
  • All deadlines are Anywhere on Earth (AoE)

Paper Submission

WORKS21 welcomes original submissions in a range of areas, including but not limited to:

  • Big Data analytics workflows, AI workflows
  • Data-driven workflow processing, stream-based workflows
  • Workflow composition, tools, orchestrators, and languages
  • Workflow execution in distributed environments (including HPC, clouds, and grids)
  • FAIR computational workflows
  • Dynamic data dependent workflow systems solutions
  • Exascale computing with workflows
  • In Situ Data Analytics Workflows
  • Human-in-the-loop workflows
  • Workflow fault-tolerance and recovery techniques
  • Workflow user environments, including portals
  • Workflow applications and their requirements
  • Adaptive workflows
  • Workflow optimizations (including scheduling and energy efficiency)
  • Performance analysis of workflows
  • Workflow provenance

Papers should present original research and should provide sufficient background material to make them accessible to the broader community.

There will be two forms of presentations:

  • Talks - Full papers (up to 8 pages)
    Describing a research contribution in the topics listed above.
  • Lightning Talks - Abstracts (up to 1 page)
    Describing a novel tool and/or scientific workflow.

Submission of a full paper may result in a talk; submission of an abstract may result in a lightning talk. Presenters of full papers will be given a 20-minute time slot (plus 5 minutes for questions) to summarize and give an update on their research work. Presenters of abstracts will be given a 10-minute time slot (including 3 minutes for questions) to present a novel tool or scientific workflow.

Submission Guidelines:

  • Full papers
    Submissions are limited to 8 pages. The 8-page limit includes figures, tables, appendices, and references.
  • Abstracts
    Submissions are limited to 1 page (excluding references). The 1-page limit includes the description of a novel tool or science workflow and a link to a repository in which the tool's source code is stored. The repository must include all instructions necessary to execute the tool so that reviewers can test it. Abstracts will be compiled into a single paper and published as part of the workshop proceedings.
All submitted papers (full papers and abstracts) will undergo a rigorous review process; each will receive at least three reviews from members of the program committee. Papers will be accepted based on their technical contributions.

Papers and abstracts should be submitted in the IEEE format (see https://www.ieee.org/conferences/publishing/templates.html).
WORKS papers this year will be published in cooperation with TCHPC and will be available from the IEEE digital repository.

Organization

Program Committee Chairs

Rafael Ferreira da Silva

Oak Ridge National Laboratory, USA

Rosa Filgueira

Heriot-Watt University, Edinburgh, UK

General Chair

Ian Taylor

Cardiff University, UK
University of Notre Dame, USA

Publicity Chair

Tainã Coleman

University of Southern California, USA


Steering Committee

David Abramson

University of Queensland, Australia

Malcolm Atkinson

University of Edinburgh, UK

Ewa Deelman

University of Southern California, USA

Michela Taufer

University of Tennessee, USA

Program Committee

Pinar Alper
King's College London
Rosa M. Badia
Barcelona Supercomputing Center
Khalid Belhajjame
Université Paris-Dauphine
Silvina Caino-Lores
University of Tennessee
Henri Casanova
University of Hawaii at Manoa
Kyle Chard
University of Chicago
Tainã Coleman
University of Southern California
Michael R. Crusoe
Common Workflow Language
Paolo Di Tommaso
Seqera Labs
Thomas Fahringer
University of Innsbruck
Sandra Gesing
University of Chicago
Tristan Glatard
Concordia University
Shantenu Jha
Rutgers University
Daniel S. Katz
University of Illinois at Urbana-Champaign
Maciej Malawski
AGH UST
Anirban Mandal
RENCI
Marta Mattoso
UFRJ
Raffaele Montella
University of Naples Parthenope
Jayson Luc Peterson
Lawrence Livermore National Laboratory
Loïc Pottier
University of Southern California
Radu Prodan
University of Klagenfurt
Lavanya Ramakrishnan
Lawrence Berkeley National Laboratory
Tong Shu
Southern Illinois University
Stian Soiland-Reyes
University of Manchester
Renan Souza
IBM Research
Frédéric Suter
CNRS, INRIA
Domenico Talia
University of Calabria
Douglas Thain
University of Notre Dame
Matthew Wolf
Oak Ridge National Laboratory
Justin Wozniak
Argonne National Laboratory
Chase Q. Wu
New Jersey Institute of Technology