WORKS 2022

17th Workshop on Workflows in Support of Large-Scale Science
November 14, 2022 — 8:30am-noon CT — Room D222
Dallas, TX, USA (and virtual), in conjunction with SC22


Proceedings by IEEE Computer Society Press

WORKS 2022 focuses on the many facets of scientific workflow management systems, ranging from actual execution to service management and the coordination and optimization of data, service, and job dependencies. The workshop covers a broad range of issues in the scientific workflow lifecycle, including: scientific workflow representation and enactment; workflow scheduling techniques to optimize execution on heterogeneous infrastructures; workflow enactment engines that deal with failures in the application and infrastructure; and a number of computer science problems related to scientific workflows, such as semantic technologies, compiler methods, and fault tolerance.

Workshop Program

Time Event
8:30am Welcome and Introductions
Rosa Filgueira and Rafael Ferreira da Silva
8:30am-9:00am Invited Talk: Making easier the development and deployment of application workflows with eFlows4HPC
Rosa M. Badia
9:00am-9:12am Paper: RADICAL-Pilot and Parsl: Executing Heterogeneous Workflows on HPC Platforms
Alsaadi, Ward, Merzky, Chard, Foster, Jha, Turilli
9:12am-9:24am Paper: Automatic, Efficient, and Scalable Provenance Registration for FAIR HPC Workflows
Sirvent, Conejero, Lordan, Ejarque, Rodríguez-Navas, Fernández, Capella-Gutiérrez, Badia
9:24am-9:36am Paper: Challenges of Provenance in Scientific Workflow Management Systems
Alam, Roy
9:36am-9:48am Paper: A Domain-Specific Provenance Query Composition Environment for Scientific Workflows
Hossain, Roy, Roy, Schneider
9:48am-10:00am Paper: Workflow Anomaly Detection with Graph Neural Networks
Jin, Raghavan, Papadimitriou, Wang, Mandal, Krawczuk, Pottier, Kiran, Deelman, Balaprakash
10:00am-10:30am Break
10:30am-10:36am Lightning Talk: Modeling Data Integrity Threats for Scientific Workflows Using OSCRP and MITRE ATT&CK®
Abhinit, Adams, Chase, Mandal, Xin, Vahi, Rynge, Deelman
10:36am-10:42am Lightning Talk: RECUP: A (Meta)data Framework for Reproducing Hybrid Workflows with FAIR
Pouchard, Islam, Nicolae, Ross
10:42am-10:48am Lightning Talk: Recommending Tools and Sub-Workflows for Scientific Workflow Management Systems
Alam, Roy, Serebrenik
10:48am-10:54am Lightning Talk: libEnsemble: Flexible Workflows through Dynamic Assignment of Workers and Resources
Hudson, Larson, Navarro, Wild
10:54am-11:00am Lightning Talk: HyperShell v2: A Better Workflow Automation Tool for Many-Task Computing
Lentner, Gorenstein
11:00am-11:12am Paper: Co-Scheduling Ensembles of In Situ Workflows
Do, Pottier, Ferreira da Silva, Suter, Caíno-Lores, Taufer, Deelman
11:12am-11:24am Paper: Events as a Basis for Workflow Scheduling
Marchant
11:24am-11:36am Paper: An Automated Cryo-EM Computational Environment on the HPC System Using Pegasus WMS
Osinski, Rynge, Vahi, Hong, Chu, Sul, Deelman, Kim
11:36am-11:48am Paper: Cross-Facility Workflows: Case Studies with Active Experiments
Tyler, Knop, Bard, Nugent
11:48am-noon Paper: CardioHPC: Serverless Approaches for Real-Time Heart Monitoring of Thousands of Patients
Gusev, Ristov, Amza, Hohenegger, Prodan, Mileski, Gushev, Temelkov

Keynote

Rosa M. Badia

Barcelona Supercomputing Center, Spain



Making easier the development and deployment of application workflows with eFlows4HPC

While distributed computing infrastructures are becoming increasingly complex, the user community provides more complex application workflows to leverage them. In addition, current trends aim to use data analytics and artificial intelligence combined with HPC modeling and simulation. However, the programming models and tools are different in these fields, and there is a need for methodologies that enable the development of workflows that combine HPC software, data analytics, and artificial intelligence. The eFlows4HPC project aims to provide a workflow software stack that fulfills this need. The project is also developing the HPC Workflows as a Service (HPCWaaS) methodology, which aims to provide tools that simplify the development, deployment, execution, and reuse of workflows. The project showcases its advances with three application pillars of industrial and social relevance: manufacturing, climate, and urgent computing for natural hazards. The talk will present the current progress and findings of the project.

Rosa M. Badia holds a PhD in Computer Science (1994) from the Technical University of Catalonia (UPC). She is the manager of the Workflows and Distributed Computing research group at the Barcelona Supercomputing Center (BSC). She has made significant contributions to parallel programming models for multicore and distributed computing through her work on task-based programming models over the last 15 years. The research group focuses on PyCOMPSs/COMPSs, a parallel task-based programming model for distributed computing, and its application to the development of large heterogeneous workflows that combine HPC, Big Data, and Machine Learning. Dr. Badia has published nearly 200 papers in international conferences and journals on the topics of her research. She has been active in projects funded by the European Commission and in contracts with industry. She is a member of the HiPEAC Network of Excellence. She received the Euro-Par Achievement Award 2019 for her contributions to parallel processing, the DonaTIC award, category Academia/Researcher, in 2019, and the HPDC Achievement Award 2021 for her innovations in parallel task-based programming models, workflow applications and systems, and leadership in the high-performance computing research community. Rosa Badia is the PI of eFlows4HPC.

Important Dates

  • August 22, 2022 (final extension; originally August 15)

    Papers and Abstracts Submission
  • September 14, 2022 (originally September 9)

    Paper and Abstract Acceptance Notifications
  • October 10, 2022 (originally September 30)

    Camera-ready Submissions
  • November 14, 2022

    Workshop

All deadlines are Anywhere on Earth (AoE).

Call for Papers

Scientific workflows are used almost universally across scientific domains and have underpinned some of the most significant discoveries of the past several decades. Workflow management systems (WMSs) provide abstraction and automation that enable a broad range of researchers to easily define sophisticated computational processes and then execute them efficiently on parallel and distributed computing systems. As workflows have been adopted by a number of scientific communities, they are becoming more complex and require more sophisticated workflow management capabilities. A workflow can now analyze terabyte-scale data sets, be composed of one million individual tasks, require coordination between heterogeneous tasks, manage tasks that execute for milliseconds to hours, and process data streams, files, and data placed in object stores. The computations can be single-core workloads, loosely coupled computations, or tightly coupled computations, all within a single workflow, and can run on dispersed computing platforms.

This workshop focuses on the many facets of scientific workflow management systems, ranging from actual execution to service management and the coordination and optimization of data, service, and job dependencies. The workshop covers a broad range of issues in the scientific workflow lifecycle, including: scientific workflow representation and enactment; workflow scheduling techniques to optimize the execution of the workflow on heterogeneous infrastructures; workflow enactment engines that need to deal with failures in the application and execution environment; and a number of computer science problems related to scientific workflows, such as semantic technologies, compiler methods, scheduling, and fault detection and tolerance.

WORKS22 will be held in conjunction with the Supercomputing conference (SC22) in Dallas, Texas, USA, at the Kay Bailey Hutchison Convention Center Dallas.

Topics for the workshop

WORKS22 welcomes original submissions in a range of areas, including but not limited to:

  • Big Data analytics workflows, AI workflows
  • Data-driven workflow processing, stream-based workflows
  • Workflow composition, tools, orchestrators, and languages
  • Workflow execution in distributed environments (including HPC, clouds, and grids)
  • FAIR computational workflows
  • Dynamic, data-dependent workflow system solutions
  • Exascale computing with workflows
  • In Situ Data Analytics Workflows
  • Human-in-the-loop workflows
  • Workflow fault-tolerance and recovery techniques
  • Workflow user environments, including portals
  • Workflow applications and their requirements
  • Adaptive workflows
  • Workflow optimizations (including scheduling and energy efficiency)
  • Performance analysis of workflows
  • Workflow provenance
  • Registers for workflows
  • Serverless workflows and serverless orchestration

There will be two forms of presentations:

  • Talks - Full papers (up to 8 pages) describing a research contribution in the topics listed above.
  • Lightning Talks - Abstracts (up to 2 pages) describing a novel tool, scientific workflow, or concept.

Submission of a full paper may result in a talk; submission of an abstract may result in a lightning talk.

Proceedings Publication

Accepted papers from the workshop will be published by the IEEE Computer Society Press, USA and made available online through the IEEE Digital Library.

Paper Submission Guidelines

  • Full papers: Submissions are limited to 8 pages. The 8-page limit includes figures, tables, appendices, and references.
  • Abstracts: Submissions are limited to 2 pages (including references). The 2-page limit covers the description of a novel tool, science workflow, or concept, and a link to a repository containing the tool's source code. The repository must include all instructions needed to run the tool, so that reviewers can test it. Abstracts will be compiled into a single paper and published as part of the workshop proceedings.

Papers should be formatted as double-column, single-spaced text in 10-point type on 8.5 x 11 inch pages, per the IEEE 8.5 x 11 manuscript guidelines. Templates are available from this link.


Organization

Program Committee Chairs

Rosa Filgueira

University of St Andrews, UK

Rafael Ferreira da Silva

Oak Ridge National Laboratory, USA

General Chair

Ian J. Taylor

SIMBA Chain, USA

Steering Committee

David Abramson

University of Queensland, Australia

Malcolm Atkinson

University of Edinburgh, UK

Ewa Deelman

University of Southern California, USA

Michela Taufer

University of Tennessee, USA

Program Committee

Rosa M. Badia

Barcelona Supercomputing Center

Henri Casanova

University of Hawaii at Manoa

Kyle Chard

University of Chicago

Tainã Coleman

University of Southern California

Michael R. Crusoe

Common Workflow Language

Frank Di Natale

Nvidia

Paolo Di Tommaso

Seqera Labs

Thomas Fahringer

University of Innsbruck

Daniel Garijo

Universidad Politécnica de Madrid

Sandra Gesing

University of Illinois Chicago

Daniel S. Katz

University of Illinois at Urbana-Champaign

Ketan C. Maheshwari

Oak Ridge National Laboratory

Maciej Malawski

AGH UST

Marta Mattoso

UFRJ

Raffaele Montella

University of Naples Parthenope

Daniel de Oliveira

UFF

J. Luc Peterson

Lawrence Livermore National Laboratory

Loïc Pottier

Lawrence Livermore National Laboratory

Lavanya Ramakrishnan

Lawrence Berkeley National Laboratory

Tyler Skluzacek

Oak Ridge National Laboratory

Frédéric Suter

Oak Ridge National Laboratory

Douglas Thain

University of Notre Dame

Sean R. Wilkinson

Oak Ridge National Laboratory

Justin Wozniak

Argonne National Laboratory