WORKS 2023

18th Workshop on Workflows in Support of Large-Scale Science
November 12-13
Denver, CO, USA (and virtual), held in conjunction with SC23



WORKS 2023 focuses on the many facets of scientific workflow management systems, ranging from actual execution to service management and the coordination and optimization of data, service, and job dependencies. The workshop covers a broad range of issues in the scientific workflow lifecycle, including: scientific workflow representation and enactment; workflow scheduling techniques to optimize execution on heterogeneous infrastructures; provisioning workflows on different kinds of infrastructures; workflow enactment engines that deal with failures in the application and infrastructure; and computer science problems related to scientific workflows, such as semantic technologies, compiler methods, and fault tolerance.

Workshop Program - Part I (Sunday, November 12, 2pm to 5:30pm, Rooms 501-502)

Time Event
2:00pm-2:10pm Welcome - Part I
Silvina Caino-Lores, Anirban Mandal
2:10pm-2:42pm Invited Talk: Workflow Building Blocks: The Success Story of Environmental Modeling, HPC, and AI for Predicting Farmed Seafood Bacteria Contamination
Raffaele Montella
2:42pm-3:00pm Paper: End-to-end Workflows for Climate Science: Integrating HPC Simulations, Big Data Processing and Machine Learning
Elia, Scardigno, Ejarque, D’Anca, Accarino, Scoccimarro, Donno, Peano, Immorlano, Aloisio
3:00pm-3:30pm Break
3:30pm-3:48pm Paper: Scale Composite BaaS Services With AFCL Workflows
Larcher, Ristov
3:48pm-4:06pm Paper: A Systematic Mapping Study of Italian Research on Workflows
Aldinucci, Baralis, Cardellini, Colonnelli, Danelutto, Decherchi, Di Modica, Ferrucci, Gribaudo, Iannone, Lapegna, Medic, Muscianisi, Righetti, Sciacca, Tonellotto, Tortonesi, Trunfio, Vardanega
4:06pm-4:16pm Lightning Talk: Transcriptomics Atlas Pipeline: Cloud vs HPC
Kica, Lichołai, Malawski
4:16pm-4:26pm Lightning Talk: Patterns and Anti-Patterns in Migrating from Legacy Workflows to Workflow Management Systems
Cassol, Froula, Kirton, Sul, Melara, Kothadia, Player, Sarrafan, Chan, Fagnan
4:26pm-4:44pm Paper: Accelerating Data-Intensive Seismic Research Through Parallel Workflow Optimization and Federated Cyberinfrastructure
Adair, Rodero, Parashar, Melgar
4:44pm-5:02pm Paper: Laminar: A New Serverless Stream-based Framework with Semantic Code Search and Code Completion
Zahra, Li, Filgueira
5:02pm-5:20pm Paper: Optimization towards Efficiency and Stateful of dispel4py
Liang, Zhang, Yang, Heinis, Filgueira
5:20pm-5:30pm Wrap Up - Part I
Silvina Caino-Lores, Anirban Mandal

Workshop Program - Part II (Monday, November 13, 9am to 12:30pm, Rooms 704-706)

Time Event
9:00am-9:05am Welcome - Part II
Silvina Caino-Lores, Anirban Mandal
9:05am-9:37am Invited Talk: FAIRIST of Them All: Meeting Researchers Where They Are With Just-in-Time, FAIR Implementation Advice
Christine Kirkpatrick
9:37am-9:55am Paper: A data science pipeline synchronisation method for edge-fog-cloud continuum
Sanchez-Gallegos, Gonzalez-Compean, Carretero, Marin-Castro
9:55am-10:25am Break
10:25am-10:43am Paper: TaskVine: Managing In-Cluster Storage for High-Throughput Data Intensive Workflows
Sly-Delgado, Phung, Thomas, Simonetti, Hennessee, Tovar, Thain
10:43am-10:53am Lightning Talk: Leveraging Large Language Models to Build and Execute Computational Workflows
Duque, Syed, Day, Berry, Katz, Kindratenko
10:53am-11:11am Paper: Delivering Rules-Based Workflows for Science
Marchant, Blomqvist, Jensen, Lilholm, Nørgaard
11:11am-11:29am Paper: Julia as a Unifying End-to-End Workflow Language on the Frontier Exascale System
Godoy, Valero-Lara, Anderson, Lee, Gainaru, Ferreira da Silva, Vetter
11:29am-11:39am Lightning Talk: Scaling on Frontier: Uncertainty Quantification Workflow Applications using ExaWorks to Enable Full System Utilization
Titov, Carson, Rolchigo, Coleman, Belak, Bement, Laney, Turilli, Jha
11:39am-11:57am Paper: Distributed Data Locality-Aware Job Allocation
Markovic, Kolovos, Soares Indrusiak
11:57am-12:15pm Paper: Fluxion: A Scalable Graph-Based Resource Model for HPC Scheduling Challenges
Patki, Ahn, Milroy, Yeom, Garlick, Grondona, Herbein, Scogland
12:15pm-12:25pm Lightning Talk: The Common Workflow Scheduler Interface: Status Quo and Future Plans
Lehmann, Bader, Thamsen, Leser
12:25pm-12:30pm Wrap Up - Part II
Silvina Caino-Lores, Anirban Mandal

Invited Speakers

Raffaele Montella

University of Naples “Parthenope”, Italy



Workflow Building Blocks: The Success Story of Environmental Modeling, HPC, and AI for Predicting Farmed Seafood Bacteria Contamination

Scientific workflows processing enormous amounts of data on distributed HPC systems or on-demand computational resources are a solid and reliable paradigm in data science. The orchestration of environmental models to produce simulations or forecasts is an increasingly widespread routine production workflow application. This presentation describes our vision of workflows as building blocks for environmental applications, combining numerical and artificial intelligence models to produce augmented environmental forecasts and predictions. DagOnStar is the workflow engine developed at the HPSC SmartLab of the University of Naples "Parthenope" to orchestrate the environmental models used by the Center for Monitoring and Modeling Marine and Atmosphere (CMMMA) for weather and marine forecast production. The Center runs a routine workflow application to predict contamination by Escherichia coli (E. coli) in farmed mussels, augmenting the forecasted pollutant transport and diffusion (WaComM++ model) with an artificial intelligence model (AIQUAM) trained on microbiological measurements. A first assessment and evaluation of the system demonstrates that the workflow application can predict E. coli presence with an accuracy of more than 90%.

Raffaele Montella is an Associate Professor with tenure in Computer Science at the Department of Science and Technologies (DiST), University of Naples "Parthenope" (UNP), Italy. He received his degree in (Marine) Environmental Science from the University of Naples "Parthenope" in 1998 and earned a Ph.D. in Marine Science and Engineering from the University of Naples "Federico II" with a thesis on "Environmental modeling and Grid Computing techniques". He leads the High-Performance Scientific Computing (HPSC) Laboratory and the IT infrastructure of the UNP Center for Marine and Atmosphere Monitoring and Modeling (CMMMA). His main research topics and scientific production focus on tools for high-performance computing, cloud computing, and GPUs, with applications in computational environmental science (multi-dimensional geo-referenced big data, distributed computing for modeling, and scientific workflows and science gateways), leveraging his previous (and still ongoing) experience in embedded, mobile, wearable, and pervasive computing and the Internet of Things.

Christine Kirkpatrick

San Diego Supercomputer Center, USA



FAIRIST of them all: Meeting researchers where they are with just-in-time, FAIR implementation advice

Intellectual freedom, curiosity, and creativity are qualities of the academic landscape that appeal to many researchers. But a blank page in the wrong context, such as a data management and sharing plan, can halt creativity. A goal for research support staff, as well as for researchers themselves, is to reduce the time spent on the mechanics of research to make more time for open-ended discovery. For data-driven science, this includes collecting and processing data so that research objects can be found and combined later. It also means preparing research objects, such as data, software, and workflows, for later reuse by others. The FAIR principles provide a conceptual framework for comprehensively ensuring research assets are accessible for reuse. Currently, researchers apply FAIR practices as best they can, based on community practices, lessons learned on the job, and whatever mentorship they may have received. This talk explores how FAIR implementation practices, or any other practices that aid data management and sharing, can be provided to researchers and customized to their specific research tasks. Research workflows can be improved through new ways of sharing hard-won knowledge and through processes that allow for peer assessment. These data sources can be repurposed in existing tools or through new interfaces, such as the FAIR+ Implementation Survey Tool (FAIRIST).

Christine Kirkpatrick leads the San Diego Supercomputer Center's (SDSC) Research Data Services division, which manages large-scale infrastructure, networking, and services for research projects of regional and national scope. Her duties also include a leadership role on the Schmidt Futures Foundation and NSF-funded Open Storage Network, and leadership of the Data Core for the NIH-funded Metabolomics Workbench, a national data repository for metabolomics studies. Her research in computer science has centered on improving machine learning processing through research data management techniques. In addition to being PI of the EarthCube Office (ECO), Kirkpatrick founded the US GO FAIR Office, is PI of the West Big Data Innovation Hub, and is Co-PI of an NSF AccelNet project: Designing a Water, Data, and Systems Science Network of Networks to Catalyze Transboundary Groundwater Resiliency Research. She serves as Secretary General of the International Science Council's Committee on Data (CODATA), co-chairs the FAIR Digital Object Forum, and serves on the external Advisory Board for the European Open Science Cloud (EOSC) Nordic and on the National Academies of Sciences' U.S. National Committee for the Committee on Data.

Important Dates

  • August 16, 2023 (extended from August 11; final extension): Papers and Abstracts Submission
  • September 8, 2023: Paper and Abstract Acceptance Notifications
  • September 29, 2023: Camera-ready Submissions
  • November 12-13, 2023: Workshop

All deadlines are Anywhere on Earth (AoE).

Call for Papers

Scientific workflows have been used almost universally across scientific domains and have underpinned some of the most significant discoveries of the past several decades. Workflow management systems (WMSs) provide abstraction and automation, which enable a broad range of researchers to easily define sophisticated computational processes and then execute them efficiently on parallel and distributed computing systems. As workflows have been adopted by a number of scientific communities, they are becoming more complex and require more sophisticated workflow management capabilities. A workflow can now analyze terabyte-scale data sets, be composed of a million individual tasks, require coordination between heterogeneous tasks, manage tasks that execute for milliseconds to hours, and process data streams, files, and data placed in object stores. The computations can be single-core workloads, loosely coupled computations, or tightly coupled computations, all within a single workflow, and can run on dispersed computing platforms, from edge to core resources.

This workshop focuses on the many facets of scientific workflow management systems, ranging from actual execution to service management and the coordination and optimization of data, service, and job dependencies. The workshop covers a broad range of issues in the scientific workflow lifecycle, including: scientific workflow representation and enactment; workflow scheduling techniques to optimize the execution of workflows on heterogeneous infrastructures; workflow enactment engines that need to deal with failures in the application and execution environment; and a number of computer science problems related to scientific workflows, such as semantic technologies, compiler methods, scheduling, and fault detection and tolerance.

WORKS23 will be held in conjunction with the SC23 conference at the Colorado Convention Center in Denver, Colorado, USA.

Topics for the workshop

WORKS23 welcomes original submissions in a range of areas, including but not limited to:

  • Big Data analytics workflows, AI/ML workflows
  • Data-driven workflow processing and stream-based workflows
  • Workflow composition, tools, orchestrators, and languages
  • Workflow execution in distributed environments (including edge, grid, HPC, clusters, and clouds)
  • Workflows integrating emerging technologies (e.g., quantum, neuromorphic)
  • FAIR computational workflows
  • Dynamic, data-dependent workflow system solutions
  • Exascale computing with workflows
  • In situ data analytics workflows
  • Interactive/human-in-the-loop workflows and steering
  • Workflow fault-tolerance and recovery techniques
  • Workflow user environments, including portals
  • Workflow applications and their requirements
  • Adaptive workflows
  • Resource provisioning for workflows (elasticity, control, and management)
  • Workflow optimizations (including scheduling and energy efficiency)
  • Performance analysis of workflows
  • Workflow debugging
  • Workflow provenance
  • Serverless workflows and serverless orchestration

There will be two forms of presentations:

  • Talks - Full papers (up to 12 pages) describing a research contribution in the topics listed above.
  • Lightning Talks - Abstracts (up to 4 pages) describing a novel tool, scientific workflow, or concept.

Submission of a full paper may result in a talk, and submission of an abstract may result in a lightning talk. Each submission will receive at least three reviews from the workshop program committee.

Proceedings Publication

Accepted papers from the workshop will be published in the SC Workshops Proceedings volume and made available online.

Paper Submission Guidelines

  • Full papers: Submissions are limited to 12 pages. The 12-page limit includes figures, tables, appendices, and references.
  • Abstracts: Submissions are limited to 4 pages (including references). The 4-page limit includes the description of a novel tool, scientific workflow, or concept, and a link to a repository containing the tool's source code. The repository must include all instructions necessary to execute the tool so that reviewers can test it. Abstracts will be compiled into a single paper and published as part of the workshop proceedings.

The format of the paper should follow the ACM manuscript guidelines; templates are available from ACM.
For LaTeX users, version 1.90 (last updated April 4, 2023) is the latest template; please use the "sigconf" option.

Submit Your Abstract or Paper

Organization

General Chairs

Silvina Caino-Lores

French Institute for Research in Computer Science and Automation (INRIA), France

Anirban Mandal

Renaissance Computing Institute (RENCI), UNC Chapel Hill, USA

Steering Committee

David Abramson

University of Queensland, Australia

Malcolm Atkinson

University of Edinburgh, UK

Ewa Deelman

University of Southern California, USA

Michela Taufer

University of Tennessee, USA

Program Committee

Ilkay Altintas

San Diego Supercomputer Center

Ivona Brandic

Technical University of Vienna

Jesus Carretero

University Carlos III of Madrid

Alberto Cascajo

University Carlos III of Madrid

Kyle Chard

University of Chicago

Tainã Coleman

University of Southern California

Ewa Deelman

University of Southern California

Vincenzo de Maio

Technical University of Vienna

Frank di Natale

Nvidia

Rosa Filgueira

University of St. Andrews

Daniel Garijo

Universidad Politécnica de Madrid

Sandra Gesing

University of Illinois Chicago

Tristan Glatard

Concordia University

William Godoy

Oak Ridge National Laboratory

Shantenu Jha

Rutgers University

Daniel S. Katz

University of Illinois at Urbana-Champaign

Tamas Kiss

University of Westminster

Jakob Luettgau

University of Tennessee

Ketan C. Maheshwari

Oak Ridge National Laboratory

Paolo Missier

Newcastle University

Raffaele Montella

University of Naples Parthenope

Bogdan Nicolae

Argonne National Laboratory

Paola Olaya

University of Tennessee

Tom Peterka

Argonne National Laboratory

Loïc Pottier

Lawrence Livermore National Laboratory

Radu Prodan

University of Klagenfurt

Omer Rana

Cardiff University

Ivan Rodero

University of Utah

Daniel Rosendo

French Institute for Research in Digital Science and Technology (Inria)

Raul Sirvent

Barcelona Supercomputing Center

Tyler Skluzacek

Oak Ridge National Laboratory

Renan Souza

Oak Ridge National Laboratory

Frédéric Suter

Oak Ridge National Laboratory

Domenico Talia

University of Calabria

Francois Tessier

French Institute for Research in Digital Science and Technology (Inria)

Douglas Thain

University of Notre Dame

Rafael Tolosana-Calasanz

Universidad de Zaragoza

Sean R. Wilkinson

Oak Ridge National Laboratory

Justin Wozniak

Argonne National Laboratory

Orcun Yildiz

Argonne National Laboratory

Contact

For more information, please direct your inquiries to sc-ws-works@info.supercomputing.org or contact the workshop chairs.