18th Workshop on Workflows in Support of Large-Scale Science
November 12-13, 2023
Denver, CO, USA (and virtual)
In conjunction with SC23
Proceedings published in the SC Workshops Proceedings volume
WORKS 2023 focuses on the many facets of scientific workflow management systems, ranging from actual execution to service management and the coordination and optimization of data, service, and job dependencies. The workshop covers a broad range of issues in the scientific workflow lifecycle, including: scientific workflow representation and enactment; workflow scheduling techniques to optimize execution on heterogeneous infrastructures; provisioning workflows on different kinds of infrastructures; workflow enactment engines that deal with failures in the application and infrastructure; and computer science problems related to scientific workflows such as semantic technologies, compiler methods, fault tolerance, etc.
Time | Event |
---|---|
2:00pm-2:10pm | Welcome - Part I (Silvina Caino-Lores, Anirban Mandal) |
2:10pm-2:42pm | Invited Talk: Workflow Building Blocks: The Success Story of Environmental Modeling, HPC, and AI for Predicting Farmed Seafood Bacteria Contamination (Raffaele Montella) |
2:42pm-3:00pm | Paper: End-to-end Workflows for Climate Science: Integrating HPC Simulations, Big Data Processing and Machine Learning (Elia, Scardigno, Ejarque, D’Anca, Accarino, Scoccimarro, Donno, Peano, Immorlano, Aloisio) |
3:00pm-3:30pm | Break |
3:30pm-3:48pm | Paper: Scale Composite BaaS Services With AFCL Workflows (Larcher, Ristov) |
3:48pm-4:06pm | Paper: A Systematic Mapping Study of Italian Research on Workflows (Aldinucci, Baralis, Cardellini, Colonnelli, Danelutto, Decherchi, Di Modica, Ferrucci, Gribaudo, Iannone, Lapegna, Medic, Muscianisi, Righetti, Sciacca, Tonellotto, Tortonesi, Trunfio, Vardanega) |
4:06pm-4:16pm | Lightning Talk: Transcriptomics Atlas Pipeline: Cloud vs HPC (Kica, Lichołai, Malawski) |
4:16pm-4:26pm | Lightning Talk: Patterns and Anti-Patterns in Migrating from Legacy Workflows to Workflow Management Systems (Cassol, Froula, Kirton, Sul, Melara, Kothadia, Player, Sarrafan, Chan, Fagnan) |
4:26pm-4:44pm | Paper: Accelerating Data-Intensive Seismic Research Through Parallel Workflow Optimization and Federated Cyberinfrastructure (Adair, Rodero, Parashar, Melgar) |
4:44pm-5:02pm | Paper: Laminar: A New Serverless Stream-based Framework with Semantic Code Search and Code Completion (Zahra, Li, Filgueira) |
5:02pm-5:20pm | Paper: Optimization towards Efficiency and Stateful of dispel4py (Liang, Zhang, Yang, Heinis, Filgueira) |
5:20pm-5:30pm | Wrap Up - Part I (Silvina Caino-Lores, Anirban Mandal) |
Time | Event |
---|---|
9:00am-9:05am | Welcome - Part II (Silvina Caino-Lores, Anirban Mandal) |
9:05am-9:37am | Invited Talk: FAIRIST of Them All: Meeting Researchers Where They Are With Just-in-Time, FAIR Implementation Advice (Christine Kirkpatrick) |
9:37am-9:55am | Paper: A data science pipeline synchronisation method for edge-fog-cloud continuum (Sanchez-Gallegos, Gonzalez-Compean, Carretero, Marin-Castro) |
9:55am-10:25am | Break |
10:25am-10:43am | Paper: TaskVine: Managing In-Cluster Storage for High-Throughput Data Intensive Workflows (Sly-Delgado, Phung, Thomas, Simonetti, Hennessee, Tovar, Thain) |
10:43am-10:53am | Lightning Talk: Leveraging Large Language Models to Build and Execute Computational Workflows (Duque, Syed, Day, Berry, Katz, Kindratenko) |
10:53am-11:11am | Paper: Delivering Rules-Based Workflows for Science (Marchant, Blomqvist, Jensen, Lilholm, Nørgaard) |
11:11am-11:29am | Paper: Julia as a Unifying End-to-End Workflow Language on the Frontier Exascale System (Godoy, Valero-Lara, Anderson, Lee, Gainaru, Ferreira da Silva, Vetter) |
11:29am-11:39am | Lightning Talk: Scaling on Frontier: Uncertainty Quantification Workflow Applications using ExaWorks to Enable Full System Utilization (Titov, Carson, Rolchigo, Coleman, Belak, Bement, Laney, Turilli, Jha) |
11:39am-11:57am | Paper: Distributed Data Locality-Aware Job Allocation (Markovic, Kolovos, Soares Indrusiak) |
11:57am-12:15pm | Paper: Fluxion: A Scalable Graph-Based Resource Model for HPC Scheduling Challenges (Patki, Ahn, Milroy, Yeom, Garlick, Grondona, Herbein, Scogland) |
12:15pm-12:25pm | Lightning Talk: The Common Workflow Scheduler Interface: Status Quo and Future Plans (Lehmann, Bader, Thamsen, Leser) |
12:25pm-12:30pm | Wrap Up - Part II (Silvina Caino-Lores, Anirban Mandal) |
University of Naples “Parthenope”, Italy
Scientific workflows processing enormous amounts of data using distributed HPC systems or on-demand computational resources are a solid and reliable paradigm in data science. The orchestration of environmental models to produce simulations or forecasts is an increasingly widespread routine production workflow application. This presentation concerns our vision of workflows as building blocks for environmental applications, combining numerical and artificial intelligence models to produce augmented environmental forecasts and predictions. DagOnStar is the workflow engine developed at the HPSC SmartLab of the University of Naples "Parthenope" for orchestrating environmental models used by the Center for Monitoring and Modeling Marine and Atmosphere (CMMMA) to orchestrate weather and marine forecast production. The Center runs a routine workflow application to predict contamination by Escherichia coli (E. coli) in farmed mussels, augmenting the forecasted pollutant transport and diffusion (WaComM++ model) with an artificial intelligence model (AIQUAM model) trained with microbiological measurements. The first assessment and evaluation of the system demonstrate that the workflow application can predict E. coli presence with an accuracy of more than 90%.
Raffaele Montella is an Associate Professor with tenure in Computer Science at the Department of Science and Technologies (DiST), University of Naples “Parthenope” (UNP), Italy. He received his degree in (Marine) Environmental Science at the University of Naples “Parthenope” in 1998. He defended his Ph.D. thesis on "Environmental modeling and Grid Computing techniques", earning a Ph.D. in Marine Science and Engineering at the University of Naples "Federico II". He leads the High-Performance Scientific Computing (HPSC) Laboratory and the IT infrastructure of the UNP Center for Marine and Atmosphere Monitoring and Modeling (CMMMA). His main research topics and scientific production are focused on tools for high-performance computing, cloud computing, and GPUs with applications in the field of computational environmental science (multi-dimensional geo-referenced big data, distributed computing for modeling, and scientific workflows and science gateways), leveraging his previous (and still ongoing) experience in embedded, mobile, wearable, pervasive computing, and the Internet of Things.
San Diego Supercomputer Center, USA
Intellectual freedom, curiosity, and creativity are qualities of the academic landscape that appeal to many researchers. But a blank page in the wrong context, such as when creating a data management and sharing plan, can halt creativity. A goal for research support staff, as well as for researchers themselves, is to reduce the time spent on the mechanics of research to make more time for open-ended discovery. For data-driven science, this includes collecting and processing data so that one can find and combine research objects later. It also means preparing research objects, such as data, software, and workflows, for later reuse by others. The FAIR principles provide a conceptual framework for comprehensively ensuring research assets are accessible for reuse. Currently, researchers apply FAIR practices as best they can, based on community practices, lessons learned on the job, and other mentorship they may have received. This talk will explore how FAIR implementation practices (or any other practice that aids data management and sharing) can be provided to researchers and customized to their specific research tasks. Research workflows can be improved through new ways of sharing hard-won knowledge, and through processes that allow for peer assessment. These data sources can be repurposed in existing tools or through new interfaces, such as the FAIR+ Implementation Survey Tool (FAIRIST).
Christine Kirkpatrick leads the San Diego Supercomputer Center’s (SDSC) Research Data Services division, which manages large-scale infrastructure, networking, and services for research projects of regional and national scope. Her duties also include a leadership role on the Schmidt Futures Foundation and NSF-funded Open Storage Network, and as leader of the Data Core for the NIH-funded Metabolomics Workbench, a national data repository for metabolomics studies. Her research in computer science has centered on improving machine learning processing through research data management techniques. In addition to being PI of the EarthCube Office (ECO), Kirkpatrick founded the US GO FAIR Office, is PI of the West Big Data Innovation Hub, and Co-PI on an NSF AccelNet project: Designing a Water, Data, and Systems Science Network of Networks to Catalyze Transboundary Groundwater Resiliency Research. She serves as the Secretary General of the International Science Council's Committee on Data (CODATA), co-chairs the FAIR Digital Object Forum, and is on the external Advisory Board for the European Open Science Cloud (EOSC) Nordic and the National Academies of Sciences’ U.S. National Committee for the Committee on Data.
All deadlines are Anywhere on Earth (AoE).
Scientific workflows have been used almost universally across scientific domains and have underpinned some of the most significant discoveries of the past several decades. Workflow management systems (WMSs) provide abstraction and automation, which enable a broad range of researchers to easily define sophisticated computational processes and to then execute them efficiently on parallel and distributed computing systems. As workflows have been adopted by a number of scientific communities, they are becoming more complex and require more sophisticated workflow management capabilities. A workflow now can analyze terabyte-scale data sets, be composed of a million individual tasks, require coordination between heterogeneous tasks, manage tasks that execute for milliseconds to hours, and process data streams, files, and data placed in object stores. The computations can be single-core workloads, loosely coupled computations, or tightly coupled computations, all within a single workflow, and can run on dispersed computing platforms, from edge to core resources.
This workshop focuses on the many facets of scientific workflow management systems, ranging from actual execution to service management and the coordination and optimization of data, service, and job dependencies. The workshop covers a broad range of issues in the scientific workflow lifecycle, including: scientific workflow representation and enactment; workflow scheduling techniques to optimize the execution of the workflow on heterogeneous infrastructures; workflow enactment engines that need to deal with failures in the application and execution environment; and a number of computer science problems related to scientific workflows such as semantic technologies, compiler methods, scheduling, and fault detection and tolerance.
WORKS23 will be held in conjunction with SC23 (Supercomputing 2023) at the Colorado Convention Center in Denver, Colorado, USA.
WORKS23 welcomes original submissions in a range of areas, including but not limited to:
There will be two forms of presentations: talks and lightning talks. Submission of a full paper may result in a talk; submission of an abstract may result in a lightning talk. Each submission will receive at least three reviews from the workshop program committee.
Accepted papers from the workshop will be published in the SC Workshops Proceedings volume and made available online.
The format of the paper should follow the ACM manuscript guidelines; templates are available from this link. For LaTeX users, version 1.90 (last updated April 4, 2023) is the latest template; please use the “sigconf” option.
French Institute for Research in Computer Science and Automation (INRIA), France
Renaissance Computing Institute (RENCI), UNC Chapel Hill, USA
University of Queensland, Australia
University of Edinburgh, UK
University of Southern California, USA
University of Tennessee, USA
San Diego Supercomputer Center
Technical University of Vienna
University Carlos III of Madrid
University Carlos III of Madrid
University of Chicago
University of Southern California
University of Southern California
Technical University of Vienna
Nvidia
University of St. Andrews
Universidad Politécnica de Madrid
University of Illinois Chicago
Concordia University
Oak Ridge National Laboratory
Rutgers University
University of Illinois at Urbana-Champaign
University of Westminster
University of Tennessee
Oak Ridge National Laboratory
Newcastle University
University of Naples Parthenope
Argonne National Laboratory
University of Tennessee
Argonne National Laboratory
Lawrence Livermore National Laboratory
University of Klagenfurt
Cardiff University
University of Utah
French Institute for Research in Digital Science and Technology (Inria)
Barcelona Supercomputing Center
Oak Ridge National Laboratory
Oak Ridge National Laboratory
Oak Ridge National Laboratory
University of Calabria
French Institute for Research in Digital Science and Technology (Inria)
University of Notre Dame
Universidad de Zaragoza
Oak Ridge National Laboratory
Argonne National Laboratory
Argonne National Laboratory
For information please direct your inquiries to sc-ws-works@info.supercomputing.org, or contact the workshop chairs: