15th Workshop on Workflows in Support of Large-Scale Science

November 11, 2020 – Online event (10am-6:30pm ET)

Held in conjunction with SC20: The International Conference for High Performance Computing, Networking, Storage and Analysis

Co-chaired by:
Rafael Ferreira da Silva, University of Southern California, USA
Rosa Filgueira, University of Edinburgh, UK

Please submit any inquiries to chairs@works-workshop.org.

Workshop Evaluation

The WORKS 2020 Organizing Committee thanks all participants of the workshop for their valuable contributions. We would very much appreciate it if you could evaluate the workshop using the following link: http://eval.works-workshop.org.

Scientific workflows have been almost universally used across scientific domains and have underpinned some of the most significant discoveries of the past several decades. Workflow management systems (WMSs) provide abstraction and automation, which enable a broad range of researchers to easily define sophisticated computational processes and then execute them efficiently on parallel and distributed computing systems. As workflows have been adopted by a number of scientific communities, they are becoming more complex and require more sophisticated workflow management capabilities. A workflow can now analyze terabyte-scale data sets, be composed of one million individual tasks, require coordination between heterogeneous tasks, manage tasks that execute for milliseconds to hours, and process data streams, files, and data placed in object stores. The computations can be single-core workloads, loosely coupled computations, or tightly coupled computations, all within a single workflow, and can run on dispersed computing platforms.

This workshop focuses on the many facets of scientific workflow management systems, ranging from actual execution to service management and the coordination and optimization of data, service, and job dependencies. The workshop covers a broad range of issues in the scientific workflow lifecycle, including: scientific workflow representation and enactment; workflow scheduling techniques to optimize the execution of the workflow on heterogeneous infrastructures; workflow enactment engines that need to deal with failures in the application and execution environment; and a number of computer science problems related to scientific workflows, such as semantic technologies, compiler methods, scheduling, and fault detection and tolerance.

Important Dates

  • September 6, 2020 (final deadline extension; originally August 15) – Full paper deadline
  • September 30, 2020 (originally September 15) – Paper acceptance notification
  • October 8, 2020 – Consent and release form
  • October 9, 2020 – Video presentations
  • October 18, 2020 – E-copyright registration completed by authors
  • October 18, 2020 – Camera-ready deadline
  • November 11, 2020 – Online Workshop (10am-6:30pm ET)
  • All deadlines are Anywhere on Earth (AoE)

Keynote

In Situ Data Analytics for Next Generation Molecular Dynamics Workflows

Dr. Michela Taufer, University of Tennessee Knoxville

This talk is about molecular dynamics (MD) simulations, which study the classical time evolution of a molecular system at atomic resolution. These simulations are widely recognized in the fields of chemistry, material sciences, molecular biology, and drug design, and they are among the most common simulations on supercomputers, including today's petascale machines. A survey of resources used on XSEDE machines over the past six months shows that biomolecular codes (predominantly MD codes such as Amber, CHARMM, and NAMD) use 25.7% of the XSEDE resources (i.e., the total amount of XD service units (SUs) used by jobs in the field of science indicated). Work by Luu and co-authors shows that HPC resources can be up to 75% idle, performing I/O operations while running scientific simulations because of poor data handling. Next-generation supercomputers will have dramatically higher performance than current systems, generating more data that needs to be analyzed (i.e., in terms of number and length of molecular dynamics trajectories). The coordination of data generation and analysis cannot rely on manual, centralized approaches as it does now.

The talk presents an interdisciplinary project integrating research from various areas across programs such as computer science, structural molecular biosciences, and high-performance computing to transform the centralized nature of the molecular dynamics analysis into a distributed approach that is predominantly performed in situ. Specifically, the effort presented in this talk combines machine learning and data analytics approaches, workflow management methods, and high performance computing techniques to analyze molecular dynamics data as it is generated, save to disk only what is really needed for future analysis, and annotate molecular dynamics trajectories to drive the next steps in increasingly complex simulations' workflows. The project's harnessed knowledge of molecular structures' transformations at runtime can be used to steer simulations to more promising areas of the simulation space, identify the data that should be written to congested parallel file systems, and index generated data for retrieval and post-simulation analysis.

Bio. Michela Taufer is an ACM Distinguished Scientist and holds the Jack Dongarra Professorship in High Performance Computing in the Department of Electrical Engineering and Computer Science at the University of Tennessee Knoxville (UTK). She earned her undergraduate degrees in Computer Engineering from the University of Padova (Italy) and her doctoral degree in Computer Science from the Swiss Federal Institute of Technology or ETH (Switzerland). From 2003 to 2004 she was a La Jolla Interfaces in Science Training Program (LJIS) Postdoctoral Fellow at the University of California San Diego (UCSD) and The Scripps Research Institute (TSRI), where she worked on interdisciplinary projects in computer systems and computational chemistry. Taufer has a long history of interdisciplinary work with scientists. Her research interests include software applications and their advanced programmability in heterogeneous computing (i.e., multi-core platforms and GPUs); cloud computing and volunteer computing; and performance analysis, modeling and optimization of multi-scale applications. She has been serving as the principal investigator of several NSF collaborative projects. She also has significant experience in mentoring a diverse population of students on interdisciplinary research. Taufer's training expertise includes efforts to spread high-performance computing participation in undergraduate education and research as well as efforts to increase the interest and participation of diverse populations in interdisciplinary studies.

Workshop Program

Times are listed as EST / PST / GMT.

10:00-10:10 AM EST / 7:00-7:10 AM PST / 3:00-3:10 PM GMT
Welcome
Rafael Ferreira da Silva, Rosa Filgueira

10:10-11:00 AM EST / 7:10-8:00 AM PST / 3:10-4:00 PM GMT
Keynote: In Situ Data Analytics for Next Generation Molecular Dynamics Workflows
Dr. Michela Taufer

11:00-11:30 AM EST / 8:00-8:30 AM PST / 4:00-4:30 PM GMT
Break

11:30 AM-noon EST / 8:30-9:00 AM PST / 4:30-5:00 PM GMT
Runtime vs Scheduler: Analyzing Dask's Overheads
Stanislav Böhm, Jakub Beránek

noon-12:30 PM EST / 9:00-9:30 AM PST / 5:00-5:30 PM GMT
Workflow Generation with wfGenes
Mehdi Roozmeh, Ivan Kondov

12:30-1:00 PM EST / 9:30-10:00 AM PST / 5:30-6:00 PM GMT
Break

1:00-1:30 PM EST / 10:00-10:30 AM PST / 6:00-6:30 PM GMT
Supercomputing with MPI Meets the Common Workflow Language Standards: An Experience Report
Rupert W. Nash, Nick Brown, Michael R. Crusoe, Max Kontak

1:30-2:00 PM EST / 10:30-11:00 AM PST / 6:30-7:00 PM GMT
Applying workflows to scientific projects represented in file system directory tree
Mieszko Makuch, Maciej Malawski, Joanna Kocot, Tomasz Szepieniec

2:00-2:30 PM EST / 11:00-11:30 AM PST / 7:00-7:30 PM GMT
Break

2:30-3:00 PM EST / 11:30 AM-noon PST / 7:30-8:00 PM GMT
Adaptive Optimizations for Stream-based Workflows
Liang Liang, Rosa Filgueira, Yan Yan

3:00-3:30 PM EST / noon-12:30 PM PST / 8:00-8:30 PM GMT
Enabling Discoverable Trusted Services for Highly Dynamic Decentralized Workflows
Iain Barclay, Chris Simpkin, Graham Bent, Tom La Porta, Declan Millar, Alun Preece, Ian Taylor, Dinesh Verma

3:30-4:00 PM EST / 12:30-1:00 PM PST / 8:30-9:00 PM GMT
Break

4:00-4:30 PM EST / 1:00-1:30 PM PST / 9:00-9:30 PM GMT
WorkflowHub: Community Framework for Enabling Scientific Workflow Research and Development
Rafael Ferreira da Silva, Loic Pottier, Taina Coleman, Ewa Deelman, Henri Casanova

4:30-5:00 PM EST / 1:30-2:00 PM PST / 9:30-10:00 PM GMT
Characterizing Scientific Workflows on HPC Systems using Logs
Devarshi Ghoshal, Brian Austin, Deborah Bard, Christopher Daley, Glenn Lockwood, Nicholas J. Wright, Lavanya Ramakrishnan

5:00-5:30 PM EST / 2:00-2:30 PM PST / 10:00-10:30 PM GMT
Closing thoughts and next steps; moving forward together
Rafael Ferreira da Silva, Rosa Filgueira

Paper Submission

WORKS20 welcomes original submissions in a range of areas, including but not limited to:

  • Big Data analytics workflows
  • Data-driven workflow processing (including stream-based workflows)
  • Workflow composition, tools, and languages
  • Workflow execution in distributed environments (including HPC, clouds, and grids)
  • Reproducible computational research using workflows
  • Dynamic, data-dependent workflow system solutions
  • Exascale computing with workflows
  • In situ data analytics workflows
  • Interactive workflows (including workflow steering)
  • Workflow fault-tolerance and recovery techniques
  • Workflow user environments, including portals
  • Workflow applications and their requirements
  • Workflow optimizations (including scheduling and energy efficiency)
  • Performance analysis of workflows
  • Workflow debugging
  • Workflow provenance
  • Machine learning workflows

Papers should present original research and should provide sufficient background material to make them accessible to the broader community.

Instructions for submission:
Submissions are limited to 8 pages in the IEEE format (see https://www.ieee.org/conferences/publishing/templates.html). The 8-page limit includes figures, tables, appendices, and references.
This year, WORKS papers will be published in cooperation with TCHPC and will be available from the IEEE digital repository.

Organization

Program Committee Chairs

Rafael Ferreira da Silva

University of Southern California, USA

Rosa Filgueira

University of Edinburgh, UK

General Chair

Ian Taylor

Cardiff University, UK
University of Notre Dame, USA

Steering Committee

David Abramson

University of Queensland, Australia

Malcolm Atkinson

University of Edinburgh, UK

Ewa Deelman

University of Southern California, USA

Michela Taufer

University of Tennessee, USA

Program Committee

Pinar Alper
King's College London
Ilkay Altintas
SDSC
Khalid Belhajjame
Université Paris-Dauphine
Ivona Brandic
TU Wien
Silvina Caino-Lores
University of Tennessee
Henri Casanova
University of Hawaii at Manoa
Thomas Fahringer
University of Innsbruck
Rafael Ferreira da Silva
University of Southern California
Daniel Garijo
University of Southern California
Sandra Gesing
University of Notre Dame
Tristan Glatard
Concordia University
Daniel Katz
UIUC
Maciej Malawski
AGH UST
Anirban Mandal
RENCI
Marta Mattoso
UFRJ
Loic Pottier
University of Southern California
Radu Prodan
University of Klagenfurt
Ivan Rodero
Rutgers University
Rizos Sakellariou
University of Manchester
Renan Souza
IBM Research
Frédéric Suter
CNRS, INRIA
Domenico Talia
University of Calabria
Douglas Thain
University of Notre Dame
Chase Wu
NJ Institute of Technology