15th Workshop on Workflows in Support of Large-Scale Science
November 11, 2020 – Online event (10am-6:30pm ET)
Held in conjunction with SC20: The International Conference for High Performance Computing, Networking, Storage and Analysis
Co-chaired by:
Rafael Ferreira da Silva, University of Southern California, USA
Rosa Filgueira, University of Edinburgh, UK
Please submit any inquiries to chairs@works-workshop.org
Workshop Evaluation
The WORKS 2020 Organizing Committee thanks all participants of the workshop for their valuable contributions. We would very much appreciate it if you could evaluate the workshop using the following link: http://eval.works-workshop.org.
Scientific workflows have been almost universally used across scientific domains and have underpinned some of
the most significant discoveries of the past several decades. Workflow management systems (WMSs) provide
abstraction and automation which enable a broad range of researchers to easily define sophisticated computational
processes and to then execute them efficiently on parallel and distributed computing systems. As workflows have
been adopted by a number of scientific communities, they are becoming more complex and require more
sophisticated workflow management capabilities. A workflow can now analyze terabyte-scale data sets, be
composed of one million individual tasks, require coordination between heterogeneous tasks, manage tasks that
execute for milliseconds to hours, and process data streams, files, and data placed in object stores. The
computations can be single-core workloads, loosely coupled computations, or tightly coupled computations, all within a single workflow,
and can run on dispersed computing platforms.
This workshop focuses on the many facets of scientific workflow
management systems, ranging from actual execution to service management and the coordination and
optimization of data, service, and job dependencies. The workshop covers a broad range of issues in the scientific
workflow lifecycle, including: scientific workflow representation and enactment; workflow scheduling
techniques to optimize the execution of the workflow on heterogeneous infrastructures; workflow enactment
engines that need to deal with failures in the application and execution environment; and a number of computer
science problems related to scientific workflows, such as semantic technologies, compiler methods, scheduling,
and fault detection and tolerance.
Important Dates
- September 6, 2020 (final deadline extension; originally August 15) – Full paper deadline
- September 30, 2020 (originally September 15) – Paper acceptance notification
- October 8, 2020 – Consent and release form
- October 9, 2020 – Video presentations
- October 18, 2020 – E-copyright registration completed by authors
- October 18, 2020 – Camera-ready deadline
- November 11, 2020 – Online Workshop (10am-6:30pm ET)
- All deadlines are Anywhere on Earth (AoE)
Keynote
In Situ Data Analytics for Next Generation Molecular Dynamics Workflows
Dr. Michela Taufer, University of Tennessee Knoxville
This talk is about molecular dynamics (MD) simulations that study the classical time evolution of a molecular system at atomic resolution. These simulations are widely recognized in the fields of chemistry, material sciences, molecular biology and drug design; they are also among the most common simulations on supercomputers, including today's petascale machines. A survey of resources used on XSEDE machines over the past six months shows that biomolecular codes (predominantly MD codes such as Amber, CHARMM, and NAMD) use 25.7% of the XSEDE resources (i.e., total amount of XD service units (SUs) used by jobs in the field of science indicated). Work by Luu and co-authors shows that HPC resources can be up to 75% idle, performing I/O operations while running scientific simulations, because of poor data handling. Next-generation supercomputers will have dramatically higher performance than current systems, generating more data that needs to be analyzed (i.e., in terms of number and length of molecular dynamics trajectories). The coordination of data generation and analysis cannot rely on manual, centralized approaches as it does now.
The talk presents an interdisciplinary project integrating research from various areas across programs such as computer science, structural molecular biosciences, and high-performance computing to transform the centralized nature of the molecular dynamics analysis into a distributed approach that is predominantly performed in situ. Specifically, the effort presented in this talk combines machine learning and data analytics approaches, workflow management methods, and high performance computing techniques to analyze molecular dynamics data as it is generated, save to disk only what is really needed for future analysis, and annotate molecular dynamics trajectories to drive the next steps in increasingly complex simulations' workflows. The project's harnessed knowledge of molecular structures' transformations at runtime can be used to steer simulations to more promising areas of the simulation space, identify the data that should be written to congested parallel file systems, and index generated data for retrieval and post-simulation analysis.
Bio. Michela Taufer is an ACM Distinguished Scientist and holds the Jack Dongarra Professorship in High Performance Computing in the Department of Electrical Engineering and Computer Science at the University of Tennessee Knoxville (UTK). She earned her undergraduate degrees in Computer Engineering from the University of Padova (Italy) and her doctoral degree in Computer Science from the Swiss Federal Institute of Technology or ETH (Switzerland). From 2003 to 2004 she was a La Jolla Interfaces in Science Training Program (LJIS) Postdoctoral Fellow at the University of California San Diego (UCSD) and The Scripps Research Institute (TSRI), where she worked on interdisciplinary projects in computer systems and computational chemistry. Taufer has a long history of interdisciplinary work with scientists. Her research interests include software applications and their advanced programmability in heterogeneous computing (i.e., multi-core platforms and GPUs); cloud computing and volunteer computing; and performance analysis, modeling and optimization of multi-scale applications. She has been serving as the principal investigator of several NSF collaborative projects. She also has significant experience in mentoring a diverse population of students on interdisciplinary research. Taufer's training expertise includes efforts to spread high-performance computing participation in undergraduate education and research as well as efforts to increase the interest and participation of diverse populations in interdisciplinary studies.
Workshop Program
Time | Event |
---|---|
10:00-10:10 AM EST / 7:00-7:10 AM PST / 3:00-3:10 PM GMT | Welcome (Rafael Ferreira da Silva, Rosa Filgueira) |
10:10-11:00 AM EST / 7:10-8:00 AM PST / 3:10-4:00 PM GMT | Keynote: In Situ Data Analytics for Next Generation Molecular Dynamics Workflows (Dr. Michela Taufer) |
11:00-11:30 AM EST / 8:00-8:30 AM PST / 4:00-4:30 PM GMT | Break |
11:30 AM-noon EST / 8:30-9:00 AM PST / 4:30-5:00 PM GMT | Runtime vs Scheduler: Analyzing Dask's Overheads (Stanislav Böhm, Jakub Beránek) |
noon-12:30 PM EST / 9:00-9:30 AM PST / 5:00-5:30 PM GMT | Workflow Generation with wfGenes (Mehdi Roozmeh, Ivan Kondov) |
12:30-1:00 PM EST / 9:30-10:00 AM PST / 5:30-6:00 PM GMT | Break |
1:00-1:30 PM EST / 10:00-10:30 AM PST / 6:00-6:30 PM GMT | Supercomputing with MPI Meets the Common Workflow Language Standards: An Experience Report (Rupert W. Nash, Nick Brown, Michael R. Crusoe, Max Kontak) |
1:30-2:00 PM EST / 10:30-11:00 AM PST / 6:30-7:00 PM GMT | Applying workflows to scientific projects represented in file system directory tree (Mieszko Makuch, Maciej Malawski, Joanna Kocot, Tomasz Szepieniec) |
2:00-2:30 PM EST / 11:00-11:30 AM PST / 7:00-7:30 PM GMT | Break |
2:30-3:00 PM EST / 11:30 AM-noon PST / 7:30-8:00 PM GMT | Adaptive Optimizations for Stream-based Workflows (Liang Liang, Rosa Filgueira, Yan Yan) |
3:00-3:30 PM EST / noon-12:30 PM PST / 8:00-8:30 PM GMT | Enabling Discoverable Trusted Services for Highly Dynamic Decentralized Workflows (Iain Barclay, Chris Simpkin, Graham Bent, Tom La Porta, Declan Millar, Alun Preece, Ian Taylor, Dinesh Verma) |
3:30-4:00 PM EST / 12:30-1:00 PM PST / 8:30-9:00 PM GMT | Break |
4:00-4:30 PM EST / 1:00-1:30 PM PST / 9:00-9:30 PM GMT | WorkflowHub: Community Framework for Enabling Scientific Workflow Research and Development (Rafael Ferreira da Silva, Loic Pottier, Taina Coleman, Ewa Deelman, Henri Casanova) |
4:30-5:00 PM EST / 1:30-2:00 PM PST / 9:30-10:00 PM GMT | Characterizing Scientific Workflows on HPC Systems using Logs (Devarshi Ghoshal, Brian Austin, Deborah Bard, Christopher Daley, Glenn Lockwood, Nicholas J. Wright, Lavanya Ramakrishnan) |
5:00-5:30 PM EST / 2:00-2:30 PM PST / 10:00-10:30 PM GMT | Closing thoughts and next steps; moving forward together (Rafael Ferreira da Silva, Rosa Filgueira) |
Paper Submission
WORKS20 welcomes original submissions in a range of areas, including but not limited to:
- Big Data analytics workflows
- Data-driven workflow processing (including stream-based workflows)
- Workflow composition, tools, and languages
- Workflow execution in distributed environments (including HPC, clouds, and grids)
- Reproducible computational research using workflows
- Dynamic, data-dependent workflow system solutions
- Exascale computing with workflows
- In situ data analytics workflows
- Interactive workflows (including workflow steering)
- Workflow fault-tolerance and recovery techniques
- Workflow user environments, including portals
- Workflow applications and their requirements
- Workflow optimizations (including scheduling and energy efficiency)
- Performance analysis of workflows
- Workflow debugging
- Workflow provenance
- Machine learning workflows
Papers should present original research and should provide sufficient background material to make them accessible to the broader community.
Instructions for submission:
Submissions are limited to 8 pages in the IEEE format (see
https://www.ieee.org/conferences/publishing/templates.html). The 8-page limit includes figures, tables,
appendices, and references.
This year, WORKS papers will be published in cooperation with
TCHPC and will be available from the IEEE
digital repository.
Organization
Program Committee Chairs
Steering Committee
Program Committee
King's College London
SDSC
Université Paris-Dauphine
TU Wien
University of Tennessee
University of Hawaii at Manoa
University of Innsbruck
University of Southern California
University of Southern California
University of Notre Dame
Concordia University
UIUC
AGH UST
RENCI
UFRJ
University of Southern California
University of Klagenfurt
Rutgers University
University of Manchester
IBM Research
CNRS, INRIA
University of Calabria
University of Notre Dame
NJ Institute of Technology