CEBDA 2018

The International Workshop on the Convergence of Extreme Scale Computing and Big Data Analysis

Held in conjunction with IEEE IPDPS, Vancouver, British Columbia, Canada, May 21-25, 2018

Workshop Program

We are happy to announce the final program of the workshop. The workshop will be held on Friday, May 25th, 2018.

Session 1 (9:00 - 10:00)


Keynote Speaker: Franck Cappello (Argonne National Laboratory)

Title: Lossy compression of scientific simulation data: from visualization to checkpoint/restart

Abstract: Extreme-scale scientific simulations are already generating more data than can be communicated, stored, and analyzed. The data flood will get even worse with future exascale systems. This is true for output data and also for checkpoint/restart data. Scientific data reduction is a necessity to drastically accelerate I/O, reduce the data footprint on storage, and significantly speed up computation, as demonstrated by the 2017 Gordon Bell award winner. But reduction should be performed wisely, both for execution correctness (checkpoint/restart) and to keep the information that matters to scientists. We can try to develop application-specific lossy data reduction techniques or to compress the dataset with advanced generic lossless compression algorithms. Unfortunately, these two approaches are either impractical for most applications or do not provide enough data reduction for scientific datasets. Other domains already familiar with Big Data make massive use of lossy compression to reduce data size. However, lossy compression has very rarely been applied to scientific simulation data, and as a result it is not well understood. In this talk, we will present challenges and opportunities in terms of compression algorithms and the application of lossy compression to scientific data. We will detail not only the best-in-class compression algorithms but also the tools to comprehensively assess the error introduced by lossy compression. We will give examples of lossy compression of scientific datasets with application to visualization and checkpoint/restart. Lossy compression of scientific data reveals itself as a fascinating young research domain with many opportunities to explore and discover new techniques.

Short Bio:
Franck is a senior computer scientist at Argonne National Laboratory and an adjunct associate professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. He is the director of the Joint Laboratory on Extreme Scale Computing, which gathers six of the leading high-performance computing institutions in the world: Argonne National Laboratory (ANL), the National Center for Supercomputing Applications (NCSA), Inria, Barcelona Supercomputing Center (BSC), Jülich Supercomputing Centre (JSC), and RIKEN AICS. Franck is an expert in parallel/distributed computing and high-performance computing. Recently he started investigating lossy compression for scientific datasets, in response to the pressing need for significant data reduction among scientists performing large-scale simulations and experiments. Franck is a member of the editorial board of IEEE Transactions on Parallel and Distributed Computing and of the IEEE CCGRID steering committee. He is a Fellow of the IEEE.

Coffee Break (10:00 - 10:30)

Session 2 (10:30 - 12:00)


Data-Locality Aware Dynamic Schedulers for Independent Tasks with Replicated Inputs

Olivier Beaumont, Thomas Lambert, Loris Marchal, Bastien Thomas


Transferring Data from High-Performance Simulations to Extreme Scale Analysis Applications in Real-Time

Thomas Marrinan, Silvio Rizzi, Joseph Insley, Brian Toonen, William Allcock, Michael Papka


Towards a TRansparent I/O solution

Fotios Nikolaidis, Nick Kossifidis, Thomas Leibovici, Soraya Zertal

Workshop Description

The deployment of extreme scale computing platforms in research and industry, coupled with the proliferation of large and distributed digital data sources, has the potential to yield unprecedented insights and understanding in all areas of science, engineering, business, and society in general. However, challenges related to the Big Data generated and processed by these systems remain a significant barrier to achieving this potential.

Addressing these challenges requires a seamless integration of the extreme scale/high performance computing, cloud computing, storage technologies, data management, energy efficiency, and big data analytics research approaches, frameworks/technologies, and communities. The convergence and integration of exascale systems and data analysis is crucial to the future. To achieve this goal, both communities need to collectively explore and embrace emerging disruptions in architecture and hardware technologies as well as new data-driven application areas such as those enabled by the Internet of Things. Finally, educational and workforce development structures have to evolve to develop the required integrated skillsets.

The goal of this workshop is to bring leading researchers from these communities together to jointly explore such integration, and to develop a research agenda for bringing the diverging research groups and technology stacks onto a more convergent path. The workshop provides a forum for scientists and engineers in academia and industry to present their latest research findings on major and emerging topics in this field.

Workshop Topics

A partial list of topics of interest is as follows:

  • Models and techniques for scalable data analysis
  • Extreme data discovery solutions
  • Extreme scale platforms for Big Data analytics
  • Exascale data analysis programming abstractions and services
  • Parallel and distributed Big Data analysis algorithms
  • Code coordination and data integration on HPC platforms
  • Adaptation of data mining/analysis algorithms to extreme scale systems
  • Data-centric highly scalable programming tools and algorithms
  • Convergence of High-performance and Big Data analytics frameworks, programming models, and tools
  • Scalable storage architectures for extreme scale systems
  • Techniques for data integrity and availability for extreme scale systems
  • New storage devices for Big Data management in exascale systems
  • Security issues in Big Data analysis and management in extreme scale systems
  • Energy-efficiency issues in Big Data analysis and management in exascale systems
  • Stream data processing in exascale systems
  • Case studies of data-intensive applications in exascale systems
  • Scheduling and provisioning data analytics on hybrid Cloud and Exascale infrastructure
  • Accuracy and correctness of Big Data analysis on exascale systems
  • In-situ techniques for extreme scale data analytics
  • Machine learning techniques for exascale applications

Submission Guidelines

All papers need to be submitted electronically through EasyChair-CEBDA2018 in PDF format.

Submitted papers must not substantially overlap with papers that have been published or that are simultaneously submitted to a journal or a conference with proceedings. Papers must be clearly presented in English, and may not exceed 8 letter-size (8.5 x 11) pages including all figures, tables and references using the IEEE format for conference proceedings.

Templates are available at:

  • LaTeX Package ZIP
  • Word Template DOC and PDF

Submitted papers must represent original and unpublished work that is not currently under review elsewhere. All manuscripts will be evaluated according to their significance, originality, technical content, style, clarity, quality of presentation, and relevance to the workshop. At least one author of each accepted paper is expected to attend the workshop.


All papers accepted and presented at the CEBDA 2018 workshop will be published together with the proceedings of other IPDPS 2018 workshops, and will be submitted to IEEE Xplore for publication and EI indexing.

Journal Special Issue

Authors of the best technical papers will be invited to submit an extended version of their work to a special issue on the Convergence of Extreme Scale Computing and Big Data Analysis in the Future Generation Computer Systems Journal (IF: 3.997).

Organizing Committee

Workshop Co-Chairs:

Shadi Ibrahim, Inria, France
Manish Parashar, Rutgers University, USA
Anna Queralt, Barcelona Supercomputing Center, Spain
Domenico Talia, University of Calabria, Italy

Program Committee

Jean-Thomas Acquaviva, Data Direct Networks, France
Guillaume Aupy, Inria, France
Olivier Beaumont, Inria, France
Timo Bremer, Lawrence Livermore National Laboratory, USA
Andre Brinkmann, Johannes Gutenberg-Universität Mainz, Germany
Alexandru Costan, Inria, France
Frederic Desprez, Inria, France
Simon Dobson, University of St Andrews, UK
Matthieu Dorier, Argonne National Laboratory, USA
Bingsheng He, National University of Singapore, Singapore
Michael Kuhn, University of Hamburg, Germany
Laurent Lefevre, Inria, France
Fabrizio Marozzo, University of Calabria, Italy
Dhabaleswar Panda, Ohio State University, USA
Abani Patra, University at Buffalo, USA
Dana Petcu, West University of Timisoara, Romania
Depei Qian, Beihang University, China
Pradeep Subedi, Rutgers University, USA
Paolo Trunfio, University of Calabria, Italy
Vladimir Vlassov, KTH Royal Institute of Technology, Sweden
Logan Ward, Argonne National Laboratory, USA
Amelie Chi Zhou, ShenZhen University, China

© CEBDA 2017