EBDMA 2017


1st Workshop on the Integration of Extreme Scale Computing and Big Data Management and Analytics

Held in conjunction with IEEE/ACM CCGrid 2017, Madrid, Spain, May 14-17, 2017


Workshop Program


We are happy to announce the final program of the workshop. The workshop will be held on Sunday, May 14th, 2017, at the Melia Los Galgos Hotel.

Session 1 (14:00 - 16:00)

Location: Velazquez

14:00-15:00

Keynote Speaker: Rosa M. Badia (Barcelona Supercomputing Center)

Title: Task-based programming model as an alternative for Big Data and Analytics

Abstract: Task-based programming models provide a friendly interface that enables the parallelization of sequential applications. Moreover, expressing the application as a task dependency graph enables out-of-order execution of the tasks and the exploitation of distant parallelism. This category of programming models is now very well accepted by HPC application developers.

Furthermore, we believe that task-based programming models are a very good approach for programming Big Data applications. In this talk, we will present the PyCOMPSs/COMPSs programming model and its main programming and runtime features. The talk will also include how PyCOMPSs/COMPSs is being integrated with new storage solutions with persistency. Finally, we will compare its programmability and performance to those of Apache Spark by means of a set of Big Data benchmarks.

Short Bio:
Rosa M. Badia holds a PhD in Computer Science (1994) from the Technical University of Catalonia (UPC). She is the manager of the Workflows and Distributed Computing research group at the Barcelona Supercomputing Center (BSC). She is also a Scientific Researcher at the Consejo Superior de Investigaciones Cientificas (CSIC). She was involved in teaching and research activities at the UPC from 1989 to 2008, where she was an Associate Professor from 1997. From 1999 to 2005 she was involved in research and development activities at the European Center of Parallelism of Barcelona (CEPBA). Her current research interests are programming models for complex platforms (from multicore and GPUs to Grid/Cloud). The group led by Dr. Badia has been developing the StarSs programming model for more than 10 years, with considerable success in adoption by application developers. Currently the group focuses its efforts on PyCOMPSs/COMPSs, an instance of the programming model for distributed computing including Cloud, and its application to parallelizing Big Data and Analytics.

Dr. Badia has published nearly 200 papers in international conferences and journals on the topics of her research. She has been very active in projects funded by the European Commission and in contracts with industry (IBM and Intel). She is currently participating in the following European funded projects: Euroserver, The Human Brain Project, the BioExcel CoE, NEXTGenIO, MUG, EUBra BIGSEA, TANGO, mf2C, and the EXPERTISE ITN. She is also a member of HiPEAC2 NoE.

15:00-15:30

A Data-driven Approach based on Auto-Regressive Models for Energy-Efficient Clouds

Albino Altomare, Eugenio Cesario

15:30-16:00

An Empirical Evaluation of How The Network Impacts The Performance and Energy Efficiency in RAMCloud

Yacine Taleb, Shadi Ibrahim, Gabriel Antoniu, Toni Cortes

Coffee Break (16:00 - 16:30)

Session 2 (16:30 - 18:00)

Location: Velazquez

16:30-17:00

Evaluation of HPC-Big Data Applications Using Cloud Platforms

Shweta Salaria, Kevin Brown, Hideyuki Jitsumoto, Satoshi Matsuoka

17:00-17:30

Exploring Shared State in Key-Value Store for Window-Based Multi-Pattern Streaming Analytics

Ovidiu Cristian Marcu, Radu Tudoran, Bogdan Nicolae, Alexandru Costan, Gabriel Antoniu, Maria Perez

17:30-18:00

On the Use of In-Memory Analytics Workflows to Compute eScience Indicators from Large Climate Datasets

Alessandro D'Anca, Cosimo Palazzo, Donatello Elia, Sandro Fiore, Ioannis Bistinas, Kristin Böttcher, Victoria Bennett, Giovanni Aloisio

Workshop Description


The deployment of extreme scale computing platforms in research and industry, coupled with the proliferation of large and distributed digital data sources, has the potential to deliver unprecedented insights and understanding in all areas of science, engineering, business, and society in general. However, challenges related to the Big Data generated and processed by these systems remain a significant barrier to achieving this potential.

Addressing these challenges requires a seamless integration of the extreme scale/high performance computing, cloud computing, storage technologies, data management, energy efficiency, and big data analytics research approaches, frameworks/technologies, and communities. The convergence and integration of HPC, cloud computing and data analysis is crucial to the future. To achieve this goal, both communities need to collectively explore and embrace emerging disruptions in architecture and hardware technologies as well as new data-driven application areas such as those enabled by the Internet of Things. Finally, educational and workforce development structures have to evolve to develop the required integrated skillsets.

The goal of this workshop is to bring leading researchers from these communities together to jointly explore such integration, and to develop a research agenda that brings the diverging research groups and technology stacks toward a more convergent path. The workshop provides a forum for scientists and engineers in academia and industry to present their latest research findings on major and emerging topics in this field.

Workshop Topics


A partial list of topics of interest is as follows:

  • Models and techniques for scalable data analysis
  • Extreme data discovery solutions
  • HPC and extreme scale platforms for Big Data analytics
  • Exascale data analysis programming abstractions and services
  • Parallel and distributed Big Data analysis algorithms
  • Data analysis as a service infrastructure
  • Code coordination and data integration on HPC platforms
  • Interoperability of Big Data analytics frameworks
  • Adaptation of data mining algorithms to extreme scale systems
  • Data-centric scalable programming tools and algorithms
  • High-performance and Big Data analytics frameworks, programming models, and tools
  • Leveraging processing, storage and communications technologies (multi/many-core architectures, accelerators, RDMA-enabled networking, NVRAMs and SSDs) in integrated HPC Big Data applications
  • Performance modeling and evaluation of integrated HPC Big Data applications
  • Fault tolerance, reliability and availability for high-performance Big Data computing
  • New storage devices for Big Data management in HPC and Clouds
  • Security issues in Big Data analysis and management in HPC and Clouds
  • Energy-efficiency issues in Big Data analysis and management in HPC and Clouds
  • Stream data processing in HPC and Clouds
  • Case studies of data-intensive applications in HPC and Clouds
  • Scheduling and provisioning data analytics on hybrid Cloud and HPC infrastructure

Submission Guidelines


All papers need to be submitted electronically through the CCGrid 2017 conference website (https://www.easychair.org/conferences/?conf=ccgrid2017) in PDF format. Note that you have to select the track EBDMA 2017 at the beginning of the submission procedure. Submitted papers must not substantially overlap with papers that have been published or that are simultaneously submitted to a journal or a conference with proceedings. Papers must be clearly presented in English, and may not exceed 8 letter-size (8.5 x 11) pages, including all figures, tables and references, using the IEEE format for conference proceedings.

Templates are available at:

  • LaTeX Package ZIP
  • Word Template DOC and PDF

Submitted papers must represent original and unpublished work that is not currently under review. All manuscripts will be evaluated according to their significance, originality, technical content, style, clarity, quality of presentation, and relevance to the workshop. At least one author of each accepted paper is expected to attend the workshop.

Publication


All papers accepted and presented at the EBDMA 2017 workshop will be published in the IEEE/ACM CCGrid conference proceedings, and will be submitted to IEEE Xplore for publication and EI indexing.

Journal Special Issue


Authors of the best technical papers will be invited to submit an extended version of their work to the special issue on the Integration of Extreme Scale Computing and Big Data Management and Analytics of IEEE Transactions on Big Data (https://www.computer.org/web/tbd).

Organizing Committee


Workshop Co-Chairs:

Shadi Ibrahim, Inria, France
Manish Parashar, Rutgers University, USA
Anna Queralt, Barcelona Supercomputing Center, Spain
Domenico Talia, University of Calabria, Italy

Program Committee


Ilkay Altintas, SDSC/UCSD, USA
Andre Brinkmann, Johannes Gutenberg-Universität Mainz, Germany
Gene Cooperman, Northeastern University, USA
Alexandru Costan, Inria Rennes, France
Frederic Desprez, Inria, France
Simon Dobson, University of St Andrews, UK
Jack Dongarra, University of Tennessee, USA
Matthieu Dorier, Argonne National Laboratory, USA
Bingsheng He, National University of Singapore, Singapore
Hai Jin, Huazhong University of Science and Technology, China
Scott Klasky, Oak Ridge National Laboratory, USA
Dieter Kranzlmueller, Ludwig-Maximilians-Universitaet Muenchen, Germany
Michael Kuhn, University of Hamburg, Germany
Adrien Lebre, Inria Ecole des Mines, France
Laurent Lefevre, Inria, France
Manolis Marazakis, Institute of Computer Science, FORTH, Greece
Ramón Nou, Barcelona Supercomputing Center, Spain
Anne-Cécile Orgerie, Centre National de la Recherche Scientifique, France
Dana Petcu, West University of Timisoara, Romania
Maria Perez, Universidad Politecnica de Madrid, Spain
Depei Qian, Beihang University, China
Rob Ross, Argonne National Laboratory, USA
Paolo Trunfio, University of Calabria, Italy
Vladimir Vlassov, KTH Royal Institute of Technology, Sweden
Amelie Chi Zhou, Inria Rennes, France

