Some members of BigStorage participated in the Dagstuhl Seminar "Challenges and Opportunities of User-Level File Systems for HPC", held from May 14th to May 19th, 2017. In particular, Andre Brinkmann was one of the organizers of the seminar, which gathered many experts in the area of I/O and HPC.

More information about the Seminar: http://www.dagstuhl.de/no_cache/en/program/calendar/semhp/?semnr=17202

 

URL of the LSDMA Technical Forum: https://indico.desy.de/conferenceDisplay.py?confId=15810

URL of the BigStorage contribution: https://indico.desy.de/contributionDisplay.py?contribId=7&confId=15810

The LSDMA Technical Forum is a platform for new and running projects to present their technical goals and currently open challenges. It creates an environment where technical staff can exchange expertise about state-of-the-art solutions, discuss common challenges, and possibly identify future joint projects or proposals. Topics center on storage, big data, identity management, and performance.

 

Title: Big Data and Extreme Computing: a Storage-Based Pathway to Convergence

Abstract: Recently, the convergence of Extreme Computing and Big Data has become a hot subject for debates between the two communities. The BDEC workshop series was initiated to explore the motivations and the means to achieve such a convergence. This talk presents a storage-centered perspective for considering such a convergence, relying on the potential use of built-in transactions at storage level. It explains the rationale underlying such a vision, then it introduces Týr, a storage system that illustrates this approach.

Slides: http://www.exascale.org/bdec/sites/www.exascale.org.bdec/files/6-Antoniu-BDEC17Jun16.pdf

URL of the event: http://www.exascale.org/bdec/meeting/frankfurt

URL of JLESC: https://jlesc.github.io

URL of the 5th JLESC workshop: https://jlesc.github.io/events/5th-jlesc-workshop/

Talk by Pierre Matri: Týr: Blob Storage Systems Meet Built-In Transactions

Abstract: Concurrent Big Data applications often require high-performance storage, as well as ACID (Atomicity, Consistency, Isolation, Durability) transaction support. Blobs (binary large objects) are an increasingly popular low-level model for addressing the storage needs of such applications, providing a solid base for developing higher-level storage solutions, such as object stores or distributed file systems. However, today’s blob storage systems typically offer no transaction semantics. This demands users to coordinate access to data carefully in order to avoid race conditions, inconsistent writes, overwrites and other problems that cause erratic behavior. We argue there is a gap between existing storage solutions and application requirements, which limits the design of transaction-oriented applications. In this talk, we briefly introduce Týr, the first blob storage system to provide built-in, multiblob transactions, while retaining sequential consistency and high throughput under heavy access concurrency.

Talk by Gabriel Antoniu: Spark versus Flink: Understanding Performance in Big Data Analytics Frameworks

Abstract: Big Data analytics has recently gained increasing popularity as a tool to process large amounts of data on-demand. Spark and Flink are two Apache-based data analytics frameworks that facilitate the development of multi-step data pipelines using directed acyclic graph patterns. Making the most out of these frameworks is challenging because efficient executions strongly rely on complex parameter configurations and on an in-depth understanding of the underlying architectural choices. Although extensive research has been devoted to improving and evaluating the performance of such analytics frameworks, most studies benchmark them against Hadoop as a baseline, a rather unfair comparison considering the fundamentally different design principles. This work aims to bring some justice in this respect, by directly comparing the performance of Spark and Flink. Our goal is to identify and explain the impact of the different architectural choices and the parameter configurations on the perceived end-to-end performance. To this end, we develop a methodology for correlating the parameter settings and the operators' execution plan with the resource usage. We use this methodology to dissect the performance of Spark and Flink with several representative batch and iterative workloads on up to 100 nodes. We highlight how performance correlates to operators, to resource usage and to the specifics of the internal framework design.

Andre Brinkmann (University of Mainz) talks about energy and Big Data in the Insider Talk "New Generation Datacenter" (video in German):

http://www.datacenter-insider.de/insider-talk-episode-4-low-energy-big-data-die-naechste-v-35066-12652/