
BigStorage is a European Training Network (ETN) whose main goal is to train future data scientists, enabling them to apply holistic and interdisciplinary approaches to extract value from a data-overwhelmed world. Doing so requires HPC and Cloud infrastructures, together with a redefinition of the storage architectures underpinning them, focused on meeting highly ambitious performance and energy-usage objectives.

Nowadays there is a lack of professionals who know how to deal with the storage, management, and analysis of Big Data. Indeed, there is a gap between the infrastructures for dealing with Big Data and the applications that use these volumes of data. In 2011, the McKinsey Global Institute published a study that found that, by 2018, there could be a shortage of up to 190,000 data scientists in the United States, representing a 50 to 60 percent gap between supply and demand. Similarly, European officials estimate that 300,000 data scientists will be needed in Europe in the forthcoming years. Other reports, such as those from PRACE and ETP4HPC, have also emphasized the need for skills in HPC, Cloud, Storage, Energy, and Big Data to sustain Europe’s economy. In this context, a major goal of this project is to make a substantial contribution to the training of these future experts.

To gain value from Big Data, it must be addressed from many different angles:

  1. applications, which can exploit this data
  2. middleware, operating in Cloud and HPC environments
  3. infrastructure, which provides the storage and computing capacity capable of handling it

Big Data can only be effectively exploited if techniques and algorithms are available that help to understand its content, so that it can be processed by decision-making models. This is the main goal of Data Science, a new discipline related to Big Data that incorporates theories and tools from many areas, including statistics, machine learning, visualization, databases, and highly parallelised HPC programming.

We claim that this ETN project will be the ideal means to educate Early Stage Researchers on the different facets of Data Science (across storage hardware and software architectures, large-scale distributed systems, data management services, data analysis, machine learning, and decision making). Such multifaceted expertise is essential to enable researchers to propose appropriate answers to application requirements, while leveraging advanced data storage solutions that unify Cloud and HPC storage facilities.

We will focus on studying four representative Big Data application use cases, which will lay the foundation for the project and for the Early Stage Researchers’ work:

  • The Human Brain Project (HBP), an EC FET Flagship project that aims to understand the human brain and its diseases by emulating its computational capabilities.
  • The Square Kilometre Array (SKA), which aims to build the world’s largest radio telescope, with a total collecting area of about one square kilometre, requiring the processing power of about one hundred million PCs to analyse the huge data volumes collected.
  • Climate Science, which studies long-term trends in meteorological conditions and their changes over time.
  • Smart Cities projects, which collect huge amounts of sensor data to enable efficient and accurate governance and to foster the sustainable economic growth and prosperity of their citizens.