Last week I attended the 2017 Conference on Big Data from Space (BiDS’17), held in Toulouse, France. The conference was co-organised by the European Space Agency (ESA), the Joint Research Centre (JRC) of the European Commission (EC), and the European Union Satellite Centre (SatCen). It aimed to bring together people from multiple disciplines to stimulate the exploitation of Earth Observation (EO) data collected in space.
The event started on Tuesday morning with keynotes from the various co-organising space organisations. Personally, I found the talk by Andreas Veispak, from the EC’s DG GROW, the department responsible for EU policy on the internal market, industry, entrepreneurship and SMEs, particularly interesting. Andreas has a key involvement in the Copernicus and Galileo programmes, and he described the Copernicus missions as the first building block for creating an ecosystem, one that has positioned Europe as a global EO power through its “full, free and open” data policy.
The current Sentinel satellite missions will provide data continuity until at least 2035, generating huge amounts of data: once all the Sentinel missions are operational, over 10 petabytes of data per year will be produced. Sentinel data has already been a huge success, with current user numbers exceeding expectations by a factor of 10 or 20, and every product has been downloaded at least 10 times. Now, the key challenge is to support these users by providing useful information alongside the data.
The ESA presentation by Nicolaus Hanowski continued the user focus by highlighting that there are currently over 100 000 registered Copernicus data hub users. Nicolaus went on to describe that within ESA, success is now being measured by use of the data for societal needs, e.g., the sustainable development goals, rather than just the production of scientific data. Therefore, one of the current aims is to reduce the need for downloading by having a mutualised underpinning structure, i.e. the Copernicus Data and Information Access Services (DIAS), due to become operational in the second quarter of 2018, which will allow users to run their computer code on the data without needing to download it. The hope is that this will allow users to focus on what they can do with the data, rather than worrying about storing it!
Charles Macmillan from JRC described their EO Data and Processing Platform (JEODPP), which has a front end based around the Jupyter Notebook that allows users to ask questions using visualisations and narrative text, instead of just through direct programming. He also noted that, increasingly, the data needed for policy and decision making is held by private organisations rather than government bodies.
Tuesday afternoon was busy, as I chaired the session on Information Generation at Scale. Around 100 people heard some great talks on varied subjects, such as mass processing of Sentinel & Landsat data for mapping human settlements, 35 years of AVHRR data, and large scale flood frequency maps using SAR data.
I presented a poster at the Wednesday evening session, titled “Application Of Earth Observation To A Ugandan Drought And Flood Mitigation Service”. We’re part of a consortium working on this project, which is funded via the UK Space Agency’s International Partnership Programme. Its focus is on providing underpinning infrastructure for the Ugandan government so that end users, such as farmers, can benefit from more timely and accurate information – delivered through a combination of EO, modelling and ground-based measurements.
It was interesting to hear Grega Milcinski from Sinergise discuss a similar approach to users, drawing on the lessons they learnt from building the Sentinel Hub. They separated the needs of science, business and end users. With their platform focusing on scientific and business users, they’d chosen not to target end users because of the tough requirements, such as the demand for response times of less than two seconds. It’s quite sobering to hear, as we’re hoping to move towards targeting these end users!
There were some key technology themes that emerged from the presentations at the conference:
- Jupyter notebooks were popular for frontend visualisation and data analytics, so users just need to know some basic Python to handle large and complex datasets.
- Cloud computing, with tools such as Docker and Apache Spark for running multiple instances of code with integrated parallel processing.
- Raw data with processing on the fly: both for large datasets within browsers, and by storing metadata so you can query quickly before committing to full processing.
- Analysis ready data in data cubes, i.e. the data has been processed to a level where remote sensing expertise isn’t so critical.
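To give a flavour of the “analysis ready data in data cubes” idea, here’s a minimal Python sketch using NumPy with made-up NDVI values (the grid sizes and variable names are purely illustrative, not from any particular platform). Because the data are already gridded and calibrated, a user can answer a question with a simple array reduction, no remote sensing preprocessing required:

```python
import numpy as np

# Hypothetical analysis-ready "data cube": monthly NDVI on a
# (time, lat, lon) grid. Values are randomly generated for illustration.
n_times, n_lats, n_lons = 12, 4, 5
rng = np.random.default_rng(42)
cube = rng.uniform(-0.1, 0.9, size=(n_times, n_lats, n_lons))

# With analysis-ready data, "average greenness per pixel over the year"
# is just a reduction along the time axis.
mean_ndvi = cube.mean(axis=0)  # shape (n_lats, n_lons)

# Locate the pixel with the highest mean NDVI.
greenest = np.unravel_index(mean_ndvi.argmax(), mean_ndvi.shape)
print(mean_ndvi.shape, greenest)
```

In practice a platform would expose this through labelled dimensions and lazy loading rather than a raw array, but the user experience is the same: basic Python operations over a cube someone else has already made analysis-ready.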
It was a great, thought-provoking conference. If you’d like more detail on what was presented, a book of extended abstracts is available here. The next event is planned for 19-21 February 2019 in Munich, Germany, and I’d highly recommend it!