Big Data From Space

Last week I attended the 2017 Conference on Big Data from Space (BiDS’17), held in Toulouse, France. The conference was co-organised by the European Space Agency (ESA), the Joint Research Centre (JRC) of the European Commission (EC), and the European Union Satellite Centre (SatCen). It aimed to bring together people from multiple disciplines to stimulate the exploitation of Earth Observation (EO) data collected in space.

The event started on Tuesday morning with keynotes from the various co-organising space organisations. Personally, I found the talk by Andreas Veispak, from the European Commission’s DG GROW department, which is responsible for EU policy on the internal market, industry, entrepreneurship and SMEs, particularly interesting. Andreas has a key involvement in the Copernicus and Galileo programmes and described the Copernicus missions as the first building block for creating an ecosystem, which has positioned Europe as a global EO power through its “full, free and open” data policy.

The current Sentinel satellite missions will provide data continuity until at least 2035, generating huge amounts of data; once all the Sentinel missions are operational, over 10 petabytes of data per year will be produced. Sentinel data has already been a huge success: current user numbers exceed what was expected by a factor of 10 to 20, and every product has been downloaded at least 10 times. Now, the key challenge is to support these users by providing useful information alongside the data.

The ESA presentation by Nicolaus Hanowski continued the user focus by highlighting that there are currently over 100 000 registered Copernicus data hub users. Nicolaus went on to describe that within ESA success is now being measured by the use of the data for societal needs, e.g., the sustainable development goals, rather than just the production of scientific data. Therefore, one of the current aims is to reduce the need for downloading by providing a mutualised underpinning infrastructure, i.e. the Copernicus Data and Information Access Services (DIAS), which will become operational in the second quarter of 2018 and will allow users to run their computer code on the data without needing to download it. The hope is that this will allow users to focus on what they can do with the data, rather than worrying about storing it!

Charles Macmillan from JRC described their EO Data and Processing Platform (JEODPP), which has a front end based around the Jupyter Notebook that allows users to ask questions using visualisations and narrative text, instead of just through direct programming. He also noted that, increasingly, the data needed for policy and decision making is held by private organisations rather than government bodies.

Tuesday afternoon was busy as I chaired the session on Information Generation at Scale. Around 100 people heard some great talks on varied subjects, such as mass processing of Sentinel & Landsat data for mapping human settlements, 35 years of AVHRR data, and large-scale flood frequency maps using SAR data.

‘Application Of Earth Observation To A Ugandan Drought And Flood Mitigation Service’ poster

I presented a poster at the Wednesday evening session, titled “Application Of Earth Observation To A Ugandan Drought And Flood Mitigation Service”. We’re part of a consortium working on this project, which is funded via the UK Space Agency’s International Partnership Programme. Its focus is on providing underpinning infrastructure for the Ugandan government so that end users, such as farmers, can benefit from more timely and accurate information – delivered through a combination of EO, modelling and ground-based measurements.

It was interesting to hear Grega Milcinski from Sinergise discuss the lessons about users that they learnt from building the Sentinel Hub. They separated the needs of science, business and end users. They’ve chosen not to target end users, due to the challenges surrounding the localisation and customisation requirements of developing apps for end users around the world. Instead, they’ve focussed on meeting the processing needs of scientific and business users, giving them a solid foundation upon which they can then build end user applications. It was quite thought-provoking to hear this, as we’re hoping to move towards targeting these end users in the near future!

There were some key technology themes that came out of the presentations at the conference:

  • Jupyter notebooks were popular for frontend visualisation and data analytics, so users only need some basic Python to handle large and complex datasets.
  • Making use of cloud computing using tools such as Docker and Apache Spark for running multiple instances of code with integrated parallel processing.
  • Storing raw data and processing on the fly: both for handling large datasets within browsers and for storing metadata so you can quickly query it before committing to processing.
  • Analysis ready data in data cubes, i.e. the data has been processed to a level where remote sensing expertise isn’t so critical.
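To give a flavour of the data-cube idea, here’s a minimal, purely illustrative sketch in plain NumPy (the array sizes and NDVI/cloud-masking scenario are my own assumptions; real platforms such as the Open Data Cube expose labelled, geo-referenced arrays, but the principle is the same): a stack of scenes becomes a (time, y, x) array that can be reduced along any axis in one line.

```python
import numpy as np

# Hypothetical mini data cube: a stack of 12 monthly NDVI-like scenes
# with dimensions (time, y, x). Values here are random placeholders.
rng = np.random.default_rng(42)
cube = rng.uniform(0.0, 1.0, size=(12, 4, 4))

# Simulate cloud-masked pixels as NaN, then build a simple
# cloud-free composite by averaging each pixel over the time axis.
cube[cube > 0.9] = np.nan
composite = np.nanmean(cube, axis=0)  # shape (4, 4)

print(composite.shape)
```

Because the cube is "analysis ready", the composite is one aggregation call rather than a per-scene loop over differently formatted files.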

It was a great, thought-provoking conference. If you’d like more detail on what was presented, then a book of extended abstracts is available here. The next event is planned for 19-21 February 2019 in Munich, Germany, and I’d highly recommend it!

Living Planet Is Really Buzzing!

Living planet rotating globe in the exhibition area, photo: S Lavender


This week I’m at the 2016 European Space Agency’s Living Planet Symposium taking place in sunny Prague. I didn’t arrive until lunchtime on Monday, and with the event already underway I hurried to the venue. First port of call was the European Association of Remote Sensing Companies (EARSC) stand, as we’ve got copies of flyers and leaflets on their stand. Why not pop along and have a look!

The current excitement and interest in Earth observation (EO) was obvious when I made my way towards the final sessions of the day. The Sentinel-2 and Landsat-8 synergy presentations were packed out: all seats were taken and people crowded the door to watch!

I started with the Thematic Exploitation Platforms session. For a long time the remote sensing community has wanted more data, and now we’re receiving it in ever larger quantities, e.g., the current Copernicus missions are generating terabytes of data daily. With the storage requirements this generates, there is a lot of interest in online platforms that hold the data; you then upload your code to the platform, or use the tools it provides, rather than everyone trying to download their own individual copies. It was interesting to compare and contrast the approaches taken with hydrology, polar, coastal, forestry and urban EO data.

Tuesday was always going to be my busiest day of the Symposium, as I was chairing two sessions and giving a presentation. I had an early start, co-chairing the 0800 session on Coastal Zones I alongside Bob Brewin – a former PhD student of mine! It was great to see people presenting their results using Sentinel-2. The spatial resolution, 10 m for the highest resolution wavebands, allows us to see the detail of suspended sediment resuspension events, and the 705 nm waveband can be used for phytoplankton; but we’d still like an ocean colour sensor at this spatial resolution!

In the afternoon I headed into European Climate Data Records, where there was an interesting presentation on a long time-series AVHRR above-land aerosol dataset, in which the AVHRR data is being vicariously calibrated using the SeaWiFS ocean colour sensor. Great to see innovation within the industry, where sensors launched for one set of applications can be reused in others. One thing that was emphasised by presenters in both this session, and the Coastal Zones one earlier, was the need to reprocess datasets to create improved data records.

My last session of the day was on Virtual Research, where I was both co-chairing and presenting. It returned to the theme of handling large datasets, and the presentations focused on building resources that make using EO data easier. These ranged from bringing in-situ and EO data together by standardising the formatting and metadata of the in-situ data, through community datasets for algorithm performance evaluation, to data cubes that bring all the data needed to answer specific questions together into a three- (or higher) dimensional array, so you spend your time asking questions of the data rather than trying to read different datasets. My own presentation focused on our involvement with the ESA-funded E-Collaboration for Earth Observation (E-CEO) project, which developed a collaborative platform where challenges can be initiated and evaluated, allowing participants to upload their code and have it evaluated against a range of metrics. We’d run an example challenge comparing atmospheric correction processors for ocean colour data that, once set up, could easily be rerun.
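As a flavour of the kind of scoring such a challenge platform can automate, here is a small, purely illustrative sketch: the arrays and values are invented, and the choice of bias and RMSE as metrics is my own assumption rather than E-CEO’s actual implementation.

```python
import numpy as np

# Hypothetical example: scoring a participant's atmospheric-correction
# output against reference reflectances, as a challenge platform might.
# Both arrays are illustrative placeholders, not real measurements.
reference = np.array([0.012, 0.025, 0.031, 0.018])
submitted = np.array([0.014, 0.023, 0.035, 0.017])

bias = float(np.mean(submitted - reference))                   # mean error
rmse = float(np.sqrt(np.mean((submitted - reference) ** 2)))   # spread of error

print(f"bias={bias:.5f}, rmse={rmse:.5f}")
```

Because the metrics are computed from the uploaded output alone, rerunning the whole challenge on a new processor is just a matter of submitting a new array.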

I’ve already realised that there are too many interesting parallel sessions here, as I missed the ocean colour presentations, which I’ve heard were great. The good news for me is that these sessions were recorded. So if you haven’t been able to make it to Prague in person, or, like me, you are here but haven’t seen everything you wanted, there will be a selection of sessions to view on ESA’s site; for example, you can see the opening session here.

Not only do events like this give you a fantastic chance to learn about what’s happening across the EO community, but they also give you the opportunity to catch up with old friends. I am looking forward to the rest of the week!