The cost of ‘free data’

False Colour Composite of the Black Rock Desert, Nevada, USA. Image acquired on 6th April 2016. Data courtesy of NASA/JPL-Caltech, from the ASTER Volcano Archive (AVA).

Last week, the US and Japan announced free public access to the archive of nearly 3 million images taken by the ASTER instrument; previously this data had only been accessible for a nominal fee.

ASTER, the Advanced Spaceborne Thermal Emission and Reflection Radiometer, is a joint Japan-US instrument aboard NASA’s Terra satellite, whose data is used to create detailed maps of land surface temperature, reflectance and elevation. When NASA made the Landsat archive freely available in 2008, an explosion in usage occurred. Will the same happen to ASTER?

As a remote sensing advocate I want many more people to be using satellite data, and I support any initiative that contributes to this goal. Public satellite data archives such as Landsat are often referred to as ‘free data’. This phrase is unhelpful, and I prefer the term ‘free to access’, because ‘free data’ isn’t free: someone has already paid to put the satellites into orbit, download the data from the instruments and provide the websites that make this data available. So, who has paid for it? To be honest, it’s you and me!

To be accurate, these missions are generally funded by the taxpayers of the country that launched the satellite. For example:

  • ASTER was funded by the American and Japanese public
  • Landsat is funded by the American public
  • The Sentinel satellites, under the Copernicus missions, are funded by the European public.

In addition to making basic data available, missions often also create a series of products derived from the raw data. This is achieved either by commercial companies being paid grants to create these products, which can then be offered as free-to-access datasets, or by the companies developing the products themselves and charging users to access them.

‘Free data’ also creates user expectations, which may be unrealistic. Whenever a potential client comes to us, there is always a discussion about which data source to use. Pixalytics is a data-independent company, and we suggest the best data to suit the client’s needs. However, the best choice isn’t always a free to access dataset! There are a number of physical and operating criteria that need to be considered:

  • Spectral wavebands / frequency bands: wavelengths for optical instruments and frequencies for radar instruments, which determine what can be detected.
  • Spatial resolution: the size of the smallest objects that can be ‘seen’.
  • Revisit times: how often you are likely to get a new image – important if you’re interested in several acquisitions close together.
  • Long-term archives of data: very useful if you want to look back in time.
  • Availability: for example, delivery schedule and ordering requirements.
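To make the trade-off concrete, here is a minimal sketch of how criteria like these can be weighed programmatically. The missions, resolutions, revisit times and costs below are illustrative placeholder values, not quotes for any real dataset.

```python
# Illustrative candidate datasets; all figures are placeholder values.
CANDIDATES = [
    {"name": "Free mission A", "resolution_m": 30, "revisit_days": 16,
     "cost_per_scene_gbp": 0},
    {"name": "Free mission B", "resolution_m": 10, "revisit_days": 10,
     "cost_per_scene_gbp": 0},
    {"name": "Commercial VHR", "resolution_m": 0.5, "revisit_days": 1,
     "cost_per_scene_gbp": 800},
]

def shortlist(max_resolution_m, max_revisit_days):
    """Return the names of missions meeting the spatial-resolution
    and revisit-time requirements."""
    return [c["name"] for c in CANDIDATES
            if c["resolution_m"] <= max_resolution_m
            and c["revisit_days"] <= max_revisit_days]

# A broad land-cover study: 30 m pixels and a fortnightly revisit will do,
# so the free-to-access missions qualify.
print(shortlist(max_resolution_m=30, max_revisit_days=16))

# A site survey needing sub-metre detail within a week: only the
# commercial option remains, and the cost conversation begins.
print(shortlist(max_resolution_m=1, max_revisit_days=7))
```

In practice the shortlist would also be filtered by archive depth, wavebands and delivery schedule, but the principle is the same: the requirements, not the price tag, pick the dataset.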

We don’t want any client to pay for something they don’t need, but sometimes commercial data is the best solution. As the cost of this data can range from a few hundred to thousands of pounds, and with all the promotion of ‘free data’, this can be a challenging conversation.

So, what’s the summary here?

If you’re analysing large amounts of data, e.g. for a time-series or large geographical areas, then free to access public data is a good choice as buying hundreds of images would often get very expensive and the higher spatial resolution isn’t always needed. However, if you want a specific acquisition over a specific location at high spatial resolution then the commercial missions come into their own.

Just remember, no satellite data is truly free!

Reprocessing Data: Challenges of Producing a Time Series

August 2009 Monthly Chlorophyll-a Composite; data courtesy of the ESA Ocean Colour Climate Change Initiative project

Being able to look back at how our planet has evolved over time is one of the greatest assets of satellite remote sensing. With Landsat, you have a forty-year archive to examine changes in land use and land cover. In situ (ground-based) monitoring offers this for only a few locations, and only for the spot you’re measuring. Landsat’s continuous archive is an amazing resource, and it is hoped that the European Union’s Copernicus programme will develop another comprehensive archive. So with all of this data, producing a time series analysis is easy, isn’t it?

Well, it’s not quite that simple. There are the basic issues of different missions having different sensors, so you need to know whether you’re comparing like with like. Although data continuity has been a strong element of Landsat, the sensors on Landsat 8 are very different to those on Landsat 1. Couple this with various positional, projection and datum corrections, and you have a lot to think about to produce an accurate time series. However, once you’ve sorted all of these out and downloaded your data, everything is great, isn’t it?

Well, not necessarily; you’ve still got to consider data archive reprocessing. The space agencies that maintain this data regularly reprocess their satellite datasets. This means that the data you downloaded two years ago isn’t necessarily the same data that could be downloaded today.

We faced this issue recently as NASA completed the reprocessing of the MODIS Aqua data, which began in 2014. The data from the MODIS instrument on the Aqua satellite has been reprocessed seven times, whilst the data from its twin instrument on Terra has been reprocessed three times.

Reprocessing the data can include changes to some, or all, of the following:

  • Update of the instrument calibration, to take account of current knowledge about sensor degradation and radiometric performance.
  • Applying new knowledge, in terms of atmospheric correction and/or derived product algorithms.
  • Changes to parallel datasets that are used as inputs to the processing; for example, the meteorological conditions are used to aid the atmospheric correction.

Occasionally, they also change the output file format the data is provided in, and this is what caught us out. The MODIS output file format has changed from HDF4 to NetCDF4, because NetCDF is a more efficient, sustainable, extendable and interoperable data file format. We’ve known about this change for a long time, as it resulted from community input, but until you get the new files you can’t check and update your software.

We tend to use a lot of open source software, enabling our clients to carry on working with remote sensing products without having to invest in expensive software. The challenge is that it takes software providers time to catch up with format changes. Until they do, the software may be unable to load the new files, or may read the data incorrectly, e.g. it comes in upside down. Sometimes large changes mean you have to alter your approach and/or software.
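The ‘upside down’ symptom above has a simple cause: a new file format, or a new reader, can return the latitude axis in the opposite order to the one your code expects, so every image appears vertically flipped. A minimal sketch of detecting and correcting this, using small NumPy arrays to stand in for the grids read from the old and new files:

```python
import numpy as np

# Latitudes as the old reader delivered them: north at row 0.
lats_expected = np.array([60.0, 30.0, 0.0, -30.0, -60.0])
data_expected = np.array([0, 1, 2, 3, 4])

# The same grid as a new file might store it: latitude axis reversed,
# so the data rows arrive in the opposite order too.
lats_new = lats_expected[::-1]
data_new = data_expected[::-1]

# Rather than assuming an orientation, check the latitude coordinate
# and flip back to the expected north-first order if needed.
if lats_new[0] < lats_new[-1]:
    lats_new = lats_new[::-1]
    data_new = data_new[::-1]
```

For real 2-D grids the same check applies, with `np.flipud` flipping the image rows; the point is to key the correction off the coordinate variable in the file, not off which format the file happens to be in.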

Reprocessing is important, as it improves the overall quality of the data, but you do need to keep on top of what is happening with the data to ensure that you are comparing like with like when you analyse a time series.
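One practical way to keep on top of this is to check the processing version recorded in each file before combining files into a time series. The sketch below uses plain dictionaries to stand in for the global attributes you would read from each file; the filenames and the `processing_version` attribute name are hypothetical stand-ins, as the actual attribute varies between archives.

```python
# Metadata as it might be read from each file's global attributes;
# filenames and attribute values are illustrative only.
files = [
    {"path": "A2009213.L3m_MO_CHL.nc", "processing_version": "R2014.0"},
    {"path": "A2009244.L3m_MO_CHL.nc", "processing_version": "R2014.0"},
    {"path": "A2009274.L3m_MO_CHL.nc", "processing_version": "R2013.1"},
]

def processing_versions(file_metadata):
    """Return the set of distinct processing versions across the files."""
    return {f["processing_version"] for f in file_metadata}

versions = processing_versions(files)
if len(versions) > 1:
    # Mixing reprocessing versions means you are no longer comparing
    # like with like; re-download the out-of-date files first.
    print("Warning: mixed processing versions:", sorted(versions))
```

A check like this, run before any analysis, turns a silent inconsistency into an explicit warning.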

Ocean Colour Cubes

August 2009 Monthly Chlorophyll-a Composite; data courtesy of the ESA Ocean Colour Climate Change Initiative project

It’s an exciting time to be in ocean colour! A couple of weeks ago we highlighted the new US partnership using ocean colour as an early warning system for harmful freshwater algae blooms, and last week a new ocean colour CubeSat development was announced.

Ocean colour is something very close to our heart; it was the basis of Sam’s PhD and a field of research she remains highly active in today. When Sam began her PhD, the Coastal Zone Color Scanner (CZCS) was the main source of satellite ocean colour data, until it was superseded by the Sea-viewing Wide Field-of-view Sensor (SeaWiFS), which became the focus of her role at Plymouth Marine Laboratory.

Currently, there are a number of ocean colour instruments in orbit:

  • NASA’s twin MODIS instruments on the Terra and Aqua satellites
  • NOAA’s Visible Infrared Imager Radiometer Suite (VIIRS)
  • China’s Medium Resolution Spectral Imager (MERSI), Chinese Ocean Colour and Temperature Scanner (COCTS) and Coastal Zone Imager (CZI) onboard several satellites
  • South Korea’s Geostationary Ocean Color Imager (GOCI)
  • India’s Ocean Colour Monitor on-board Oceansat-2

Despite having these instruments in orbit, there is very limited global ocean colour data available for research applications. This is because the Chinese data is not easily accessible outside China, Oceansat-2 data isn’t of sufficient quality for climate research, and GOCI is on a geostationary satellite so its data covers only a limited geographical area focussed on South Korea. With MODIS, the Terra satellite has limited ocean colour applications due to issues with its mirror, and hence calibration; and recently the calibration on Aqua has also become unstable due to its age. Therefore, the ocean colour community is left with just VIIRS, and the data from this instrument has only recently been proven.

With limited good quality ocean colour data, there is significant concern over the potential loss of continuity in this valuable dataset. The next planned instrument to provide a global dataset is OLCI onboard ESA’s Sentinel 3A, due to launch in November 2015, with everyone keeping their fingers crossed that MODIS will hang on until then.

Launching a satellite takes time and money, and satellites carrying ocean colour sensors have generally been big; for example, Sentinel 3A weighs 1250 kg and the MODIS instrument 228.7 kg. This is why the project announced last week to build two ocean colour CubeSats is so exciting; they are planned to weigh only 4 kg, which reduces both the expense and the launch lead time.

The project, called SOCON (Sustained Ocean Observation from Nanosatellites), will see Clyde Space, from Glasgow in the UK, build an initial two prototype SeaHawk CubeSats with HawkEye ocean colour sensors, with a ground resolution of between 75 m and 150 m per pixel, to be launched in early 2017. The project consortium includes the University of North Carolina, NASA’s Goddard Space Flight Centre, the Hawk Institute for Space Sciences and Cloudland Instruments. The eventual aim is to have constellations of CubeSats providing a global view of both ocean and inland waters.

There are a number of other ocean colour satellite launches planned over the next ten years, including follow-on missions such as Oceansat-3, two missions from China, GOCI 2, and a second VIIRS mission.

With new missions, new data applications and miniaturised technology, we could be entering a purple patch for ocean colour data – although purple in ocean colour usually represents a Chlorophyll-a concentration of around 0.01 mg/m³ on the standard SeaWiFS colour palette, as shown in the image at the top of the page.

We’re truly excited and looking forward to research, products and services this golden age may offer.