Remote Sensing Big Data: Possibilities and Dangers

Remote sensing is an industry riding the crest of the big data wave. It offers great opportunities to those who can harness its power, but it’s also fraught with dangers. Big data is a blanket term for datasets that are large and complex because of the quantity of data, the speed at which new data becomes available, or the variety of data. Remote sensing ticks all three of these boxes!

Sentinel-1 image of the coast of the Netherlands; courtesy of ESA

When I first started working with remote sensing, I approached the IT department to ask for 100 megabytes of disk space for my undergraduate project and was told nobody ever needs that much storage! Today, the amount of Earth observation data available to the community is growing exponentially. To give you some examples, the recently launched Copernicus Sentinel-1A satellite collects around 1.7 terabytes of data daily, the number of daily images collected by Landsat 8 has been increased by 18% this month, and DigitalGlobe estimates it captures two petabytes of data each year. This quantity of data raises two key challenges. Firstly, where do you store it? Secondly, how do you know which data is valuable for your decision-making?
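To put those figures side by side, here is a rough back-of-the-envelope sketch in Python. It assumes decimal units (1 PB = 1,000 TB, 1 TB = 1,000,000 MB) and a constant daily rate all year, neither of which is stated by the data providers; the function and variable names are purely illustrative.

```python
# Rough storage arithmetic using the figures quoted in the text.
# Assumptions (not from the data providers): decimal units and a
# constant collection rate on every day of the year.

TB_PER_PB = 1000
MB_PER_TB = 1_000_000
DAYS_PER_YEAR = 365


def yearly_volume_tb(daily_tb: float, days: int = DAYS_PER_YEAR) -> float:
    """Terabytes accumulated over `days` at a constant daily rate."""
    return daily_tb * days


# Sentinel-1A: around 1.7 TB per day, as quoted above.
sentinel_1a_tb_per_year = yearly_volume_tb(1.7)

# DigitalGlobe: two petabytes per year, as quoted above.
digitalglobe_tb_per_year = 2 * TB_PER_PB

# How many 100 MB undergraduate allocations fit into one day of Sentinel-1A data?
undergrad_quota_mb = 100
quotas_per_day = 1.7 * MB_PER_TB / undergrad_quota_mb

print(f"Sentinel-1A per year:  ~{sentinel_1a_tb_per_year:.0f} TB")
print(f"DigitalGlobe per year: {digitalglobe_tb_per_year} TB")
print(f"100 MB quotas per Sentinel-1A day: ~{quotas_per_day:.0f}")
```

Under these assumptions, a single day of Sentinel-1A data would fill that "excessive" 100 MB allocation many thousands of times over.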

It’s often assumed that cloud computing has resolved the storage issue, but there is a cost for getting the data to, and from, the cloud. An interesting recent study by the University of British Columbia found that over 80% of scientific data is lost within 20 years, mostly due to obsolete storage devices and email addresses. I have first-hand experience of this. My PhD data was stored on hundreds of floppy disks, and when I came to use them recently most didn’t work. Fortunately I have a Zip drive backup, although I still need to work out how to read Quattro Pro spreadsheets! I also have several Sun workstations with associated data on tapes that can only be read on the machines they were written on, so how much of this data is accessible is debatable.

How often do you think about your old and archived data? Take a moment to consider how, and where, your critical data is stored. Is all of your data available and accessible? When was the last time the backup procedures for your scientific or business data were tested? Does your IT department know which email addresses are critical for the receipt of satellite data?
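One simple way to test an archive is to compare the files against checksums recorded when they were first stored. The sketch below is a minimal illustration of that idea rather than a full backup strategy. It assumes you already hold a manifest that maps file names to SHA-256 digests; the names `sha256_of`, `find_damaged` and the manifest layout are hypothetical and chosen for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_damaged(archive_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List manifest entries that are missing from the archive or whose digest differs.

    `manifest` maps a path relative to `archive_dir` to the digest
    recorded when the file was archived (a hypothetical layout).
    """
    damaged = []
    for relative_name, expected in sorted(manifest.items()):
        candidate = archive_dir / relative_name
        if not candidate.is_file() or sha256_of(candidate) != expected:
            damaged.append(relative_name)
    return damaged
```

Running `find_damaged(Path("/path/to/archive"), manifest)` on a schedule would flag missing or corrupted files while another copy may still exist. It cannot help with formats or media that can no longer be read at all.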

The second challenge is knowing what data to use, particularly for people new to remote sensing. There is free data, paid-for data, various satellites, various data types, various formats, and the list goes on. The remote sensing community needs to help by building more bridges between the data and the user community. The datasets available can offer huge benefits for business and science, but if people have to spend hours hunting around for the right image, they won’t stay users for long.

You can hire remote sensing companies, like us, who can offer impartial advice to help you select the right information. Pixalytics is striving to find more ways to make data more available, more accessible and more understandable. Remote sensing data belongs to everyone, and we need to help users get it.