Researchers from Stony Brook University, New York, use machine learning and satellite imagery to automatically identify images of whales and facilitate their monitoring.

Underwater photo of a beluga whale.
Keran McKenzie / Picspree

Long-distance swimmers are difficult to monitor

There is still much to be understood about whales and other cetacean species, yet their high mobility and large ranges make these aquatic mammals difficult to monitor with the current surveying methods.

Most cetacean monitoring research focuses on coastal areas, which are logistically easier to survey, leaving other habitats such as deep-water and continental shelf regions under-studied. Although data loggers, observations by citizen scientists, and aerial and ship surveys could be used for broad-scale monitoring, these techniques are either costly or leave large data gaps.

Harnessing the power of satellite imagery and machine learning

High-resolution satellite imagery is a potential solution for broad-scale monitoring of cetaceans. However, it produces a volume of data far too large to classify manually. Machine learning algorithms could semi-automate this process by identifying the images with a high probability of containing cetaceans, which can then be checked manually.
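The semi-automated triage described above can be sketched as a simple thresholding pass over model outputs. This is a minimal illustration, not the researchers' code: the function name `flag_for_review`, the tile identifiers, the scores, and the 0.5 threshold are all assumptions made for the example.

```python
def flag_for_review(tile_scores, threshold=0.5):
    """Return the IDs of tiles whose predicted whale probability meets the
    threshold, sorted from most to least likely, for a human to check."""
    flagged = [(tile_id, p) for tile_id, p in tile_scores.items() if p >= threshold]
    flagged.sort(key=lambda item: item[1], reverse=True)
    return [tile_id for tile_id, _ in flagged]


# Hypothetical model outputs: tile ID -> predicted probability of "whale".
scores = {"tile_001": 0.02, "tile_002": 0.91, "tile_003": 0.47, "tile_004": 0.66}
review_queue = flag_for_review(scores)
print(review_queue)  # ['tile_002', 'tile_004']
```

Lowering the threshold sends more tiles to the analyst but makes it less likely that a whale is missed, which is the trade-off a monitoring programme would need to tune.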

Examples of satellite imagery: A) open water, B) southern right whale, and C) whale-watching boat.
A. Borowicz et al. 2019 / PLoS ONE

High accuracy of Convolutional Neural Networks

The researchers trained machine learning models on aerial imagery of minke whales (Balaenoptera acutorostrata) to predict whale hotspots in satellite imagery. The images were split into smaller tiles, and Convolutional Neural Networks (CNNs) were trained to identify whales within these tiles. The researchers also tried other machine learning models (e.g. Ridge Regression and a Support Vector Classifier), but the CNNs performed better at identifying whales within the tiles.
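The tiling step can be illustrated with a short sketch. It assumes non-overlapping square tiles and drops any partial tiles at the image edges; the article does not state the tile size or edge handling the researchers used, so the function name `tile_image` and the toy values below are purely illustrative.

```python
def tile_image(image, tile_size):
    """Split a 2-D grid of pixel values into non-overlapping square tiles.

    Edge rows and columns that do not fill a whole tile are dropped
    (an assumption made for this sketch).
    Returns a list of (row_offset, col_offset, tile) tuples.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    tiles = []
    for top in range(0, height - tile_size + 1, tile_size):
        for left in range(0, width - tile_size + 1, tile_size):
            tile = [row[left:left + tile_size] for row in image[top:top + tile_size]]
            tiles.append((top, left, tile))
    return tiles


# Toy 4x6 "image" whose pixel values encode their own row and column.
image = [[r * 10 + c for c in range(6)] for r in range(4)]
tiles = tile_image(image, tile_size=2)
print(len(tiles))  # 6
print(tiles[1])    # (0, 2, [[2, 3], [12, 13]])
```

Keeping the row and column offsets with each tile means that a tile flagged by a classifier can be traced back to its location in the original scene.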

The best CNN model correctly classified all tiles with whales (100%) and almost all tiles containing only water (94%). This means a small percentage of water tiles were classified as whale. However, these false positives can easily be checked manually, because only a small proportion of the dataset was flagged as whale.

Approach used to identify whales from aerial imagery. Aerial imagery (above) is down-sampled, tiled, and then used to train the model. Satellite imagery (below) is pansharpened and tiled before the model can detect whales.
A. Borowicz et al. 2019 / PLoS ONE

The importance of model comparison

The researchers highlighted the importance of comparing the performance of different machine learning models. For successful classification of whales, deep learning algorithms such as CNNs seem to be a better choice than the other models tested.

Confusion matrix for the best model (ResNet-152).
A. Borowicz et al. 2019 / PLoS ONE

Limited open water imagery

Currently, the biggest hurdle is the limited availability of high-resolution satellite imagery of open water. The existing archive is small, and requesting additional high-resolution imagery can be costly. The research group hopes that growing interest in monitoring cetacean species with satellite imagery will increase the availability of open-water imagery.

The researchers also outlined a promising avenue for further research: using additional spectral bands to increase the versatility of the models.

Whale lifting its fluke (tail) out of the water before diving into the ocean.
Tim Nooteboom / Picspree

Next steps

Although this study is not yet a full broad-scale application, it shows the potential of using satellite imagery for monitoring cetacean species.

The model could be improved by training on a larger dataset, for example one that includes aerial imagery of species other than minke whales. Training could also be expanded to cover additional classes that do not currently fall under the water or whale categories, such as ships, rocks and land. With an appropriate dataset, even species-level classification could be possible in theory.

Find out more about:

Peer-reviewed article
Aerial-trained deep learning networks for surveying cetaceans from satellite imagery

Main author of the paper
Alex Borowicz, PhD student at Stony Brook University (aborowicz@coa.edu)