Little is known about the conservation status of most plant species. A machine learning model that predicts whether a species is at risk could change that. Its predictions can help prioritize the areas and plant species most in need of conservation.

Photo of backlit redwood trees

Redwood trees are classified as endangered according to the IUCN Red List of Threatened Species.
Victor Anton /

What is the conservation status of plants?

Curious about how the giant panda or white rhinoceros population is doing? You can find this information on the IUCN Red List of Threatened Species. Widely used by policy-makers and researchers, the Red List is the largest and most comprehensive listing of species’ conservation status. If you are more interested in the conservation status of plant species, however, this information is much harder to find: only a small proportion (6.5%) of currently known plant species can be found on the Red List.

The underrepresentation of plant species can be partially attributed to an unequal global distribution of resources and collectors. Charismatic species, for example, tend to receive more attention than the average wallflower. This underrepresentation does not mean plant species are less important. Plants are vital for ecosystems, food chains and agriculture, so a better understanding of their conservation status is essential.

Photo of Ranunculus royi, a small alpine buttercup

Ranunculus royi, a small alpine buttercup, on Mt St Patrick, North Canterbury, New Zealand. This species is classified as data deficient according to the Conservation status of New Zealand indigenous vascular plants, 2017.
Phil Garnock-Jones / Wikimedia

Open-source databases provide new insights

A team of researchers led by Dr. Anahí Espíndola collected geographical, environmental and morphological information on plant species whose conservation status is known. The data, gathered from open-source databases, was used to train a machine learning model. Based on this model, the research team predicted the conservation status of over 150,000 land plant species worldwide.

The model predicts the probability that a species does not fall under the IUCN Red List Least Concern (LC) category, meaning the species is to some extent at risk. Assessing species one by one is normally a costly and time-consuming process; this method instead provided new insights into the conservation status of plant species, 95% of which cannot be found on the Red List.
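
Framed this way, the task becomes binary classification. The Python sketch below shows one possible way to derive training labels from Red List categories: Least Concern becomes 0, the other assessed categories for living species become 1, and Data Deficient or Not Evaluated species are set aside as the ones to predict. Counting every non-LC category (including Near Threatened) as "at risk" and leaving out the extinct categories are assumptions made for illustration; the function name and species records are hypothetical.

```python
from typing import Optional

# Red List category codes for extant species; EX/EW are left out of this sketch.
AT_RISK = {"CR", "EN", "VU", "NT"}  # assumption: every non-LC category counts as "at risk"
NOT_AT_RISK = {"LC"}
TO_PREDICT = {"DD", "NE"}           # data deficient / not evaluated: no usable label


def at_risk_label(category: str) -> Optional[int]:
    """Map a Red List category to 1 (at risk), 0 (Least Concern) or None (to predict)."""
    code = category.strip().upper()
    if code in AT_RISK:
        return 1
    if code in NOT_AT_RISK:
        return 0
    if code in TO_PREDICT:
        return None
    raise ValueError(f"unsupported category: {category!r}")


# Hypothetical species records: (name, Red List category)
records = [("species_a", "LC"), ("species_b", "EN"), ("species_c", "DD"), ("species_d", "NT")]
labeled = [(name, at_risk_label(cat)) for name, cat in records]
training = [(name, y) for name, y in labeled if y is not None]  # used to fit the model
to_predict = [name for name, y in labeled if y is None]         # receive predictions
print(training)    # [('species_a', 0), ('species_b', 1), ('species_d', 1)]
print(to_predict)  # ['species_c']
```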


The categorization of species’ conservation status by the IUCN Red List.

Classifying plant species with random forests

The machine learning algorithm used was a random forest classifier. This type of model consists of an ensemble of individual decision trees. A single decision tree could also have made the predictions, but it would likely have fit its training data very closely and then performed poorly on new, unseen data (overfitting). Combining the votes of many trees, each trained on a random sample of the data and considering random subsets of the predictors, reduces this problem.
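
To make the mechanics concrete, here is a minimal from-scratch sketch in Python (standard library only) of the ingredients that distinguish a random forest from a single decision tree: each tree is trained on a bootstrap sample of the species, each split considers only a random subset of the predictors, and in this sketch the forest's probability is the share of trees voting "at risk". It is not the researchers' implementation; the toy data, function names and parameter values (25 trees, depth 4, one feature per split) are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 = pure, 0.5 = evenly mixed)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def best_split(rows, labels, features):
    """Return the (feature, threshold) among `features` that most reduces impurity."""
    best, best_score = None, gini(labels)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth, n_features, rng):
    """Grow a tree; every split only looks at a random subset of the features."""
    if depth == 0 or len(set(labels)) == 1:
        return majority(labels)
    features = rng.sample(range(len(rows[0])), n_features)
    split = best_split(rows, labels, features)
    if split is None:
        return majority(labels)
    f, t = split
    left_rows = [r for r in rows if r[f] <= t]
    left_labels = [y for r, y in zip(rows, labels) if r[f] <= t]
    right_rows = [r for r in rows if r[f] > t]
    right_labels = [y for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree(left_rows, left_labels, depth - 1, n_features, rng),
            build_tree(right_rows, right_labels, depth - 1, n_features, rng))


def predict_tree(node, row):
    """Follow the splits down to a leaf and return its 0/1 label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(rows, labels, n_trees=25, depth=4, n_features=1, seed=0):
    """Train each tree on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 depth, n_features, rng))
    return forest


def predict_proba(forest, row):
    """Fraction of trees that vote 'at risk' (label 1)."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)


# Hypothetical toy data: feature 0 stands in for range size, feature 1 is noise.
rows = [(float(x), float(y)) for x in range(10) for y in range(10)]
labels = [1 if x < 5 else 0 for x, _ in rows]
forest = fit_forest(rows, labels)
print(predict_proba(forest, (1.0, 5.0)))  # high: most trees vote "at risk"
print(predict_proba(forest, (9.0, 5.0)))  # low: most trees vote "not at risk"
```

Because no single tree sees all the data or all the predictors, individual trees make different mistakes, and averaging their votes smooths those mistakes out.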

Two different worldwide data-sets were used: one with only spatial data, and one with both spatial and morphological data. The best classifiers for the two data-sets had a global accuracy between 73% and 82%. The spatial-only data-set was the larger of the two, which ultimately resulted in better model performance.

Global identification of at-risk areas

The predictions showed that at least 8% of data-deficient land plant species have a high probability of being at risk. Because the research team used worldwide data, they were also able to identify the specific geographic areas most in need of conservation.
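
A figure like "at least 8%" depends on where "high probability" is drawn. The Python sketch below counts the share of data-deficient species whose predicted probability clears a cut-off; the 0.8 threshold, the function names and the probabilities are hypothetical, not values from the study.

```python
def flag_at_risk(probabilities, threshold=0.8):
    """Species whose predicted probability of being at risk is at least `threshold`."""
    return sorted(sp for sp, p in probabilities.items() if p >= threshold)


def share_flagged(probabilities, threshold=0.8):
    """Fraction of species flagged; a stricter threshold gives a more conservative share."""
    if not probabilities:
        return 0.0
    return len(flag_at_risk(probabilities, threshold)) / len(probabilities)


# Hypothetical predicted probabilities for data-deficient species
predicted = {"sp1": 0.95, "sp2": 0.40, "sp3": 0.81, "sp4": 0.10, "sp5": 0.55}
print(flag_at_risk(predicted))   # ['sp1', 'sp3']
print(share_flagged(predicted))  # 0.4
```

Raising the threshold can only shrink the flagged set, which is why a strict cut-off yields a lower-bound style estimate.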

The researchers observed that areas with high biodiversity contained large numbers of at-risk species, a finding that has been reported in the academic literature before. A troubling result was that the areas most in need of conservation were usually not the most thoroughly assessed.
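
The article does not describe how species-level predictions were turned into a map, so the Python sketch below is only one plausible approach. It assumes species occurrences are already assigned to grid cells, counts species in each cell whose known or predicted probability of being at risk clears a hypothetical 0.8 threshold, and reports what share of each cell's species already has a Red List assessment. All names and data are hypothetical.

```python
from collections import defaultdict


def summarize_cells(occurrences, at_risk_prob, assessed, threshold=0.8):
    """Per grid cell: (number of likely at-risk species, share of species assessed).

    occurrences:  iterable of (species, cell_id) pairs
    at_risk_prob: species -> probability of being at risk (known or predicted)
    assessed:     set of species that already have a Red List assessment
    """
    species_in_cell = defaultdict(set)
    for species, cell in occurrences:
        species_in_cell[cell].add(species)
    summary = {}
    for cell, species in species_in_cell.items():
        n_at_risk = sum(1 for s in species if at_risk_prob.get(s, 0.0) >= threshold)
        coverage = len(species & assessed) / len(species)
        summary[cell] = (n_at_risk, coverage)
    return summary


# Hypothetical occurrences in two grid cells, "A" and "B"
occurrences = [("sp1", "A"), ("sp2", "A"), ("sp3", "A"), ("sp4", "A"),
               ("sp1", "B"), ("sp5", "B")]
at_risk_prob = {"sp1": 0.9, "sp2": 0.85, "sp3": 0.95, "sp4": 0.2, "sp5": 0.1}
assessed = {"sp1", "sp4", "sp5"}

summary = summarize_cells(occurrences, at_risk_prob, assessed)
# Many likely at-risk species first, then lowest assessment coverage first.
priorities = sorted(summary, key=lambda c: (-summary[c][0], summary[c][1]))
print(summary)     # {'A': (3, 0.5), 'B': (1, 1.0)}
print(priorities)  # ['A', 'B']
```

Cells that rank high here (many likely at-risk species, low assessment coverage) are the kind of poorly assessed, high-need areas the researchers describe.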

Worldwide distribution of at-risk plant species

Distribution of at-risk species predicted by the random forest classifier trained on the best-performing data-set (the spatial data-set). Higher values indicate a greater likelihood that the area contains at-risk species.
A. Espíndola et al. (2018) / PNAS

Opportunities for underrepresented taxa

The machine learning approach used by the researchers can also be applied to other underrepresented taxa. In New Zealand, for example, over half of lichen species are considered data deficient according to the latest environmental report.

Photograph of a New Zealand creek with riversides covered in mosses and lichens

New Zealand temperate forest is often dominated by lichens and mosses.
Victor Anton /

Prioritizing future conservation work

The model is not meant to replace the current process of assessing a species’ conservation status, but to help prioritize it. Because the species and areas most likely in need of conservation can be identified quickly, the model could help conservationists decide where research and resources should be allocated.

The approach is not limited to the IUCN Red List: it can be applied to any assessment system, and at different spatial or taxonomic scales. To make this possible, all data-sets and the model build are open access.

Find out more about:

Peer-reviewed article
Predicting plant conservation priorities on a global scale

Main author of the paper
Asst. Prof. Anahí Espíndola, University of Maryland (