Processing images - what’s the process?
How do millions of trail camera photos turn into meaningful wildlife data?
In an impressive display of enthusiasm for wildlife, community scientists across the state have provided the Snapshot New York team with over four million trail camera images in just about six months! This influx of data is an exciting start to the project. But how is such a massive amount of data handled? Researchers have to look through the images and label each photo with the species present. Imagine having to click through four million photos on your computer. What is an exciting part of having a trail camera in your backyard quickly becomes a mammoth task at this scale.
As you look through your trail camera images, you may see empty photos, photos of yourself, or vehicles like ATVs or tractors riding by.
These photos add up, and while it is important that we keep them in the dataset for quality control purposes, they often make up a substantial portion of trail camera photos. To cut down on time spent looking through them, we can run our images through something called a “convolutional neural network,” or “CNN”: a model that has been trained to flag trail camera images that contain people or vehicles, or that are empty. Once we know which images contain only animals, we can exclude the rest from our species labeling effort. While this helps a lot, we are still left with a mountain of animal images to look through. To assist us, we use another CNN, called DeepFaune New England, that has been trained specifically to label species in trail camera images from the northeastern United States.
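If you’re curious what that sorting step might look like behind the scenes, here is a rough sketch in Python. The detector here is a stand-in (the real models, labels, and file layout may differ); the point is simply that only “animal” images move on to species labeling:

```python
from pathlib import Path
from typing import Callable

def images_needing_species_labels(
    image_dir: str,
    detect: Callable[[Path], str],
) -> list[Path]:
    """Return only the images the detector says contain animals."""
    keep = []
    for path in sorted(Path(image_dir).glob("*.jpg")):
        # The stand-in detector returns one of: "animal", "person",
        # "vehicle", or "empty". Non-animal images are set aside for
        # species labeling (but kept in the dataset for quality control).
        if detect(path) == "animal":
            keep.append(path)
    return keep

# Usage with a fake detector; a real CNN would go here.
fake_results = {"cam1_001.jpg": "animal", "cam1_002.jpg": "empty"}
detector = lambda path: fake_results.get(path.name, "empty")
animal_images = images_needing_species_labels("trail_cam_photos", detector)
```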
So, how does a CNN work, and what does it mean to train a CNN?
The objective of a CNN is to identify objects in images and classify them into categories (like animal, person, vehicle, and empty). CNNs use layers of image filters, much like the kind you might apply to social media photos, to enhance or diminish various aspects of an image, like color or texture. This helps the model detect objects in images. However, CNNs don’t use predetermined filters the way we do when editing a picture on a smartphone. Instead, they start with random filters and then actively learn which filters are most useful for their objective. This is what we mean by training a CNN: we give it lots of example images so it can learn which patterns and details matter most for the task at hand. For example, the DeepFaune New England model was trained on thousands of trail camera photos where the animals were already labeled, so it could figure out which patterns and features help it recognize different species in similar images.
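For the curious, here is a toy Python example (using the PyTorch library) of what those learnable filters look like. This is not the DeepFaune New England code, just an illustration of filters starting out random and being nudged during training:

```python
import torch
import torch.nn as nn

# One convolutional layer: 8 small 3x3 filters applied to an RGB image.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)
print(conv.weight.shape)          # torch.Size([8, 3, 3, 3]); starts random

image = torch.rand(1, 3, 64, 64)  # a fake 64x64 RGB "trail camera" image
features = conv(image)            # each filter highlights different patterns

# During training, gradients nudge the filter weights so they pick out
# features (fur texture, antler edges, etc.) that help classification.
loss = features.mean()            # stand-in for a real classification loss
loss.backward()
print(conv.weight.grad is not None)  # True: the filters can now be updated
```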
Example of correctly annotated image.
We don’t leave everything to the CNN, though. While CNNs are powerful, they can make mistakes. Helpfully, they give us a confidence score for each prediction, showing how sure they are about what’s in the image. After the model processes all the photos, we review the ones with low confidence scores ourselves to make sure each image is labeled with the correct species.
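In code, that review step boils down to sorting predictions by their confidence score. Here is a minimal Python sketch; the threshold and record format are illustrative assumptions, not the project’s actual settings:

```python
def split_by_confidence(predictions, threshold=0.9):
    """Separate predictions we accept from those needing human review."""
    accepted, needs_review = [], []
    for image, species, confidence in predictions:
        if confidence >= threshold:
            accepted.append((image, species))      # trust the model's label
        else:
            needs_review.append((image, species, confidence))  # human checks
    return accepted, needs_review

# Example records: (image, predicted species, confidence score)
predictions = [
    ("cam2_014.jpg", "white-tailed deer", 0.97),  # confident: keep label
    ("cam2_015.jpg", "coyote", 0.41),             # uncertain: human review
]
accepted, needs_review = split_by_confidence(predictions)
```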
Example of incorrectly annotated image.
These CNNs make it possible for us to handle huge numbers of photos without getting overwhelmed.
Read more about DeepFaune New England here.
A little help from technology, and a lot of help from you—that’s how we make wildlife data come to life. Keep sending those images our way!
Author: Haley Turner, PhD student