Save the Whales (With Big Data)

It’s not easy being a North Atlantic right whale. Decimated by nearly a millennium of whaling, today’s remaining population of under 500 whales is threatened by collisions with shipping traffic, entanglement in fishing equipment, and changes in the chemistry of their coastal water habitats due to pollution.

It’s also not easy being a marine biologist tasked with protecting North Atlantic right whales. Teams at the National Oceanic and Atmospheric Administration (NOAA)’s Northeast Fisheries Science Center fly aerial surveys over the United States’ eastern seaboard to monitor whale populations. When they spot a right whale, they capture a photo, which they then take back to the lab and cross-reference against a vast catalog of area whales maintained by the New England Aquarium in order to identify it. While right whales have distinct facial markings (called callosities) that allow researchers to tell them apart, manually sorting the photos is slow and labor-intensive, especially for marine biologists already strapped for time.

NOAA biologist Christin Khan was determined to find an easier way to identify the whales, and with the help of Kaggle, a data analytics competition site, the NOAA Right Whale Recognition challenge was launched at the end of August 2015. Teams of data scientists competed to develop an algorithm that could identify any living North Atlantic right whale from a photograph of its face. To give competitors something to build their algorithms from, NOAA granted them access to a unique data set of thousands of photographs of the whales, manually tagged by Khan and fellow biologist Leah Crowe. A total of 470 competitors on 364 teams from across the globe competed for the $10,000 prize, along with free data analytics software from competition sponsor MathWorks.

The competition concluded in early January 2016 with the team from the Warsaw office of the data science company emerging victorious. In a blog post on the company’s website, team leader and CSO Robert Bogucki explains how the team developed the winning algorithm. The backbone of their solution was a machine learning technique known as convolutional neural networks (CNNs). CNNs are programs that pass input data, such as whale photographs, through a stack of layers, each of which scans small, overlapping regions of its input for patterns. This lets the computer analyze a photo with much greater efficiency and accuracy than simply examining the entire photo, pixel by pixel. Once “trained” on enough sample data, the programs can begin to recognize those patterns themselves and identify photos that look the same.
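To make the idea of scanning overlapping regions concrete, here is a minimal sketch of a single convolution filter in plain Python. It is not the winning team’s code: the function name, the tiny 4x4 “image,” and the hand-picked filter values are all assumptions made for illustration. A real CNN stacks many such filters and learns their values during training.

```python
# Illustrative only: one convolution filter applied to a tiny grayscale "image".
# The image and kernel values below are made up for this example.

def convolve2d(image, kernel):
    """Slide `kernel` over every position where it fits inside `image`
    and return the grid of responses. As in most deep-learning libraries,
    the kernel is applied without flipping (technically cross-correlation)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            # Multiply the kernel against the patch whose top-left corner is (r, c)
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical image: dark (0) on the left, bright (1) on the right
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked filter that responds to dark-to-bright vertical edges
kernel = [
    [-1, 1],
    [-1, 1],
]

response = convolve2d(image, kernel)
for row in response:
    print(row)
```

The response is strongest in the middle column, where the filter sits across the dark-to-bright boundary, and zero where the patch is uniform. Neighboring patches share pixels, which is the overlap the description above refers to.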

The algorithm used multiple CNNs for its three core steps. Step 1, the “head localizer,” zoomed in and cropped photos around the whales’ heads. Step 2, the “head aligner,” shifted the photo so that the whale’s blowhole was on one side and its bonnet (or snout) was on the other, a process the team referred to as “making the passport photo of each whale.” Step 3, the “final classifier,” quickly matched an input “passport photo” to other photos of the same whale based on its markings, thus identifying the whale. The final algorithm could identify whales with a whopping 87 percent accuracy.
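The three steps above can be sketched as a simple chain of functions. This shows only the structure of the pipeline, under heavy assumptions: in the real system each stage was a trained CNN, whereas here each is a trivial stand-in. Every name, the bounding box, the mirroring rule, and the toy catalog are hypothetical.

```python
# Hypothetical sketch of the localize -> align -> classify structure.
# Each function is a simple stand-in for a stage that was a trained CNN.

def localize_head(photo, box):
    """Stage 1 stand-in: crop the photo to a (top, left, bottom, right) box.
    The real localizer predicted the box; here it is supplied by the caller."""
    top, left, bottom, right = box
    return [row[left:right] for row in photo[top:bottom]]


def align_head(head, blowhole_on_left):
    """Stage 2 stand-in: mirror the crop so the blowhole always ends up on the left.
    This is a simplification of the alignment step described in the article."""
    if blowhole_on_left:
        return head
    return [list(reversed(row)) for row in head]


def classify(passport, catalog):
    """Stage 3 stand-in: return the catalog ID whose reference image
    has the smallest total pixel difference from the passport photo."""
    def distance(a, b):
        return sum(abs(x - y)
                   for row_a, row_b in zip(a, b)
                   for x, y in zip(row_a, row_b))
    return min(catalog, key=lambda whale_id: distance(passport, catalog[whale_id]))


def identify(photo, box, blowhole_on_left, catalog):
    """Run the three stages in order and return the best-matching whale ID."""
    head = localize_head(photo, box)
    passport = align_head(head, blowhole_on_left)
    return classify(passport, catalog)


# Made-up aerial "photo": the whale's head occupies the central 2x2 block
photo = [
    [0, 0, 0, 0],
    [0, 5, 9, 0],
    [0, 2, 7, 0],
    [0, 0, 0, 0],
]

# Made-up catalog of aligned reference images, keyed by invented whale IDs
catalog = {
    "whale_A": [[9, 5], [7, 2]],
    "whale_B": [[1, 1], [1, 1]],
}

match = identify(photo, box=(1, 1, 3, 3), blowhole_on_left=False, catalog=catalog)
print(match)
```

Splitting the work this way means the last stage only ever compares tightly cropped, consistently oriented images, which is the point of the “passport photo” step.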

In an article in The Atlantic, Khan said that besides the major benefit of getting to spend less time sorting through photos and more time doing conservation work, the algorithm could help teams like hers in several ways. Some researchers conduct biopsies on the whales in order to run genetic tests, and being able to check immediately whether they have already tested a given whale would make their work less invasive. Also, if biologists can immediately identify a whale entangled in fishing gear, they can respond more effectively, as they’ll be able to quickly access its health records or check whether it has gotten tangled before. Khan’s hope is that the solution can be expanded to assist researchers studying other marine mammals and give other critically endangered species like the North Atlantic right whale a chance to keep on swimming.