Accelerating Ocean Cleanup by Empowering Citizen Data Scientists

Use Cases & Projects, Dataiku Product, Scaling AI · Katrina Power

Every year, millions of tons of plastic enter the oceans, contributing to the creation of ocean garbage patches that increasingly impact our ecosystems, health, and economies. The Ocean Cleanup, a nonprofit organization, develops and scales technology to rid oceans of this non-biodegradable debris, with the aim of cleaning up 90% of floating ocean plastic pollution.

In addition to analyzing the vast amounts of data collected during its scientific research, The Ocean Cleanup applies data science to develop technical solutions and to maximize opportunities for funding and sustaining the broader organization. Facing numerous data-related challenges, it first partnered with Dataiku in 2018, working within the frame of the Ikig.AI program to foster a collaborative environment that empowered employees and embedded “citizen data science” across the organization.

Continue reading to learn how The Ocean Cleanup democratized data within the organization to accelerate its goal of ridding the world’s oceans of plastic. The accompanying video interview with Bruno Sainte-Rose, Lead Computational Modeler and Dataiku Neuron, offers further insights into The Ocean Cleanup’s activities.

Data Science Challenges at The Ocean Cleanup

Prior to partnering with Dataiku, when it came to data, The Ocean Cleanup faced three primary challenges that inhibited its nonprofit work:

  • Managing Data Processing Pipelines: The team needed a tool that allowed for ad-hoc data updating and processing with an optimal computing time. 

“Some of the data that we were manipulating was faulty, and we were missing a tool to have a quick scan through the data, to elaborate the right approach to correct it,” explains Bruno. “We were missing a tool to automate the updating of our pipeline, especially accounting for specific triggers, but also allowing for dashboarding and reporting options.”

  • Dealing With a Wide Variety of Data: The team required a versatile data processing solution because it handled both structured and unstructured data that differed in nature and came in various formats from different providers.
  • Lacking a Centralized Platform: The nonprofit was looking to implement a centralized data science platform that promoted internal collaboration, with specific roles, rights, and access, and that could be used by both technical and non-technical collaborators across the organization. Unfortunately, the platforms it came across were not user-friendly and required too much expertise.

Finding Solutions Through Data for Good

To help address its challenges, The Ocean Cleanup first began using Dataiku in 2018. It went on to become the inaugural partner of Ikig.AI, the for-good initiative by Dataiku that puts AI tools and skills behind nonprofit causes.

In addition to providing a free company-wide license, within the frame of the program, Dataiku further enabled users at The Ocean Cleanup through training, project co-development, and support with the implementation of its data science projects.

Pictured: Computer at the Ocean Cleanup using Dataiku

Empowering People Across the Organization to Gain Insights and Leverage Data

Thanks to the support provided through Ikig.AI, the collaborative environment, and the overall user experience, Dataiku was adopted company-wide at The Ocean Cleanup, including by non-technical staff members.

“Having access to Dataiku allowed us to ramp up our data science analysis,” says Bruno. “The user-oriented, code-minimalistic approach provided by the Dataiku pipeline was a game-changer both for our data pre-processing and post-processing steps. The extensiveness of built-in operations to manipulate and prepare the data made it possible for less programming-savvy staff to perform their usually very time-consuming operations.”

This empowerment allowed The Ocean Cleanup to leverage Dataiku to find solutions to its other data-related challenges, including the improved management of data pipelines, which involved tracking past workflows and accomplishments to optimize future projects. 

“We first started using Dataiku to test our barriers in November 2018. Less than a year later, we easily replicated the same data workflow for a new test campaign, leveraging these new efficiencies to spend more time developing features,” explains Bruno. “In November 2020, during a campaign in the North Sea, our engineers only went through a quick Dataiku training to be able to reuse the previous data pipelines and features to focus their time on where they could add the most value.”

The platform's versatility also enabled users at The Ocean Cleanup to connect to data that differed in nature, format, and type and to adapt it accordingly, saving much-needed time and resources.

The Rise of Citizen Data Science

The Ocean Cleanup in action, collecting floating ocean plastic pollution

As a nonprofit organization, The Ocean Cleanup treats the quality-to-time ratio of the tools it uses as a key performance indicator (KPI). One of its main objectives was to have a reliable yet versatile data science platform to conduct data science projects efficiently and create a significant impact across the organization. Dataiku allowed it to dramatically improve this KPI through the levers described below, and its pioneering achievements were recognized at the 2021 edition of the Dataiku Frontrunner Awards.

Improved Operational Efficiencies to Focus Resources on Innovation

Before implementing Dataiku, The Ocean Cleanup had to extract data from different SQL databases, aggregate the results, and build interpolations using a combination of platforms. Dataiku enabled the team to centralize the whole workflow while letting practitioners keep working with the technology they’re used to, allowing them to move faster and go further.
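To make the extract, aggregate, and interpolate steps concrete, here is a minimal sketch in plain Python. It is not The Ocean Cleanup's actual workflow and does not use Dataiku's API: the `readings` table, its `hour` and `value` columns, the sample numbers, and every function name are hypothetical, and two in-memory SQLite databases stand in for the separate SQL sources.

```python
import sqlite3


def make_db(rows):
    """Create an in-memory database with a hypothetical `readings` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (hour INTEGER, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    return conn


def fetch_readings(conn):
    """Extract (hour, value) rows from one database, ordered by hour."""
    return conn.execute(
        "SELECT hour, value FROM readings ORDER BY hour"
    ).fetchall()


def aggregate(sources):
    """Average the values reported for each hour across all sources."""
    buckets = {}
    for rows in sources:
        for hour, value in rows:
            buckets.setdefault(hour, []).append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


def interpolate(series, hour):
    """Linearly interpolate a value at `hour` from its nearest known neighbors."""
    if hour in series:
        return series[hour]
    earlier = [h for h in series if h < hour]
    later = [h for h in series if h > hour]
    if not earlier or not later:
        raise ValueError("hour is outside the range of known data")
    h0, h1 = max(earlier), min(later)
    weight = (hour - h0) / (h1 - h0)
    return series[h0] + weight * (series[h1] - series[h0])


# Two illustrative sources with made-up sample values.
db_a = make_db([(0, 1.0), (2, 3.0)])
db_b = make_db([(0, 2.0), (4, 6.0)])

combined = aggregate([fetch_readings(db_a), fetch_readings(db_b)])
estimate = interpolate(combined, 3)  # hour 3 is missing from both sources
print(combined, estimate)
```

Each step here is a separate function that someone has to write, wire together, and maintain by hand, which is the kind of overhead the article describes a centralized, visual workflow as reducing.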

Easy Onboarding to Bring in More People to Better Fit Project Needs

Dataiku’s user-friendly interface makes it easy for The Ocean Cleanup to onboard new people to the platform, while the learning resources and the expansive catalog of events and content give individuals a broad perspective on data science projects. As a result, the core data science team is supported by five times as many people across the organization, who are given access to the platform to bring their expertise to various projects.

Quicker Decision-Making by Gathering Everyone on the Same Platform

Thanks to Dataiku’s visual interface, both technical and non-technical stakeholders can understand the data workflow and the success metrics of the projects developed. This enables them to make quicker decisions and to adjust course along the way to meet their goals.

Enabling Everyone to Bring in Their Skills Through Visual Recipes

The visual features of the platform, such as those for data wrangling and visualization, also enabled everyone at the nonprofit to contribute their individual skills to successfully conduct data science projects and draw insights from the data at hand. 

Systemizing the Use of Data Through a Versatile All-in-One Platform

One of the most significant impacts of the implementation of Dataiku within The Ocean Cleanup was the systemization of data science across the organization. In addition to the more technical departments, its usage has been extended to include others such as finance and communications, where it’s been leveraged to understand fundraising dynamics, optimize social media content, and more. 

“Through bringing together everyone on the same platform and the rise of ‘citizen data science’, Dataiku enabled us to embed data science across the organization to create more value towards fulfilling our mission,” says Bruno.
