Transforming E-commerce Search Relevance with AI: SUPA's Success in Delivering 1 Million Relevancy Judgments per Month


A US online retailer needed to improve search across millions of products. That meant judging results for subjective queries and assessing over 300,000 query-result pairs each month to train its AI model.


SUPA created a custom labeling platform with a consensus model, using 180 annotators to deliver about 1 million ranking judgments per month, ensuring accuracy and consistency.


SUPA's infrastructure helped the client reduce its search algorithm's defect rate by more than 40%, improving the e-commerce experience for its customers.

The Problem

Our client, a publicly listed US online retailer with a vast product inventory, faced significant challenges in improving its customer search experience. With millions of products available, relevant and accurate search results were crucial for maintaining customer satisfaction and driving sales. The two main challenges were:

1. Subjective Nature of Customer Queries: Evaluating the relevance of search results for subjective customer queries required careful judgment.

2. High Volume with Fast Turnaround: The client needed assessments for over 300,000 query-result pairs per month, necessitating a fast and efficient workflow.

Over nine months, the work involved assessing the relevance of results for more than 3 million search queries. The client used these assessments to train its AI model and improve its search algorithms.

The Solution

SUPA addressed these challenges with a tailored and automated approach:

1. Bespoke Labeling Platform: SUPA configured a custom labeling platform built to support a consensus workflow, in which several annotators judge each data point independently before a final relevance value is calculated.

2. Diverse Annotator Pool: To handle the volume, SUPA drew on its diverse annotator pool, staffing 180 annotators throughout the project period. This allowed SUPA to deliver approximately 1 million ranking judgments per month, a figure consistent with three judgments for each of the more than 300,000 monthly query-result pairs.

3. Consensus Model Workflow: Each data point was reviewed by three different annotators, and their judgments were aggregated to determine the final relevancy score. This method reduced individual bias and improved the accuracy and consistency of the assessments.

4. Scalability: SUPA’s ability to scale up its operations was critical. With 180 annotators working concurrently, the team was able to meet the demanding schedule, providing timely and accurate relevance assessments.
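To make the consensus step concrete, here is a minimal sketch of how three judgments for one query-result pair might be aggregated. It is an illustration only: the case study does not describe SUPA's actual labels or aggregation rule, so the three-level SCALE, the consensus function, and the choice of majority vote plus mean score are all assumptions.

```python
from collections import Counter

# Hypothetical graded relevance scale; the labels used in the project are not stated.
SCALE = {"irrelevant": 0, "partial": 1, "relevant": 2}


def consensus(judgments):
    """Aggregate three annotator judgments for one query-result pair.

    Returns (majority_label, mean_score). majority_label is None when all
    three annotators disagree, which could flag the pair for extra review.
    """
    if len(judgments) != 3:
        raise ValueError("expected exactly three judgments per pair")

    # Mean of the numeric grades serves as the relevancy score (assumed rule).
    score = sum(SCALE[j] for j in judgments) / len(judgments)

    # Majority vote: a label needs at least two of the three votes.
    label, votes = Counter(judgments).most_common(1)[0]
    if votes < 2:
        label = None

    return label, score


# Two annotators agree, so their label wins and the score reflects the dissent.
print(consensus(["relevant", "relevant", "partial"]))
# A three-way split has no majority label.
print(consensus(["irrelevant", "partial", "relevant"]))
```

Because a single outlying judgment cannot decide the label on its own, this kind of aggregation is one way a three-annotator workflow can reduce individual bias.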

The Results 

SUPA’s robust infrastructure allowed the client to achieve its goal of reducing its search algorithm's defect rate by more than 40%. By providing high-quality, timely relevance assessments, SUPA played a pivotal role in enhancing the e-commerce experience for the client’s customers.
