Acquiring high-quality labeled data has been a long-standing challenge for the machine learning (ML) industry. Challenges include:
Given these challenges, a key question emerged: how might we improve quality in data labeling while accounting for the differing needs of projects, rising standards, and the need to identify proficient labelers?
The starting point for improving quality was figuring out how to measure it; we couldn't move a needle that didn't exist yet. Aligning all stakeholders, whether labeler, client, or engineer, on what quality meant proved difficult without a shared frame of reference.
To create that reference, we started with quality in its simplest form: the presence or absence of mistakes in labeling.
But how do we define a mistake, and on what basis? The answer was simple in hindsight: every data labeling project comes with a set of instructions from the client, which serves as the source of truth for labelers working on the project. A label is wrong when it violates those instructions.
Image annotation involves assigning labels to regions or pixels of an image, depending on the use case. Consider a simple project where a labeler must draw bounding boxes around cats and dogs.
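To make this concrete, here is a minimal sketch of what a bounding-box annotation might look like, along with intersection-over-union (IoU), a standard way to measure how well a drawn box matches a reference box. The `BoundingBox` type and field names are illustrative assumptions, not our production schema.

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    """Axis-aligned box in pixel coordinates; (x_min, y_min) is the top-left corner.

    Hypothetical schema for illustration only.
    """
    label: str    # e.g. "cat" or "dog"
    x_min: float
    y_min: float
    x_max: float
    y_max: float

def iou(a: BoundingBox, b: BoundingBox) -> float:
    """Intersection-over-union: 1.0 for identical boxes, 0.0 for disjoint ones."""
    ix = max(0.0, min(a.x_max, b.x_max) - max(a.x_min, b.x_min))
    iy = max(0.0, min(a.y_max, b.y_max) - max(a.y_min, b.y_min))
    inter = ix * iy
    area_a = (a.x_max - a.x_min) * (a.y_max - a.y_min)
    area_b = (b.x_max - b.x_min) * (b.y_max - b.y_min)
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A labeler's box compared against a reviewer's reference box:
drawn = BoundingBox("cat", 0, 0, 10, 10)
reference = BoundingBox("cat", 5, 5, 15, 15)
print(iou(drawn, reference))  # partial overlap, well under 1.0
```

A project's instructions might then say, for example, that a box counts as a mistake when its IoU against the reviewer's box falls below some threshold.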
For image annotation, mistakes can be divided into the following general categories, based on prior research:
This led to the conception of the Accuracy Scorecard. Think of it as a precise ledger where we recorded the mistakes made in image annotation projects, broken out by the mistake types above. We computed scores with a simple formula where:
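The original formula is not reproduced here, but a natural reading of "presence of mistakes" suggests something like the fraction of labels made without a recorded mistake. The sketch below assumes exactly that; the function name and the sample numbers are hypothetical.

```python
def accuracy_score(total_labels: int, mistakes: int) -> float:
    """Hypothetical scorecard formula: the fraction of labels made without mistakes.

    Assumes accuracy = (total labels - mistakes) / total labels.
    """
    if total_labels == 0:
        return 0.0
    return (total_labels - mistakes) / total_labels

# e.g. 200 labels with 14 recorded mistakes
print(f"{accuracy_score(200, 14):.0%}")  # prints "93%"
```

Keeping the formula this simple makes the score easy to explain to labelers, clients, and engineers alike, which matters when the goal is a shared frame of reference.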
Application of the scorecard gave us a clear view of performance at both the project and individual level.
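Because each mistake is recorded against both a project and a labeler, the same ledger can be rolled up along either axis. A minimal sketch, with hypothetical record shapes and names:

```python
from collections import defaultdict

# Hypothetical scorecard entries: (project, labeler, total_labels, mistakes)
records = [
    ("pets-v1", "alice", 120, 6),
    ("pets-v1", "bob", 80, 12),
    ("retail", "alice", 50, 1),
]

def rollup(entries, key_index):
    """Aggregate accuracy by project (key_index=0) or by labeler (key_index=1)."""
    totals = defaultdict(lambda: [0, 0])
    for entry in entries:
        key = entry[key_index]
        totals[key][0] += entry[2]  # labels
        totals[key][1] += entry[3]  # mistakes
    return {k: (labels - mistakes) / labels for k, (labels, mistakes) in totals.items()}

print(rollup(records, 0))  # per-project accuracy
print(rollup(records, 1))  # per-labeler accuracy
```

The same data thus answers both questions at once: which projects need attention, and which labelers are proficient.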
Implementing the scorecard across multiple projects proved a resounding success. Projects improved slowly but steadily over time, driven by the following key factors: