November 9, 2022

Iteration: A Key Data Labeling Process Often Overlooked

Image from our own experiment on BOLT

Here's a situation you might be familiar with. Your CEO tells you to deploy a proof-of-concept model for a new client by Friday, or you might lose them to a competitor. A quick glance at your watch: it's Tuesday, 5 pm. You have three days.

Before you leave the office, you start a data labeling workflow with some basic instructions. The dataset has 10,000 images; you hope for the best.

After all, how hard can it be for anyone to annotate some cars on the street? 

When you review the annotations on Thursday morning, you’re surprised the annotation quality is subpar. Precious time wasted, and still no usable training data. What went wrong?

Iteration. Getting high-quality annotations takes a few iteration cycles with smaller calibration datasets, so that annotators get feedback and learn.

When data labeling is done by human annotators, how well those humans learn determines how accurate your training data will be.

During the iteration process, you need to pay attention to areas like annotator training, surfacing edge cases, and improving instructions. If these are not part of the iteration cycle early on, annotation quality tends to suffer.

Here are five things you can do to iterate towards high-quality annotations when starting a new data labeling workflow.

1. Deliberately start with smaller datasets

Before scaling the labeling workflow to thousands of images, deliberately start with smaller datasets. Three calibration batches are a good starting point. Start the first calibration batch with 50 images. Review the annotations, identify issues to improve the instructions, and share feedback with the annotators. Repeat this process at least three times with 50 to 100 images.

A calibration batch of 50 to 100 images is big enough to be roughly representative of the entire dataset. Three iterations also give you a good chance of surfacing most edge cases before you scale.
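If you script your dataset preparation, a minimal sketch like the one below can carve out the calibration batches before you open up the full dataset. This is only an illustration, assuming your images are referenced by file path; the function name, batch size, and seed are placeholders, not part of any SUPA tooling.

```python
import random
from typing import List

def sample_calibration_batches(image_paths: List[str],
                               batch_size: int = 75,
                               num_batches: int = 3,
                               seed: int = 42) -> List[List[str]]:
    """Carve out a few small, non-overlapping calibration batches
    from the full dataset before scaling up the labeling workflow."""
    rng = random.Random(seed)
    pool = list(image_paths)
    rng.shuffle(pool)  # shuffle so each batch is a rough cross-section of the data
    return [pool[i * batch_size:(i + 1) * batch_size] for i in range(num_batches)]

# Example: three batches of 75 images each, drawn from a 10,000-image dataset
batches = sample_calibration_batches([f"img_{i:05d}.jpg" for i in range(10_000)])
```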

2. Visual examples work better than words

Human annotators learn best from visual examples (just like a computer vision model)! A car on the road can look different during the day, at night, when it’s snowing, in urban traffic, and on a rural country road.

Visual examples give annotators a sense of what correctly annotated objects look like. When someone new starts annotating, the objects already feel familiar from studying the instructions.

3. Contrast good vs bad examples

When you set an annotation rule such as “only annotate a car when it is more than 80% visible”, it can be tricky to visualize what that actually looks like. When in doubt, the annotator will fall back on their own interpretation of how to annotate the object, resulting in inconsistency. To minimize inconsistencies, provide visual examples of good vs bad annotations.

Annotators quickly learn to tell the correct way of annotating an object from the wrong one. This reduces guesswork and cognitive load, because the examples act as mental shortcuts.
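If your review tooling records both the visible box and an estimated full extent for partly occluded objects, a quick check along these lines could flag annotations that break the 80% rule during review. The box format, helper names, and threshold here are assumptions for illustration, not a SUPA API.

```python
def box_area(box):
    """Area of an axis-aligned box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)

def passes_visibility_rule(visible_box, full_extent_box, min_visible=0.8):
    """True if the visible part of the object covers at least `min_visible`
    of its estimated full extent (the 80% rule from the instructions)."""
    full_area = box_area(full_extent_box)
    if full_area == 0:
        return False
    return box_area(visible_box) / full_area >= min_visible

# A car half hidden behind a truck: roughly 50% visible, so it fails the rule
print(passes_visibility_rule((0, 0, 50, 40), (0, 0, 100, 40)))  # False
```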

4. Leave no room for misinterpretation - clarify edge cases

You will discover edge cases through inconsistent annotations, which are the result of subjective judgment calls. Is a makeshift construction sign a traffic sign? For a pedestrian towing a suitcase, do you annotate the suitcase? When a car is being towed by a tow truck, do you annotate it as one instance or two?

All these scenarios are up for subjective interpretation if you don’t make rules for them. Keep adding rules in the instructions for these edge cases.

Where judgment is needed, remind the annotator of the model objective. This helps the annotator step into your shoes and make a good judgment call.

5. Clarify rules when you find inconsistent annotations

If you notice inconsistent annotations of the same scenario across different images, it’s a signal that the rules are not clear. For example, some car annotations include the antenna while others do not.

You can reflect on the objective of the machine learning model to determine if clarifying this scenario matters. If it does matter, update the car class to either include or exclude the antenna in the general rule.
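One lightweight way to surface this kind of inconsistency is to have two annotators label the same small overlap set and compare their boxes. The sketch below is a simplified illustration that assumes one box per image per annotator and uses plain intersection-over-union; the function names and the 0.9 threshold are placeholders you would tune for your own classes.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ix_min, iy_min = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix_max, iy_max = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def flag_inconsistent_images(boxes_a, boxes_b, threshold=0.9):
    """Return image ids where two annotators drew noticeably different boxes
    for the same object (e.g. one included the antenna, the other did not)."""
    return [img_id for img_id, box in boxes_a.items()
            if img_id in boxes_b and iou(box, boxes_b[img_id]) < threshold]
```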

Creating good training data begins with training humans well

Human annotators learn from clear instructions with good examples, done over multiple iterations. This is how you get high-quality annotations.

Follow the steps above when you’re starting a new data labeling workflow, and you can expect the annotation quality to be much higher when you scale to thousands or even millions of images.

We’ve built SUPA BOLT with these iteration strategies in mind when you start a data labeling workflow. Try it for free and start with your own calibration dataset.

