November 9, 2022

Iteration: A Key Data Labeling Process Often Overlooked

Image from our own experiment on BOLT

Here's a situation you might be familiar with. Your CEO tells you to deploy this proof-of-concept model for a new client by Friday, or you might lose them to a competitor. A quick glance at your watch: it's Tuesday, 5 pm. You have three days.

Before you leave the office, you start a data labeling workflow with some basic instructions. The dataset has 10,000 images; you hope for the best.

After all, how hard can it be for anyone to annotate some cars on the street? 

When you review the annotations on Thursday morning, you're surprised to find the quality is subpar. Precious time wasted, and still no usable training data. What went wrong?

Iteration. Getting high-quality annotations takes a few iteration cycles with smaller calibration datasets, so annotators can receive feedback and learn.

When data labeling is done by human annotators, how well those humans learn determines how accurate your training data will be.

During the iteration process, you need to pay attention to areas like annotator training, surfacing edge cases, and improving instructions. If these are not part of the iteration cycle early on, annotation quality tends to suffer.

Here are five things you can do to iterate towards high-quality annotations when starting a new data labeling workflow.

1. Deliberately start with smaller datasets

Before scaling the labeling workflow to thousands of images, deliberately start with smaller datasets. Three calibration batches are a good starting point. Start the first batch with 50 images: review the annotations, identify issues to improve the instructions, and share feedback with the annotators. Repeat this process at least three times with batches of 50 to 100 images.

A calibration batch of 50 to 100 images is large enough to be representative of the broader dataset, and three iterations give you a good chance of surfacing most edge cases early.
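If you want to automate the batching, here is a minimal sketch of how calibration rounds could be carved out of a larger image set. It assumes your images live in a local directory and that your labeling tool accepts a list of file paths; the directory name, batch sizes, and file extension are illustrative, not part of any specific tool's API.

```python
# Minimal sketch: carve small calibration batches out of a larger image set
# before scaling up the labeling workflow. Paths and sizes are assumptions.
import random
from pathlib import Path


def make_calibration_batches(image_dir, batch_size=50, num_batches=3, seed=42):
    """Randomly sample non-overlapping calibration batches from a directory of images."""
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)

    needed = batch_size * num_batches
    if len(images) < needed:
        raise ValueError(f"Need at least {needed} images, found {len(images)}")

    # Slice the shuffled list into consecutive, non-overlapping batches.
    return [images[i * batch_size:(i + 1) * batch_size] for i in range(num_batches)]


if __name__ == "__main__":
    batches = make_calibration_batches("data/street_scenes", batch_size=50, num_batches=3)
    for round_number, batch in enumerate(batches, start=1):
        # Send each batch to the labeling workflow, review the results,
        # update the instructions, then move on to the next round.
        print(f"Calibration round {round_number}: {len(batch)} images")
```

The key design choice is that the batches are sampled randomly and kept separate, so each iteration sees fresh images rather than re-labeling the same ones.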

2. Visual examples work better than words

Human annotators learn best from visual examples (just like a computer vision model)! A car on the road can look different during the day, at night, when it’s snowing, in urban traffic, and on a rural country road.

Visual examples give annotators a sense of what annotated objects should look like. When a new annotator starts on the images, the objects already feel familiar from studying the instructions.

3. Contrast good vs bad examples

When you set an annotation rule such as “only annotate a car when it is more than 80% visible”, it can be tricky to visualize what that actually looks like. When in doubt, the annotator will fall back on subjective interpretation of how to annotate the object, resulting in inconsistency. To minimize inconsistencies, provide visual examples of good vs bad annotations.

Annotators learn to quickly differentiate between the correct and incorrect way of annotating an object. This reduces guesswork and cognitive load, as the examples serve as mental shortcuts.

4. Leave no room for misinterpretation - clarify edge cases

You will discover edge cases through inconsistent annotations, which are the result of subjective judgment calls. Is a makeshift construction sign a traffic sign? For a pedestrian towing a suitcase, do you annotate the suitcase? When a car is being towed by a tow truck, do you annotate it as one instance or two?

All of these scenarios are left to subjective interpretation if you don't make rules for them. Keep adding rules to the instructions as these edge cases surface.

Where judgment is needed, remind the annotator of the model's objective. This helps the annotator step into your shoes and make good judgment calls.

5. Clarify rules when you find inconsistent annotations

If you notice inconsistent annotations of the same scenario across different images, it's a signal that the rules are not clear. For example, some car annotations include the antenna while others don't.

Reflect on the objective of the machine learning model to determine whether clarifying this scenario matters. If it does, update the general rule for the car class to either include or exclude the antenna.
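Inconsistencies like this can also be surfaced programmatically. Below is a minimal sketch that flags low agreement between two annotators who boxed the same object, using intersection over union (IoU). The box format (x_min, y_min, x_max, y_max), the example coordinates, and the 0.9 threshold are assumptions for illustration, not values from any particular labeling tool.

```python
# Minimal sketch: flag potentially inconsistent annotations by measuring how
# much two annotators' boxes for the same object overlap.

def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


# Two annotators boxed the same car; one included the antenna, one did not.
annotator_1 = (120, 80, 340, 260)   # tight box around the car body
annotator_2 = (120, 40, 340, 260)   # taller box that includes the antenna

score = iou(annotator_1, annotator_2)
if score < 0.9:  # low agreement suggests the instructions need a clearer rule
    print(f"Low agreement (IoU={score:.2f}): clarify the rule for this class")
```

Images whose boxes fall below the agreement threshold are good candidates to review with the annotators and to turn into explicit rules or visual examples in the instructions.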

Creating good training data begins with training humans well

Human annotators learn from clear instructions with good examples, delivered over multiple iterations. That is how you get high-quality annotations.

Follow the steps above when you're starting a new data labeling workflow, and you can expect annotation quality to be much higher when you scale to thousands or even millions of images.

We’ve built SUPA BOLT with these iteration strategies in mind when you start a data labeling workflow. Try it for free and start with your own calibration dataset.