August 31, 2022

A quick guide on how to write good instructions for data labeling

Writing data labeling instructions can be tedious and, on occasion, tricky.

But when instructions are done right, the results are obvious: annotations are clearly drawn, the correct labels are used, and edge cases are handled with care. The link is clear: good labeling instructions lead to better-quality annotations.

But what exactly does “good” mean? We asked our SUPA analysts Irfan and Sabrina (our Enterprise team experts) for a few tips on writing good-quality instructions.

Together, Irfan and Sabrina have successfully managed projects ranging from sports and manufacturing to augmented reality and construction – phew! They know their stuff and we’re here to share their wisdom with you 😉

Five tips on how to write good data labeling instructions

(from people who actually write them)

1. Set the scene, explain the why

Movies are storyboarded before they are produced, and your instructions deserve the same treatment. Plan the intro, key points and conclusion before you start writing.

“Guide the annotator’s thought process of why what they’re looking at matters. What are all the possible decisions they might need to make?” says Irfan.

Tip: Instructions should guide the annotator's thought process and train them to build judgment in accordance with your quality guidelines.

2. Use simple language

Keep it simple! Sabrina recommends using vocabulary that is instructional and to the point. Irfan also notes that analogies that don’t translate across cultures are often misunderstood. For example:

Good use of simple, instructional language

How to spot Vietnamese mossy frogs: Start by looking for their eyes.

Not so good use of language

Finding a Vietnamese mossy frog in a rainforest might feel like looking for a needle in a haystack. A good place to start is by looking for their eyes in photos.

Tip: Keep an instructional tone of voice and get straight to the point.

3. Say it with images

Show examples of what is correctly labeled, and what isn’t. Sabrina finds that placing these examples side by side works best. For example:

Also, remember to provide a visual definition of your labels. This helps the annotators know exactly what to look out for. For example:

4. Visual hierarchy

Data labeling instructions can get pretty long (especially when a lot of edge cases are involved). So it’s important to bold text for points you really want to emphasize. Here’s an example of how we’ve done it:

Other ways to emphasize copy:

Italicize specific text in a sentence
Create subheadings in different font sizes/color
Highlight text in a contrasting color

Remember to use these font variations sparingly. If everything is emphasized, nothing will look important to the reader.

5. Include edge cases

We know that finding edge cases within a dataset is tough. You don’t know what you don’t know until, well, someone discovers it. As best you can, include as many examples of edge cases as possible up front.

On SUPABOLT, you are notified when an annotator skips a task that isn’t covered in the instructions. Get our annotators to help you surface these edge cases and remember to update the Edge Cases section in your instructions. We highly recommend working in smaller batches so you can discover new image scenarios and teach our team of annotators how to handle them.
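If you prepare your dataset yourself before uploading, splitting it into small batches can be as simple as the sketch below. This is only an illustration in plain Python: the `make_batches` helper, the file names and the batch size are all made up for the example, and it does not use any SUPABOLT API.

```python
from typing import Iterator, List, Sequence


def make_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches containing at most batch_size items each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Hypothetical file names standing in for a real dataset.
image_files = [f"frog_{i:03d}.jpg" for i in range(10)]

# Assumed batch size; pick whatever is small enough to review comfortably.
batches = list(make_batches(image_files, batch_size=4))

for number, batch in enumerate(batches, start=1):
    print(f"Batch {number}: {len(batch)} images")
```

After each batch comes back, review the skipped tasks, add any new edge cases to your instructions, and only then send the next batch.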

Here are a few sample templates for instructions.

Give SUPABOLT a quick spin! Sign up and start a project in under 10 minutes. If you’re looking for a managed service designed and tailored to your data labeling needs, reach out to us at hello@supa.so.