Data labeling instructions can be quite tedious to write and, on occasion, tricky.
But when instructions are done right, the results are obvious: annotations are clearly drawn, the correct labels are used, and edge cases are handled with care. The link is clear: good labeling instructions mean better-quality annotations.
But what exactly does "good" mean? We spoke to our SUPA analysts Irfan and Sabrina (our Enterprise team experts) to share a few tips on writing good quality instructions.
Together, Irfan and Sabrina have successfully managed projects from sports and manufacturing to augmented reality and construction – phew! They know their stuff and we’re here to share their wisdom with you 😉
Just as movies are storyboarded before they are produced, storyboard your instructions. Consider the intro, key points, and conclusion.
“Guide the annotator’s thought process of why what they’re looking at matters. What are all the possible decisions they might need to make?” says Irfan.
Tip: Instructions should guide the annotator's thought process and train them to build judgment in accordance with your quality guidelines.
Keep it simple! Sabrina recommends using vocabulary that is instructional and to the point. Irfan also notes that analogies that don’t translate across cultures are often misunderstood. For example:
Do: How to spot Vietnamese mossy frogs: Start by looking for their eyes.
Don’t: Finding a Vietnamese mossy frog in a rainforest might feel like looking for a needle in a haystack. A good place to start is by looking for their eyes in photos.
Tip: Mind your tone of voice and get straight to the point.
Show examples of what is correctly labeled, and what isn’t. Sabrina finds that placing these examples side by side works best. For example:
Also, remember to provide a visual definition of your labels. This helps the annotators know exactly what to look out for. For example:
Data labeling instructions can get pretty long (especially when a lot of edge cases are involved). So it’s important to bold text for points you really want to emphasize. Here’s an example of how we’ve done it:
Other ways to emphasize copy:
Remember to use these font variations sparingly; if everything is emphasized, nothing will stand out to the reader.
We know that finding edge cases within a dataset is tough. You don’t know what you don’t know until, well, someone discovers it. As best you can, include as many examples of edge cases as possible up front.
On SUPABOLT, you are notified when an annotator skips a task that isn’t covered in the instructions. Get our annotators to help you surface these edge cases and remember to update the Edge Cases section in your instructions. We highly recommend working in smaller batches so you can discover new image scenarios and teach our team of annotators how to handle them.
Here are a few sample templates for instructions.
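As a rough starting point, the tips above can be pulled together into a structure like the sketch below. The headings and placeholder content are our suggestions, not a fixed format — adapt them to your project:

```markdown
# [Project name] — Labeling Instructions

## Intro
One or two sentences on what this project is about and why the
annotations matter, guiding the annotator's thought process.

## Labels
Give each label a short written definition plus a visual example,
so annotators know exactly what to look out for.
- **[label name]** — [definition] [example image]

## Do / Don't
Place correctly and incorrectly labeled examples side by side.

| Do              | Don't           |
|-----------------|-----------------|
| [good example]  | [bad example]   |

## Edge Cases
List every known edge case with an example, and update this section
as new scenarios are discovered.
```

Keep the vocabulary instructional, bold only the points that truly need emphasis, and revisit the Edge Cases section as annotators surface new scenarios.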