Data annotation
Data annotation is the process of labeling or tagging relevant information or metadata in a dataset so that machines can understand what the data contains.[1] The dataset can take many forms, e.g., an image, an audio file, video footage, or text.
Data is one of the three key elements in the development of artificial intelligence, alongside algorithms and computing power. Developing a computer recognition engine requires massive amounts of training data, which must be annotated. These data include images, sound, text, and more, and annotating them involves tasks such as classification, drawing bounding frames, labeling, and tagging.
Types of data annotations in computer vision
Computer vision is one of the most discussed topics in the current AI landscape. Its applications appear on your phone (for example, facial recognition or automatic classification of images), in public settings, and in remote education (for example, emotion recognition).[2]
To understand why computer vision algorithms are so popular, let's look at the types of annotation tasks that make them work.
Image Categorization
In machine learning, this type of task is also known as image classification. By categorizing images, ML algorithms can be trained to group them into predefined categories. After training, these classes make it possible for a machine-learning model to identify what a photo shows.
It can, for instance, recognize the difference between a Georgian and a rococo armchair if it is trained to recognize different furniture styles.
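As a minimal sketch of what such an annotation looks like (the file names and class names below are made up for illustration), image categorization can be as simple as a mapping from each image to one class label from a fixed set:

```python
# Hypothetical image-classification annotations: each image file gets
# exactly one label from a predefined list of classes.
CLASSES = ["georgian_armchair", "rococo_armchair"]

annotations = {
    "chair_001.jpg": "georgian_armchair",
    "chair_002.jpg": "rococo_armchair",
}

def label_index(filename):
    """Return the integer class index a model would be trained on."""
    return CLASSES.index(annotations[filename])
```

Models are usually trained on the integer index rather than the class name itself.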
Semantic Segmentation
A human viewing an image naturally sees each object in it as a separate entity. Using semantic segmentation, a machine can be trained to associate each pixel of an image with a class of objects (e.g., trees, vehicles, humans, buildings, highways, and sky). Once that's done, the machine-learning model clusters similar pixels together.
The machine then learns to “see” the separate objects in the picture using the map created by clustering the different object classes.
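A semantic-segmentation annotation can be pictured as a grid of class ids, one per pixel. The toy sketch below is an illustration only; the class ids (0 = sky, 1 = tree, 2 = road) are assumptions, not a fixed standard:

```python
# Toy segmentation mask for a 4x3-pixel image: every pixel carries the
# id of the object class it belongs to.
mask = [
    [0, 0, 0, 0],  # a row of sky
    [1, 1, 0, 0],  # a tree against the sky
    [2, 2, 2, 2],  # a row of road
]

def pixels_of_class(mask, class_id):
    """Count how many pixels are annotated with the given class."""
    return sum(row.count(class_id) for row in mask)
```

Real masks work the same way, just at full image resolution.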
2D Boxes
Data annotations of this type also go by the name bounding boxes. These annotations illustrate objects in 2D by drawing boxes around them on an image. For example, annotators draw boxes around cars, people, household objects, and so on; boxed cars and pedestrians are used in autonomous driving algorithms.
Using these annotated examples, the machine then learns to classify objects into predefined categories (cars, people, household items, etc.).
3D Cuboids
Similar to 2D boxes, 3D cuboids enhance the initial frame around an object with depth. This type of data annotation gives a two-dimensional image an in-depth perspective.
In addition to size and position in space, an annotation with the third dimension also captures an object's rotation and relative position, and supports predictions of movement.
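A 3D cuboid adds position, size, and rotation to the flat box. The field names in the sketch below are assumptions for illustration, since there is no single fixed standard across annotation tools:

```python
# One 3D cuboid annotation: where the object sits, how big it is, and
# how it is rotated around the vertical axis.
cuboid = {
    "category": "car",
    "center": [2.0, 0.5, 10.0],  # x, y, z position in metres
    "size": [1.8, 1.5, 4.2],     # width, height, length in metres
    "yaw": 0.0,                  # rotation around the vertical axis (radians)
}

def cuboid_volume(c):
    """Volume of the annotated box in cubic metres."""
    w, h, l = c["size"]
    return w * h * l
```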
Polygonal Annotation
As frames cannot capture the shape of an object precisely, they are not always sufficient for training an ML algorithm. For objects with complicated outlines (such as curvy or multi-sided shapes), polygonal annotation can be used instead.
Machines can recognize objects and their positions in space based on their polygonal outlines. In an interior design project, for example, this lets you teach your model what a lamp or a vase is.
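A polygonal annotation is simply an ordered list of (x, y) vertices tracing the object's outline. The sketch below uses a made-up rectangular outline and the shoelace formula to compute the enclosed area, a quantity segmentation metrics rely on:

```python
# A polygon annotation as an ordered list of (x, y) pixel vertices.
# Real outlines have many more points; this rectangle keeps the math easy.
polygon = [(0, 0), (4, 0), (4, 3), (0, 3)]

def polygon_area(points):
    """Shoelace formula: area enclosed by the vertex list."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0
```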
Keypoint Annotation
Artificial objects are usually simpler to explain to algorithms than natural ones, and there is a type of annotation aimed directly at this problem. In keypoint annotation, the main (key) points of a natural object are defined so that a machine learning algorithm can predict the shape and movement of the object.
Keypoint annotations of people and animals are used for a variety of purposes, including facial and emotion recognition and movement tracking (such as in sports apps or exercise programs). The versatility of this type of annotation also allows it to be used to track the positions of machine-made objects.
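A keypoint annotation lists named points with pixel coordinates and a visibility flag. The (x, y, visible) triplet below follows a COCO-style convention, which is an assumption for illustration rather than something the article specifies:

```python
# Keypoints for one annotated person: each named body landmark is an
# (x, y) pixel position plus a visibility flag.
keypoints = {
    "nose": (120, 40, 1),            # 1 = visible in the image
    "left_shoulder": (100, 90, 1),
    "right_shoulder": (140, 90, 0),  # 0 = occluded / not visible
}

def visible_points(kp):
    """Names of the keypoints that are actually visible."""
    return [name for name, (_, _, v) in kp.items() if v == 1]
```

A pose-estimation model is trained only on visible points, which is why the flag is part of the annotation.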
References
- ↑ Naminas, Karyna (August 5, 2024). "Data Annotation: Your Guide to Efficient In-House Workflows". Label Your Data. Retrieved 2024-09-11.
- ↑ Anolytics (2023-09-12). "The Complete Guide to Data Annotation". Anolytics. Retrieved 2025-02-24.