Updated
October 2019

Annotation tools for building datasets

A list of the best annotation tools for labeling images and text from across the web.

Email me at hello@datasetlist.com with questions, suggestions and ideas.
You can subscribe to get updates when new datasets and tools are released.
Name License
LabelImg is a graphical image annotation tool. It is written in Python and uses Qt for its graphical interface. Annotations are saved as XML files in PASCAL VOC format, the format used by ImageNet.
MIT
MIT License - A short and simple permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.
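To illustrate the PASCAL VOC XML layout mentioned for LabelImg, here is a minimal Python sketch that reads bounding boxes from one annotation. The sample document, its file name, and the helper name parse_voc are made up for illustration; real files may carry extra fields (pose, truncated, difficult) that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the PASCAL VOC layout; all values are invented.
SAMPLE_VOC_XML = """
<annotation>
  <filename>example.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        # Corner coordinates are stored as pixel values in child elements.
        coords = tuple(
            int(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((obj.findtext("name"),) + coords)
    return root.findtext("filename"), boxes


filename, boxes = parse_voc(SAMPLE_VOC_XML)
print(filename, boxes)
```

To read a file from disk instead of a string, ET.parse(path).getroot() can replace the ET.fromstring call.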
Labelme is a graphical image annotation tool inspired by http://labelme.csail.mit.edu. It is written in Python and uses Qt for its graphical interface.
GPL
GNU General Public License v3.0 - Permissions of this strong copyleft license are conditioned on making available complete source code of licensed works and modifications, which include larger works using a licensed work, under the same license. Copyright and license notices must be preserved. Contributors provide an express grant of patent rights.
CVAT is a completely redesigned and reimplemented version of the Video Annotation Tool from Irvine, California. It is a free, online, interactive video and image annotation tool for computer vision. It is being used by our team to annotate millions of objects with different properties. Many UI and UX decisions are based on feedback from a professional data annotation team.
MIT
MIT License - A short and simple permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.
doccano is an open source text annotation tool for humans. It provides annotation features for text classification, sequence labeling, and sequence-to-sequence tasks, so you can create labeled data for sentiment analysis, named entity recognition, text summarization, and so on. Just create a project, upload data, and start annotating. You can build a dataset in hours.
MIT
MIT License - A short and simple permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.
An open source annotation and labeling tool for image and video assets. VoTT is a React + Redux web application, written in TypeScript. This project was bootstrapped with Create React App. Features include: the ability to label images or video frames; an extensible model for importing data from local or cloud storage providers; an extensible model for exporting labeled data to local or cloud storage providers. VoTT helps facilitate an end-to-end machine learning pipeline.
MIT
MIT License - A short and simple permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.
NeuroNER is a program that performs named-entity recognition (NER).
Not found
License information not found
brat is a web-based tool for text annotation; that is, for adding notes to existing text documents. brat is designed in particular for structured annotation, where the notes are not freeform text but have a fixed form that can be automatically processed and interpreted by a computer.
MIT
MIT License - A short and simple permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.
902
A free and open source tool that aims to significantly reduce the time of labeling in object detection projects. No installation is required, all you need is a browser. Make Sense is online, but we care about your privacy and we don't send your photos anywhere.
GPL
GNU General Public License v3.0 - Permissions of this strong copyleft license are conditioned on making available complete source code of licensed works and modifications, which include larger works using a licensed work, under the same license. Copyright and license notices must be preserved. Contributors provide an express grant of patent rights.
855
FIAT enables image data annotation, data augmentation, data extraction, and result visualisation/validation. Annotate images for image classification, optical character reading (digit classification, letter classification), ... Extract data into different formats (Caffe LMDB, OpenCV Cascade Classifiers, Tesseract ...) with data augmentation (resizing, noise in translation / rotation / scaling, pepper noise, Gaussian noise, rectangle merging, line extraction ...).
GPL
GNU General Public License v2.0 - The GNU GPL is the most widely used free software license and has a strong copyleft requirement. When distributing derived works, the source code of the work must be made available under the same license. There are multiple variants of the GNU GPL, each with different requirements.
824
Software that lets you quickly annotate images in directories by hand. The method is semi-manual because it relies on OpenCV's marker-based watershed algorithm. The general idea is to paint the markers manually with brushes and then run the algorithm.
LGPL
GNU Lesser General Public License v3.0 - Permissions of this copyleft license are conditioned on making available complete source code of licensed works and modifications under the same license or the GNU GPLv3. Copyright and license notices must be preserved. Contributors provide an express grant of patent rights. However, a larger work using the licensed work through interfaces provided by the licensed work may be distributed under different terms and without source code for the larger work.
600
This is the official PyTorch reimplementation of Polygon-RNN++ (CVPR 2018).
Not found
License information not found
566
sloth is a tool for labeling image and video data for computer vision research.
GPL
GNU General Public License v3.0 - Permissions of this strong copyleft license are conditioned on making available complete source code of licensed works and modifications, which include larger works using a licensed work, under the same license. Copyright and license notices must be preserved. Contributors provide an express grant of patent rights.
550
YEDDA (previously SUTDAnnotator) is developed for annotating chunks, entities, and events in text (almost all languages, including English and Chinese), symbols, and even emoji. It supports shortcut annotation, which is extremely efficient for annotating text by hand: the user only needs to select a text span and press a shortcut key, and the span is annotated automatically. It also supports a command annotation model, which annotates multiple entities in batch, and it supports exporting annotated text into sequence text.
Apache
Apache License 2.0 - A permissive license whose main conditions require preservation of copyright and license notices. Contributors provide an express grant of patent rights. Licensed works, modifications, and larger works may be distributed under different terms and without source code.
463
COCO Annotator is a web-based image annotation tool designed for versatility and for efficiently labeling images to create training data for image localization and object detection. It provides many distinct features, including the ability to label an image segment (or part of a segment), track object instances, label objects with disconnected visible parts, and efficiently store and export annotations in the well-known COCO format. The annotation process is delivered through an intuitive and customizable interface and provides many tools for creating accurate datasets.
MIT
MIT License - A short and simple permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.
460
JavaScript image annotation tool based on image segmentation. Label image regions with the mouse. Written in vanilla JavaScript, with a require.js dependency (packaged). Pure client-side implementation of image segmentation.
BSD
BSD 3-Clause "New" or "Revised" License - A permissive license similar to the BSD 2-Clause License, but with a 3rd clause that prohibits others from using the name of the project or its contributors to promote derived products without written consent.
419
A scalable open-source web annotation tool from Berkeley DeepDrive. Its goals: support various types of annotations on both images and videos; build innovative features and a user-friendly interface; improve speed by using semi-automated annotations; support concurrent annotation sessions and progress monitoring; be accessible through a web browser without installation.
BSD
BSD 3-Clause "New" or "Revised" License - A permissive license similar to the BSD 2-Clause License, but with a 3rd clause that prohibits others from using the name of the project or its contributors to promote derived products without written consent.
301
Anafora (pronounced "a-nuh-FOUR-uh", /ænəˈfɔɹə/) is a new annotation tool written at the University of Colorado by Wei-te Chen and Will Styler. Anafora is designed to be a lightweight, flexible annotation solution which is easy to deploy for large and small projects.
Apache
Apache License 2.0 - A permissive license whose main conditions require preservation of copyright and license notices. Contributors provide an express grant of patent rights. Licensed works, modifications, and larger works may be distributed under different terms and without source code.
176
WebAnno is a general-purpose web-based annotation tool for a wide range of linguistic annotations, including various layers of morphological, syntactic, and semantic annotations. Additionally, custom annotation layers can be defined, allowing WebAnno to be used for non-linguistic annotation tasks as well. WebAnno is a multi-user tool supporting different roles such as annotator, curator, and project manager. The progress and quality of annotation projects can be monitored and measured in terms of inter-annotator agreement. Multiple annotation projects can be conducted in parallel.
Apache
Apache License 2.0 - A permissive license whose main conditions require preservation of copyright and license notices. Contributors provide an express grant of patent rights. Licensed works, modifications, and larger works may be distributed under different terms and without source code.
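Inter-annotator agreement, mentioned in the WebAnno description, is often summarized with Cohen's kappa for two annotators. The Python sketch below shows that one measure only; which measures WebAnno itself computes is not stated here, and the function name and sample labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies, summed.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical named-entity labels from two annotators.
annotator_1 = ["PER", "ORG", "PER", "LOC"]
annotator_2 = ["PER", "ORG", "LOC", "LOC"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

A kappa of 1 means perfect agreement, while values near 0 mean agreement no better than chance.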
167
LOST (Label Object and Save Time) is a flexible web-based framework for semi-automatic image annotation. It provides multiple annotation interfaces for fast image annotation. LOST is flexible because it lets you run user-defined annotation pipelines in which different annotation interfaces, tools, and algorithms are combined in one process. It is web-based because the whole annotation process is visualized in your browser. You can quickly set up LOST with Docker on your local machine or run it on a web server to make an annotation process available to your annotators around the world. LOST lets you organize label trees, monitor the state of an annotation process, and annotate inside the browser. LOST was especially designed to model semi-automatic annotation pipelines that speed up the annotation process. Such semi-automatic annotation can be achieved by presenting AI-generated annotation proposals to an annotator inside the annotation tool.
MIT
MIT License - A short and simple permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.
155
Image annotation: provides a simple GUI for marking bounding boxes of objects in images for training YOLO v3 and v2. Object detection: a built-in image detector can automatically annotate the detected objects in the images. Search by tags: you can browse and search tagged images in the tags view. Localization: supports English, Simplified Chinese, and Traditional Chinese, with expandable support for other languages. Private space: a password-protected space where you can hide non-public image sources. UWP: supports the Windows Universal Platform (UWP), with a version available in the Windows app store. Due to the development cost of the UWP version, it will not be updated with the desktop version.
GPL
GNU General Public License v2.0 - The GNU GPL is the most widely used free software license and has a strong copyleft requirement. When distributing derived works, the source code of the work must be made available under the same license. There are multiple variants of the GNU GPL, each with different requirements.
142
A semantic annotation platform offering intelligent assistance and knowledge management. The annotation of specific semantic phenomena often requires compiling task-specific corpora and creating or extending task-specific knowledge bases. Presently, researchers require a broad range of skills and tools to address such semantic annotation tasks.
Apache
Apache License 2.0 - A permissive license whose main conditions require preservation of copyright and license notices. Contributors provide an express grant of patent rights. Licensed works, modifications, and larger works may be distributed under different terms and without source code.
133
PDFAnno is a browser-based linguistic annotation tool for PDF documents. It offers functions for annotating PDF with labels and relations. For natural language processing and machine learning, it is suitable for development of gold-standard data with named entity spans, dependency relations, and coreference chains.
MIT
MIT License - A short and simple permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.
116
LabelD is a quick and easy-to-use image annotation tool, built for academics, data scientists, and software engineers to enable single track or distributed image tagging. LabelD supports both localized, in-image (multi-)tagging, as well as image categorization.
AGPL
GNU Affero General Public License v3.0 - Permissions of this strongest copyleft license are conditioned on making available complete source code of licensed works and modifications, which include larger works using a licensed work, under the same license. Copyright and license notices must be preserved. Contributors provide an express grant of patent rights. When a modified version is used to provide a service over a network, the complete source code of the modified version must be made available.
115
An infinitely customizable image annotation library built on React
MIT
MIT License - A short and simple permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.
107
This is a collaborative online tool for labeling image data. The Imagetagger is a database with integrated tools to create and manage image data and related labels. It was designed for the RoboCup to create training data for neural networks and evaluation data for diverse object recognition methods. Therefore cooperative labeling of the same data set, flexible further use of the images and labels and the option to share the data had to be made possible.
MIT
MIT License - A short and simple permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.
103
Image annotation tool by comma.ai
MIT
MIT License - A short and simple permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.
95
VGG Image Annotator is a simple and standalone manual annotation software for image, audio and video. VIA runs in a web browser and does not require any installation or setup. The complete VIA software fits in a single self-contained HTML page of size less than 400 Kilobyte that runs as an offline application in most modern web browsers. VIA is an open source project based solely on HTML, Javascript and CSS (no dependency on external libraries). VIA is developed at the Visual Geometry Group (VGG) and released under the BSD-2 clause license which allows it to be useful for both academic projects and commercial applications.
BSD
BSD 2-Clause “Simplified” License - A permissive license that comes in two variants, the BSD 2-Clause and BSD 3-Clause. Both have very minute differences to the MIT license.
94
A lightweight web-based tool for annotating word sequences.
Research
Research and Academic Use License
93
A tool for annotating 3D boxes in point clouds. Point clouds in KITTI-bin format are supported. The annotation format is the same as the Apollo 3D format.
MIT
MIT License - A short and simple permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.
92
Fast and efficient BBox annotation for your images in YOLO, and now, VOC/COCO formats!
MIT
MIT License - A short and simple permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.
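For readers moving boxes between the formats named above: YOLO label files commonly store a box as a center point plus width and height, normalized by the image size, while PASCAL VOC stores pixel corner coordinates. The Python sketch below converts between the two under that assumption. The function names are hypothetical and are not part of any tool listed here.

```python
def voc_to_yolo(box, image_width, image_height):
    """Convert (xmin, ymin, xmax, ymax) in pixels to normalized
    (x_center, y_center, width, height)."""
    xmin, ymin, xmax, ymax = box
    x_center = (xmin + xmax) / 2 / image_width
    y_center = (ymin + ymax) / 2 / image_height
    width = (xmax - xmin) / image_width
    height = (ymax - ymin) / image_height
    return x_center, y_center, width, height


def yolo_to_voc(box, image_width, image_height):
    """Convert normalized (x_center, y_center, width, height) back to
    pixel corners (xmin, ymin, xmax, ymax)."""
    x_center, y_center, width, height = box
    xmin = (x_center - width / 2) * image_width
    ymin = (y_center - height / 2) * image_height
    xmax = (x_center + width / 2) * image_width
    ymax = (y_center + height / 2) * image_height
    return xmin, ymin, xmax, ymax


# Hypothetical box on a 400x500 image.
yolo_box = voc_to_yolo((100, 50, 300, 250), 400, 500)
voc_box = yolo_to_voc(yolo_box, 400, 500)
print(yolo_box, voc_box)
```

The round trip returns floating-point pixel values, so round them if integer coordinates are required.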
81
MUltiple VIdeos LABelling tool (MuViLab) is a manual annotation tool that helps you label videos for computer vision, machine learning, deep learning and AI applications. With MuViLab you can annotate hours of videos in just a few minutes!
Non-commercial
MuViLab is freely available for non-commercial use, and may be redistributed under these conditions
80
SMART is an open source application designed to help data scientists and research teams efficiently build labeled training datasets for supervised machine learning tasks.
MIT
MIT License - A short and simple permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.
62
An Amazon Mechanical Turk turn-key segment tool. Turkey lets you easily create a web UI on Amazon Mechanical Turk to crowd-source image annotation data. Its main functions include: customizing the annotation modes and class labels on a per-image basis; importing previous annotations generated by either another human or an algorithm; zoom in, zoom out, delete, undo, and reset.
MIT
MIT License - A short and simple permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.
45
MAE
MAE is a lightweight, general-purpose natural language annotation tool
GPL
GNU General Public License v3.0 - Permissions of this strong copyleft license are conditioned on making available complete source code of licensed works and modifications, which include larger works using a licensed work, under the same license. Copyright and license notices must be preserved. Contributors provide an express grant of patent rights.
34
DeepLabel is a cross-platform tool for annotating images with labelled bounding boxes. A typical use-case for the program is labelling ground truth data for object-detection machine learning applications. DeepLabel runs as a standalone app and compiles on Windows, Linux and Mac.
MIT
MIT License - A short and simple permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.
33
Open-source rich internet application for collaborative analysis of multi-gigapixel images
Apache
Apache License 2.0 - A permissive license whose main conditions require preservation of copyright and license notices. Contributors provide an express grant of patent rights. Licensed works, modifications, and larger works may be distributed under different terms and without source code.
22
I have written a brutally simple MATLAB tool for annotating images with polygons, which I had wanted to put online long ago. I wrote this tool because at the time I could not find cross-platform software for polygon annotation that is flexible, easy to configure, and able to output semantic and instance label maps.
GPL
GNU General Public License v3.0 - Permissions of this strong copyleft license are conditioned on making available complete source code of licensed works and modifications, which include larger works using a licensed work, under the same license. Copyright and license notices must be preserved. Contributors provide an express grant of patent rights.
Labelbox is collaborative training data software for computer vision teams.
Commercial
See website for details
Prodigy is a machine teaching tool so efficient that a single data scientist can create end-to-end prototypes for new functionality without commissioning external annotations, and with a smooth path to production. Whether you're working on entity recognition, intent detection or image classification, Prodigy can help you train and evaluate your models faster.
Commercial
See website for details
An image annotation tool to label images for bounding box object detection and segmentation.
Commercial
See website for details
tagtog is the place to find NLP datasets. Not there yet? Create it yourself, and share it with your team and the whole world. tagtog is intuitive for you, the non-tech-savvy, and fully geared for you, the ML troublemakers.
Commercial
See website for details
© 2019 Nikola Plesa