Introducing UBIAI: Easy-to-Use Text Annotation for NLP Applications

Walid Amamou
Published in Chatbots Life
6 min read · Sep 5, 2020

Whether you are working on entity recognition, chatbot training, entity sentiment analysis, or text classification, annotating text to train and fine-tune a model for your own use case is crucial. Choosing an annotation tool with low UI friction and as much automation as possible is therefore important.

Today, we introduce UBIAI, a new text annotation tool that offers an easy-to-use UI and multilingual support (including Arabic and Chinese), complemented by auto-annotation functionality. The tool is currently free while in beta.

UBIAI Intro

Multilingual Support

UBIAI supports annotation in multiple languages, with tokenization specific to each language. For example, Arabic is tokenized differently from Chinese, which in turn is tokenized differently from English.

Annotated English Document
Annotated Arabic Document
Chinese Document

When creating a new project, simply specify the language and upload your documents. Your documents will then be automatically tokenized depending on the chosen language.
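
To see why tokenization has to depend on the language, consider the minimal sketch below. It is not UBIAI's tokenizer: the `tokenize` function and its rules are assumptions made purely for illustration. Space-delimited scripts such as English are split with a regular expression, while Chinese, which does not separate words with spaces, is crudely split per character as a stand-in for a real word segmenter.

```python
import re

def tokenize(text, language):
    """Naive, illustrative tokenizer (hypothetical; not UBIAI's implementation)."""
    if language == "zh":
        # Chinese has no spaces between words. As a crude stand-in for a real
        # word segmenter, treat every non-space character as its own token.
        return [ch for ch in text if not ch.isspace()]
    # Space-delimited scripts: split into runs of word characters and
    # single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("UBIAI supports Arabic.", "en"))
print(tokenize("我爱北京", "zh"))
```

A production tokenizer does much more (Arabic clitics and diacritics, Chinese word segmentation), which is why the project language has to be chosen up front.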

Multiple Upload Formats

Few annotation tools let you upload documents in different formats. UBIAI accepts several:

  1. TXT, PDF, HTML and DOCX
  2. JSON: you can upload a JSON file with existing entities. This is useful if you have a pre-annotated JSON file that you would like to import and continue annotating.
  3. CSV: you can upload a CSV file containing one document per row. This is useful for uploading documents in bulk.
  4. ZIP: you can upload a ZIP file containing TXT, PDF or HTML files. This is also useful for uploading documents in bulk.
Multiple Upload Formats
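
If your documents live in code rather than in files, the two bulk formats above can be prepared with the Python standard library. The sketch below is only an illustration: the sample documents, the file names, and the single-column layout without a header row are assumptions, since the exact layout UBIAI expects is not described here.

```python
import csv
import io
import zipfile

# Hypothetical sample documents.
documents = [
    "Apple opened a new office in Paris.",
    "Send the invoice to billing@example.com, please.",
]

def build_csv(docs):
    """Return CSV text with one document per row (single column, no header)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for doc in docs:
        writer.writerow([doc])  # the csv module quotes commas and newlines
    return buffer.getvalue()

def build_zip(docs):
    """Return the bytes of a ZIP archive holding one TXT file per document."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for index, doc in enumerate(docs, start=1):
            archive.writestr(f"doc_{index}.txt", doc)
    return buffer.getvalue()

csv_text = build_csv(documents)
zip_bytes = build_zip(documents)
```

Writing `csv_text` to a `.csv` file or `zip_bytes` to a `.zip` file gives you something to upload in one step.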

Intuitive UI

The annotation interface is the core of any annotation tool, as it is where annotators spend most of their time. A seamless, easy-to-use, low-friction interface is a must.

UBIAI Annotation Interface

UBIAI provides a sleek interface with real-time auto-saving during annotation. In addition, with auto-detection enabled, the tool searches for and annotates similar words as soon as you highlight a word.

Pre-Annotation:

Dictionary
For each entity type, you can associate one or more dictionaries to automatically recognize and annotate the words they contain. You can either enter the dictionary entries manually or upload a CSV list containing all the words with their corresponding entity types (see example below):

CSV Dictionary
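
Conceptually, dictionary pre-annotation is a lookup of known terms in the text. The sketch below shows one way this could work; it is not UBIAI's implementation, and the `dictionary` contents and the `pre_annotate` helper are hypothetical names used only for illustration.

```python
import re

# Hypothetical dictionary: surface form -> entity type.
dictionary = {
    "New York": "LOCATION",
    "Paris": "LOCATION",
    "Google": "ORGANIZATION",
}

def pre_annotate(text, dictionary):
    """Return (start, end, label, matched_text) for every dictionary hit."""
    # Try longer terms first so a multi-word entry wins over a shorter one.
    terms = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b")
    return [
        (m.start(), m.end(), dictionary[m.group()], m.group())
        for m in pattern.finditer(text)
    ]

text = "Google opened an office in New York, not in Paris."
for span in pre_annotate(text, dictionary):
    print(span)
```

This version is case-sensitive and matches whole words only; whether a real tool normalizes case or matches partial words is a design choice that affects how many false positives you have to correct by hand.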

Rule Based Matching:

With rule-based matching, you can pre-annotate your documents instantly using a combination of rules such as part-of-speech (POS) tags, regular expressions, and built-in patterns (email, number, phone number, etc.). The list of all available attributes, with descriptions, can be found in the documentation.
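
As a rough illustration of the pattern side of rule-based matching, the sketch below pre-annotates emails, phone numbers and plain numbers with regular expressions. The rule names, the regular expressions and the `apply_rules` helper are assumptions for this example, not UBIAI's actual rules, and POS-based rules are left out because they require a tagger.

```python
import re

# Hypothetical rules, listed in priority order.
RULES = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("PHONE", re.compile(r"\b\d{3}-\d{3}-\d{4}\b")),
    ("NUMBER", re.compile(r"\b\d+(?:\.\d+)?\b")),
]

def apply_rules(text):
    """Return sorted (start, end, label) spans; earlier rules win on overlap."""
    spans = []
    for label, pattern in RULES:
        for m in pattern.finditer(text):
            overlaps = any(
                m.start() < end and start < m.end() for start, end, _ in spans
            )
            if not overlaps:
                spans.append((m.start(), m.end(), label))
    return sorted(spans)

text = "Email jane.doe@example.com or call 555-123-4567 before 2021."
for start, end, label in apply_rules(text):
    print(label, text[start:end])
```

Applying the rules in priority order and skipping overlapping matches keeps the digits inside a phone number from also being tagged as separate numbers.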

Machine Learning Auto-Annotation:

To speed up the annotation process, UBIAI can auto-annotate your documents using a spaCy model. All you have to do is:

  1. Select the project whose annotated documents will be used as the training corpus.
  2. Select a starting model: either a blank model or the pre-trained English model en_core_web_sm.
  3. Select the training/evaluation partition of the annotated data used to train and evaluate the model.
  4. Configure the training by specifying the number of iterations (default is 10), the dropout and the batch size.
  5. Optionally, have your documents auto-annotated once the model finishes training by checking the “Annotate Your Documents after Finish Train” button. Note: for efficient model training, it is recommended to annotate at least 10% of your total documents.
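
For readers who want a mental model of steps 3 and 4, here is a minimal sketch of a training/evaluation partition and a training configuration. Everything except the default of 10 iterations is an assumption: the 80/20 split, the dropout and batch size values, and the `split_corpus` helper are hypothetical and do not describe what UBIAI does internally.

```python
import random

def split_corpus(annotated_docs, train_fraction=0.8, seed=0):
    """Shuffle the annotated documents and split them into train/eval lists."""
    docs = list(annotated_docs)
    random.Random(seed).shuffle(docs)  # fixed seed keeps the split reproducible
    cut = int(len(docs) * train_fraction)
    return docs[:cut], docs[cut:]

# Hypothetical training configuration.
config = {
    "iterations": 10,  # default mentioned in the steps above
    "dropout": 0.2,    # assumed value, for illustration only
    "batch_size": 8,   # assumed value, for illustration only
}

annotated = [f"doc_{i}" for i in range(10)]
train_docs, eval_docs = split_corpus(annotated)
print(len(train_docs), "training documents,", len(eval_docs), "evaluation documents")
```

Keeping the evaluation documents out of training is what makes the reported scores meaningful.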

After training, UBIAI directly evaluates the model using the train/validation partition and displays the precision, recall, and F score for each entity.

To track model performance over time, click the “view log” button below the model name.
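
The per-entity precision, recall, and F score mentioned above can be computed as in the sketch below. It assumes exact-match scoring, where a predicted span counts as correct only if its document, offsets and label all match a gold annotation; how UBIAI matches spans is not stated here, and the `per_entity_scores` helper and sample spans are hypothetical.

```python
from collections import Counter

def per_entity_scores(gold, predicted):
    """gold/predicted: sets of (doc_id, start, end, label) tuples."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for span in predicted:
        (tp if span in gold else fp)[span[-1]] += 1
    for span in gold - predicted:
        fn[span[-1]] += 1
    scores = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        p = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        r = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        scores[label] = {"precision": p, "recall": r, "f1": f}
    return scores

gold = {(1, 0, 6, "ORG"), (1, 27, 35, "LOC"), (2, 0, 5, "LOC")}
predicted = {(1, 0, 6, "ORG"), (1, 27, 35, "LOC"), (2, 10, 15, "LOC")}
for label, score in per_entity_scores(gold, predicted).items():
    print(label, score)
```

In this toy example the model finds one of two locations and adds one spurious location, so LOC scores 0.5 on all three metrics while ORG scores 1.0.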

Multiple Export Formats:

A main limitation of existing annotation tools is the small number of export formats they support. With UBIAI, you can export your annotations in the following formats:

  1. Amazon Comprehend format (see tutorial here)
  2. JSON format
  3. spaCy format
  4. Stanford CoreNLP format
  5. IOB format including IOB Part Of Speech (POS) and IOB Chatbot

A zip file containing the annotations along with the documents used during annotation will be downloaded; unzip it before using the annotations to train a model. Below is an example of an annotation export in the IOB POS format:

IOB POS Export
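
For readers unfamiliar with IOB tagging, the sketch below shows the scheme itself: the first token of an entity is tagged `B-<label>`, the following tokens of the same entity `I-<label>`, and everything else `O`. It illustrates the tagging convention only; the exact column layout of UBIAI's export file is not reproduced here, and the `to_iob` helper and sample sentence are hypothetical.

```python
def to_iob(tokens, entities):
    """Tag tokens with IOB labels.

    tokens: list of token strings.
    entities: list of (first_token_index, end_token_index_exclusive, label).
    """
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        tags[start] = f"B-{label}"       # beginning of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"       # inside the same entity
    return list(zip(tokens, tags))

tokens = ["Google", "opened", "an", "office", "in", "New", "York", "."]
entities = [(0, 1, "ORG"), (5, 7, "LOC")]
for token, tag in to_iob(tokens, entities):
    print(token, tag)
```

An IOB POS export would additionally carry a part-of-speech tag for each token.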

Real Time Analysis:

With real-time analysis, you can test your trained model on the spot without leaving the tool. This is useful for quickly checking the performance of the model on real production text.

Real Time Analysis Entity Extraction

Team Collaboration:

Team collaboration is essential: it not only speeds up the annotation process but also mitigates annotator bias by leveraging group annotation to infer the underlying truth.

UBIAI offers the option to collaborate with team members easily by creating collaborations for your projects.

Collaboration Platform
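
One simple way to use group annotation to infer the underlying truth is a majority vote across annotators. The sketch below is an assumption about how such a consensus could be computed, not a description of UBIAI's collaboration feature; the `majority_label` helper and the sample votes are hypothetical.

```python
from collections import Counter

def majority_label(votes):
    """Return the label chosen by a strict majority of annotators, else None.

    votes: non-empty list of labels, one per annotator.
    """
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None

# Hypothetical labels assigned by different annotators to the same spans.
annotations = {
    "Paris": ["LOCATION", "LOCATION", "PERSON"],
    "Apple": ["ORGANIZATION", "PRODUCT"],
}
consensus = {span: majority_label(votes) for span, votes in annotations.items()}
print(consensus)
```

Spans without a strict majority come back as `None`, which flags them for review instead of silently picking one annotator's label.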

Final Note

Our mission at UBIAI is to build easy-to-use Natural Language Processing (NLP) tools that help developers and companies try out machine learning ideas quickly and apply them to real-world problems without spending time on coding. We believe that accessible, low-cost tools will democratize NLP across the globe and enable better, more intelligent decision making. To that end, we are committed to offering the full-featured tool at a significant discount to researchers and people in academia.

We are constantly improving the tool and are looking for beta testers. Please give it a try at https://ubiai.tools and give us your feedback!

Follow us on Twitter @UBIAI5

Don’t forget to give us your 👏 !

Founder of UBIAI, an annotation tool for NLP applications | PhD in Physics.