Detectron2 and the unusual demands of a custom data format. Bygone.


Detectron2 is Facebook AI Research’s next-generation software system that implements state-of-the-art object detection algorithms. It is a ground-up rewrite of the previous version, Detectron, and it originates from maskrcnn-benchmark. Detectron2 was built by Facebook AI Research (FAIR) to support the rapid implementation and evaluation of novel computer vision research.

When we talk of computer vision and its ability to understand the visual world, a question crosses our mind: what specific tasks can it comprehend?

Let’s take the example of a person entering a retail store with a shopping list in hand. He not only needs to recognize a product but also to differentiate it from the ones kept alongside it.

Say he needs to buy a jar of coffee. The coffee jars may be placed alongside tea jars or powdered chocolate jars. The person needs to identify the coffee jars among their peers, while also recognizing only the jars and not the racks or the background.

But hey, our brain is able to analyze the visual, and in a matter of milliseconds we comprehend this information. But can machines do this? And when we talk of automation and its healthy effect on revenue, as mentioned in the Mordor Intelligence report, “The retail automation market is expected to grow at a CAGR of 15.41% over the forecast period (2021–2026)”, we depend heavily on solutions that are capable of recognizing their surroundings and taking appropriate actions with precision.

Thankfully, recent advances in computer vision have made this possible. We are able to train algorithms to detect objects, determine their shape, and localize them on a canvas. The trained models can then be utilized in a variety of use cases to drive automation, boost sales, and enjoy the resulting growth in revenue.

But What is Computer Vision?

Computer Vision is a subfield of Artificial Intelligence in which computers are trained to interpret and understand the visual world by taking digital images and videos from cameras and training deep learning algorithms on them.

The algorithms can then be used to architect models capable of identifying, localizing, classifying, and then reacting to what they see. The models can be deployed on edge devices, on-prem, or in the cloud to generate insights by processing images and videos. These insights can then be leveraged to make strategic and operational decisions, and to keep up with competitors who will soon be adopting Computer Vision themselves. As highlighted by a Grand View Research report, “The global computer vision market size was valued at USD 10.6 billion in 2019 and is expected to grow at a compound annual growth rate (CAGR) of 7.6% from 2020 to 2027”.

The same report forecasts the computer vision market to reach a revenue of USD 19.1 billion by 2027.

AI-based Computer Vision in retail aims to automate the working of the human visual system and, in some cases, aid it with better and more detailed insights.

Let’s dive deeper into the types of computer vision problems:

When the person entered the retail store, he knew he wanted a coffee jar.

Now, among all the jars in his view, he needs to identify the jar of coffee. This task is referred to as image classification, wherein the person tries to classify all the jars into classes like tea, coffee, chocolate, etc.

Once classified, the next task awaiting him is identifying the exact location of the coffee jar on the rack among its peers. This task is called object localization, wherein we generally draw a (jargon warning) “bounding box” around the object to localize it. Object localization along with object classification is termed object detection.
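To make the jargon concrete, here is a toy sketch of how the three tasks differ purely in terms of the data they produce. The image name, labels, and [x_min, y_min, x_max, y_max] pixel coordinates are made up for illustration.

```python
# Toy illustration only: hypothetical image name, labels and [x_min, y_min, x_max, y_max] boxes.
classification = {"image": "shelf.jpg", "label": "coffee"}          # what is in the image
localization = {"image": "shelf.jpg", "box": [140, 60, 210, 180]}   # where the object is (bounding box)
detection = {                                                       # what + where, for every object
    "image": "shelf.jpg",
    "objects": [
        {"label": "coffee", "box": [140, 60, 210, 180]},
        {"label": "tea", "box": [230, 60, 300, 180]},
    ],
}
print(detection["objects"][0])  # -> {'label': 'coffee', 'box': [140, 60, 210, 180]}
```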

Hurray! We understand Computer Vision now. But is that enough?

Before detecting the object, say even before classifying it, we need to know what the object consists of at a granular level; in image terms, at the pixel level. The person needs to understand what a coffee jar looks like.

For this, let’s see what an image actually is.

An image is just a collection of different pixels. Every pixel has some attributes, and a group of pixels with similar attributes makes up an individual object.

We make use of this knowledge about the structure of an image to identify the class of each pixel in the image. In our case, the rack, the coffee jar, the other jars, the background, etc.

This is known as segmentation.
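For intuition (and not Detectron2-specific), here is a tiny sketch of that idea; the array shape and the two class names are made up for illustration. An image is just an array of pixels, and a segmentation mask is an array of the same height and width holding one class id per pixel.

```python
import numpy as np

# A tiny 4x6 RGB image: just an array of pixels (all black to start).
image = np.zeros((4, 6, 3), dtype=np.uint8)
image[1:3, 2:5] = [120, 80, 40]            # a small brown blob standing in for the coffee jar

# A segmentation mask assigns a class id to every pixel of the image.
CLASSES = {0: "background", 1: "coffee_jar"}
mask = np.zeros((4, 6), dtype=np.uint8)
mask[1:3, 2:5] = 1                         # the pixels that belong to the jar

for class_id, name in CLASSES.items():
    print(name, int((mask == class_id).sum()), "pixels")
```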

Image segmentation finds use in a variety of fields, like autonomous vehicles and healthcare, where identifying the shape of an organ or a bacterium with precision may sometimes be a matter of life or death.

Practical Implementation

A lot has been said about the terminology, but without implementation, terminology has no meaning. In our research, we found this exciting platform from Facebook AI Research named Detectron2.

Detectron2 is Facebook AI Research’s next-generation software system that implements state-of-the-art object detection algorithms.

Hear what Analytics India Magazine has to say about Detectron2.

Getting yourself accustomed to Detectron2 will lay a path to seamless Computer Vision implementations. With Detectron2Go, deployment on edge devices just got easier. Look at what Wan-Yen Lo, Research Engineering Manager at FAIR, has to share.

You can follow along with this notebook to get an idea of the platform and get your hands dirty with easy-to-implement modules to perform image segmentation tasks.
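For a sense of what that workflow looks like, here is a minimal sketch of the usual Detectron2 inference pattern with a pre-trained Mask R-CNN from the model zoo; the image path is a placeholder, and the exact config used in the notebook may differ.

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

# Build a config from a pre-trained COCO instance-segmentation model in the model zoo.
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5   # keep only reasonably confident detections

predictor = DefaultPredictor(cfg)
image = cv2.imread("shelf.jpg")               # placeholder path to your own image
outputs = predictor(image)

# Predicted classes, boxes and per-instance segmentation masks.
instances = outputs["instances"].to("cpu")
print(instances.pred_classes)
print(instances.pred_boxes)
print(instances.pred_masks.shape)
```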

Leverage Labellerr

But is the daunting task of labelling your data stopping you from exploring this exciting platform?

Wait, Labellerr is your solution.

Labellerr is a data-annotation platform that provides a simple, clear, and easy-to-use UI with a seamless UX to perform annotation on different types of data.

Leverage the auto-label feature on Labellerr to annotate your data at 10x speed and save crucial man-hours.

Follow along with this video to get an idea of how to perform segmentation annotation on Labellerr.

Detectron2 export now on Labellerr

But you know what, your journey to the perfect segmentation model still has a bottleneck. Detectron2, though state of the art, requires data to be in a specific format, similar to the COCO data format.

Well, what does this imply?

It implies that even after annotating your data and getting a JSON export from your favorite labeling tool, you still have to write a custom script to convert that export into the format Detectron2 expects.
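To give a feel for the kind of glue code this means, here is a sketch of such a conversion script. The export layout (an `annotations.json` with image paths, sizes, and flat polygon lists), the dataset name, and the class names are all hypothetical; only the Detectron2 registration calls are real.

```python
import json

from detectron2.data import DatasetCatalog, MetadataCatalog
from detectron2.structures import BoxMode

def load_custom_export(json_path):
    """Convert a hypothetical labeling-tool export into Detectron2 dataset dicts."""
    with open(json_path) as f:
        raw = json.load(f)

    records = []
    for idx, item in enumerate(raw):                  # assumed: one entry per image
        annotations = []
        for obj in item["objects"]:                   # assumed: one entry per labeled object
            xs = obj["polygon"][0::2]                 # flat [x1, y1, x2, y2, ...] polygon
            ys = obj["polygon"][1::2]
            annotations.append({
                "bbox": [min(xs), min(ys), max(xs), max(ys)],
                "bbox_mode": BoxMode.XYXY_ABS,
                "segmentation": [obj["polygon"]],
                "category_id": obj["class_id"],
            })
        records.append({
            "file_name": item["image_path"],
            "image_id": idx,
            "height": item["height"],
            "width": item["width"],
            "annotations": annotations,
        })
    return records

# Register the converted data so Detectron2 trainers can look it up by name.
DatasetCatalog.register("shelf_train", lambda: load_custom_export("annotations.json"))
MetadataCatalog.get("shelf_train").set(thing_classes=["coffee", "tea", "chocolate"])
```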

Hmm, yes, again a script: more coding, more testing, and wait, bugs always arrive unannounced. A lot of crucial man-hours are wasted in the process, not to mention the frustration of still not being able to implement the solution.

Wait, isn’t the work of a data scientist training algorithms and monitoring their metrics?

Well, with Labellerr to your rescue, you need to worry only about your favorite work at the office and leave the hassles of data export formats to us.

Labellerr now has the Detectron2 format built into the export section. Leveraging it, you get your data labeled and exported ready to be plugged directly into the Detectron2 modules on the go with just one click.

Give it a try. Experience the joy of training your own segmentation model without the hassle of code implementations:

Follow along with this notebook to plug in your exported data and train a Detectron2 segmentation model to get segmentation predictions on the go.
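As a rough idea of what the notebook boils down to, here is a training sketch: the file names (`labellerr_export.json`, an `images/` folder), dataset name, class count, and solver settings are illustrative placeholders, while the COCO-style registration and `DefaultTrainer` calls are standard Detectron2.

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultTrainer

# Register the COCO-style export in one line: no custom conversion script needed.
register_coco_instances("shelf_train", {}, "labellerr_export.json", "images/")

# Fine-tune a pre-trained Mask R-CNN on the registered dataset.
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("shelf_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 1000
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 3   # e.g. coffee, tea, chocolate (placeholder classes)

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```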


Have any other use case in mind? Visit our website and mention your use case in brief, and our customer engineer will contact you, help you prepare a plan, and get you running on a trial with us to validate it.
