Enhancing a YOLO Algorithm for Accurate Prediction of Over 600 Custom Classes

Alejandro Salinas-Medina
11 min read · May 11, 2023


Hello everyone! In this tutorial, we will demonstrate how to train a YOLO (You Only Look Once) algorithm for object detection on custom classes, drawn from a list of over 600 classes. The YOLO algorithm is known for its speed and accuracy, making it ideal for real-time object detection tasks. By the end of this tutorial, you will have a working model that can predict custom classes from a large dataset. Specifically, we will build a food detection system that detects tacos, shrimps, and vegetables in images and videos.

Here’s an outline of the steps we’ll follow:

  1. Gather and prepare the dataset
  2. Configure YOLO for custom classes
  3. Train the YOLO model
  4. Test and evaluate the model

I. Gather and prepare the dataset

The dataset.

Google Open Images Dataset v6 is a large-scale, diverse dataset for visual recognition, containing over 9 million images with annotations. These annotations encompass over 600 object categories and span various applications such as object detection, visual relationship detection, and instance segmentation.

OIDv6 is organized into different subsets: training, validation, and testing. The annotations include over 16 million object bounding boxes, 3 million instance segmentations, and 4 million visual relationships. Furthermore, the dataset is hierarchically structured with the help of a semantic ontology that comprises more than 1,700 classes, allowing for a more comprehensive understanding of the relationships between various objects.

Researchers and developers commonly use the Google Open Images Dataset for training and benchmarking computer vision models. It has become a popular resource for machine learning projects in various fields, such as self-driving cars, robotics, and automated content moderation.

Building our dataset.

We will use the OIDv6 package, available on the Python Package Index (PyPI). It is an easy-to-use toolkit designed to help users work with the Google Open Images Dataset.

The OIDv6 package allows users to:

  1. Download images and annotations for specific classes or groups of classes from the Google Open Images Dataset.
  2. Filter images according to the presence or absence of particular classes.
  3. Convert the dataset format to work with popular deep learning frameworks such as TensorFlow and PyTorch.
  4. Inspect and visualize the dataset and annotations with built-in tools.

To install the OIDv6 package, you can use the following pip command:

pip install oidv6

Once installed, you can access its features through a command-line interface or by importing the package into your Python script. The package documentation and examples will provide further guidance on how to use the toolkit to manage and manipulate the Google Open Images Dataset according to your requirements.
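A quick way to confirm the installation and list the supported flags (option names can vary between package versions, so check the built-in help if the commands used below differ on your machine):

oidv6 downloader --help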

Dataset downloading process

  1. First, create a text file named "classes.txt" listing the classes of interest, one per line (see the example file after the commands below). For this tutorial we will focus on three classes: Taco, Shrimp, and Vegetable.
  2. With the .txt file created, run the following command to download the dataset that will be used to train our model:

oidv6 downloader en --type_data train --classes ./classes.txt --limit 100 --multi_classes

The command invokes the Open Images Dataset V6 (OIDv6) downloader tool, which fetches specific images from the full Open Images Dataset. Its arguments are: en sets the dataset language to English; --type_data train downloads the training split; --classes ./classes.txt points the downloader at the class list in the current directory; --limit 100 caps the download at 100 images per class; and --multi_classes downloads the images for all classes together into a single folder rather than one folder per class.

3. We run the same command for the validation and test splits:

oidv6 downloader en --type_data validation --classes ./classes.txt --limit 50 --multi_classes
oidv6 downloader en --type_data test --classes ./classes.txt --limit 10 --multi_classes
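For reference, classes.txt is just one class name per line; the names must match the Open Images class labels exactly (the dataset uses singular forms), or the downloader will not find them:

Taco
Shrimp
Vegetable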

II. Configure YOLO for custom classes

YOLO requires absolute paths for training, so we must write a script that performs the following tasks:

  1. Convert object detection labels from the Open Images Dataset V6 (OIDv6) format to the YOLO (You Only Look Once) format.
  2. Generate file lists for training, validation, and test data.
  3. Create an object file for YOLO containing necessary configuration details.

The code begins by importing necessary libraries and defining global constants and variables. Next, it defines several functions, including print_msg for conditional printing, get_classes to extract class names from the "classes.txt" file, and label_contents to convert OIDv6 labels to the YOLO format.
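To make the conversion in label_contents concrete, here is the arithmetic applied to a single box (the numbers are illustrative):

# Illustrative example: one OIDv6 box converted to YOLO format
# OIDv6 label line (absolute pixel coordinates): "taco 100 50 300 250"
width, height = 640, 480                    # image size read with cv2.imread
x1, y1, x2, y2 = 100.0, 50.0, 300.0, 250.0

box_w, box_h = x2 - x1, y2 - y1             # 200 x 200 px
cx, cy = x1 + box_w / 2, y1 + box_h / 2     # box center at (200, 150)

# YOLO line: "<class_idx> <cx> <cy> <w> <h>", all relative to the image size
print(0, cx / width, cy / height, box_w / width, box_h / height)
# -> 0 0.3125 0.3125 0.3125 0.4166666666666667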

The main part of the script is divided into three sections:

  1. Translate Labels to YOLO format: This section iterates through the directories for training, validation, and test data, and for each image file, it generates a corresponding YOLO-formatted label file. The script also generates a text file for each directory, containing a list of image files.
  2. Generate File Lists: This section creates a file list for each directory (train, validation, and test) containing the paths to all image files within the respective directory.
  3. Generate Object File: This section creates a “classes.txt” file and an “obj.data” file containing the number of classes, paths to the training and validation file lists, and the “classes.txt” file location. It also specifies a backup directory.
from os import chdir, path, listdir, getcwd
import shutil
import cv2
import sys

DIRS = ["train", "validation", "test"]
DEBUG = True

SKIP_TRANSLATE_LABELS = False
SKIP_GENERATE_FILE_LISTS = False
SKIP_GENERATE_OBJ_FILE = False

if len(sys.argv) < 2:
    print("Missing classes file as argument")
    raise SystemExit

classes_file = path.realpath(sys.argv[1])

def print_msg(msg, isDebug=False):
    if not isDebug:
        print(msg)
    elif isDebug and DEBUG:
        print(msg)

def get_classes(classes_file):
    with open(classes_file) as f:
        return [l.strip().lower().replace(" ", "_") for l in f.readlines()]

def label_contents(img_filename, classes):
    # Assumes we are already in the directory containing img_filename and
    # that the OIDv6 label file lives in its "labels" subdirectory
    img = cv2.imread(img_filename)
    height, width, _ = img.shape
    file_class = "_".join(path.basename(img_filename).split("_")[:-1])
    class_idx = classes.index(file_class)
    # OIDv6 label data: "class x1 y1 x2 y2" in absolute pixels
    label_file = img_filename[:-4] + ".txt"
    label_lines = [line.strip() for line in open("labels/" + label_file, "r")]
    new_lines = []
    for label_line in label_lines:
        label, x1, y1, x2, y2 = label_line.split()
        x1, y1, x2, y2 = float(x1), float(y1), float(x2), float(y2)
        box_width = x2 - x1
        box_height = y2 - y1
        center_x = x1 + (box_width / 2.0)
        center_y = y1 + (box_height / 2.0)
        relative_cx = center_x / width
        relative_cy = center_y / height
        relative_bw = box_width / width
        relative_bh = box_height / height
        new_lines.append('{0} {1} {2} {3} {4}'.format(
            class_idx, relative_cx, relative_cy, relative_bw, relative_bh))
    return "\n".join(new_lines)


chdir(path.join("OIDv6", "multidata"))

######## Translate Labels to YOLO format ########

if not SKIP_TRANSLATE_LABELS:
    classes = get_classes(classes_file)

    for DIR in DIRS:
        chdir(DIR)
        image_files = []
        for filename in listdir():
            labels_file = path.join(getcwd(), filename[:-4] + ".txt")
            if (path.isfile(filename)
                    and filename.endswith(".jpg")
                    and not path.isfile(labels_file)):
                image_files.append(filename)
                labels_file_contents = label_contents(filename, classes)
                display_filename = DIR + "/" + path.basename(labels_file)
                print_msg("Generating Labels File " + display_filename)
                with open(labels_file, "w") as f:
                    f.write(labels_file_contents + "\n")
        class_list_file = path.join("..", DIR + ".txt")
        with open(class_list_file, "w") as f:
            for image in image_files:
                f.write(f"{DIR}/{image}\n")
        chdir("..")
    print_msg("\n\n================= Label Translation Finished =================\n\n")

if not SKIP_GENERATE_FILE_LISTS:
    for DIR in DIRS:
        chdir(DIR)
        file_list = path.join("..", DIR + ".txt")
        with open(file_list, "w") as f:
            for filename in listdir(getcwd()):
                if filename.endswith(".jpg"):
                    f.write(path.join(DIR, filename) + "\n")
        print_msg(f"File List {DIR}.txt generated")
        chdir("..")
    print_msg("\n\n================= File Lists Generation Finished =================\n\n")

if not SKIP_GENERATE_OBJ_FILE:
    chdir("..")
    classes_file = shutil.copy(classes_file, path.join(getcwd(), "classes.txt"))
    num_classes = sum(1 for line in open(classes_file))
    with open(path.join(getcwd(), "obj.data"), "w") as f:
        f.write(f"classes={num_classes}\n")
        f.write("train=multidata/train.txt\n")
        f.write("valid=multidata/validation.txt\n")
        # Keep the names path relative so the training notebook can
        # prepend its own absolute root later
        f.write("names=classes.txt\n")
        f.write("backup=./\n")
    print_msg("\n\n================= Object File Generation Finished =================\n\n")
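For the three classes used here, the generated obj.data should contain something like this (paths are kept relative on purpose, so the training notebook can prepend its own absolute root later):

classes=3
train=multidata/train.txt
valid=multidata/validation.txt
names=classes.txt
backup=./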

Let's create a file named yolo_preprocess_data.py at the same level as the generated OIDv6 folder and run the previous code with classes.txt as its argument:

python yolo_preprocess_data.py ./classes.txt

When the script finishes running, we will see this message:

================= Object File Generation Finished =================

Within the OIDv6 folder we will now have three items: classes.txt, the multidata folder, and obj.data. Rename obj.data to objects.txt, then zip the three together and name the archive data.zip.
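A sketch of this packaging step, assuming a Unix-like shell with the zip utility available:

cd OIDv6
mv obj.data objects.txt
zip -r data.zip classes.txt objects.txt multidata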

III. Train the YOLO model

You will need a GPU for training; in my case I use Google Colab, with Google Drive to host the data.zip.

Run the following code blocks:

A. ENV variable definitions (define the model to train; in my case I will use yolov4-tiny and have created a Training/Tacos folder in my Google Drive)

GOOGLE_COLAB_ENV = True
BACKUP_DIR = "Training/Tacos"  # Make sure that your backup directory exists
MODEL_TO_TRAIN = "yolov4-tiny"  # Only supported options: yolov4 or yolov4-tiny

G_DRIVE_MOUNTPOINT = "/drive"
G_DRIVE_ROOT = G_DRIVE_MOUNTPOINT + "/MyDrive"
G_DRIVE_DATASETZIP = G_DRIVE_ROOT + "/Training/Data/dataset.zip"

from os import path, getcwd

if GOOGLE_COLAB_ENV:
    CONTENT = "/content"
    DATASET = CONTENT + "/multidata"
    SCRIPTS = CONTENT + "/yolov4-training-with-oidv6"
    DARKNET = CONTENT + "/darknet"
    BACKUP_DIR = G_DRIVE_ROOT + "/" + BACKUP_DIR
else:
    CONTENT = path.realpath(getcwd())
    DATASET = CONTENT + "/multidata"
    SCRIPTS = CONTENT
    DARKNET = CONTENT + "/../darknet"
    BACKUP_DIR = CONTENT + "/" + BACKUP_DIR

CFG_FILE = ""
PRE_TRAINED_WEIGHTS = ""
PTW_FILENAME = ""
CUSTOM_CFG_FILE = SCRIPTS + "/my-" + MODEL_TO_TRAIN + ".cfg"

if MODEL_TO_TRAIN == "yolov4":
    CFG_FILE = DARKNET + "/cfg/yolov4-custom.cfg"
    PRE_TRAINED_WEIGHTS_URL = "https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.conv.137"
    PTW_FILENAME = "yolov4.conv.137"
elif MODEL_TO_TRAIN == "yolov4-tiny":
    CFG_FILE = DARKNET + "/cfg/yolov4-tiny-custom.cfg"
    PRE_TRAINED_WEIGHTS_URL = "https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.conv.29"
    PTW_FILENAME = "yolov4-tiny.conv.29"

# You may edit this to point to your last weights to resume training, e.g.
# PRE_TRAINED_WEIGHTS = BACKUP_DIR + "/my-yolov4-tiny_last.weights"
PRE_TRAINED_WEIGHTS = DARKNET + "/" + PTW_FILENAME

B. Providing the data

# Colab only: mount Google Drive
from google.colab import drive
drive.mount(G_DRIVE_MOUNTPOINT)

# Adjust this path to wherever you uploaded data.zip in your Drive
!unzip '/drive/MyDrive/Colab Notebooks/yolo_oid_data/tacos/data.zip' -d "$CONTENT"
print("Dataset unzipped into " + CONTENT)

!mkdir -p "$BACKUP_DIR"
print("Backup Directory created at " + BACKUP_DIR)

C. Importing and configuring darknet dependencies

!git clone --depth 1 https://github.com/AlexeyAB/darknet
%cd darknet
!sed -i 's/OPENCV=0/OPENCV=1/' Makefile
!sed -i 's/GPU=0/GPU=1/' Makefile
!sed -i 's/CUDNN=0/CUDNN=1/' Makefile
!sed -i 's/CUDNN_HALF=0/CUDNN_HALF=1/' Makefile
!make
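After the build, two quick checks are worth running: invoking the binary with no arguments should print darknet's usage message, and the pre-trained weights the notebook expects at PRE_TRAINED_WEIGHTS must actually be present, so we fetch them from the URL defined earlier (a step the notebook otherwise assumes has already happened):

# Smoke test: a successful build prints darknet's usage message
!./darknet
# Download the pre-trained weights into the darknet folder (-nc skips if present)
!wget -nc -P "$DARKNET" "$PRE_TRAINED_WEIGHTS_URL"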

D. Configuration of the objects.txt file

# Set current valid absolute paths for the dataset information
escaped_content = (CONTENT + "/").replace("/", "\\/")
escaped_bdir = BACKUP_DIR.replace("/", "\\/")

!sed -i "s/train=/train=$escaped_content/" "$CONTENT"/objects.txt
!sed -i "s/valid=/valid=$escaped_content/" "$CONTENT"/objects.txt
!sed -i "s/names=/names=$escaped_content/" "$CONTENT"/objects.txt
!sed -i "s/backup=\.\//backup=$escaped_bdir/" "$CONTENT"/objects.txt

# Change to absolute paths the train/validation/test file lists
escaped_dataset = DATASET.replace("/", "\\/")

!sed -i "s/^train/$escaped_dataset\/train/g" "$DATASET"/train.txt
!sed -i "s/^validation/$escaped_dataset\/validation/g" "$DATASET"/validation.txt
!sed -i "s/^test/$escaped_dataset\/test/g" "$DATASET"/test.txt
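With this tutorial's example values, objects.txt should now read roughly as follows (a sketch; backup will point at your own Drive folder):

classes=3
train=/content/multidata/train.txt
valid=/content/multidata/validation.txt
names=/content/classes.txt
backup=/drive/MyDrive/Training/Tacos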

E. The .cfg files

To train Darknet's YOLO model, you need a configuration (.cfg) file and a pre-trained .weights file. The .cfg file must be customized for the number of classes you intend to predict. Modify either darknet/cfg/yolov4-tiny-custom.cfg or darknet/cfg/yolov4-custom.cfg according to your preference, and rename it my-yolov4-tiny.cfg or my-yolov4.cfg, respectively. You can script these changes or make them manually (in this tutorial I will use yolov4-tiny-custom.cfg):

[net]
max_batches = (# of Classes * 2000)
steps = (80% of max_batches), (90% of max_batches)
#### Last section of the configuration file
###### Three pairs if yolov4-custom, two pairs if yolov4-tiny-custom
[convolutional]
filters = ( (# of Classes + 5) * 3 )
[yolo]
classes = # of Classes
[convolutional]
filters = ( (# of Classes + 5) * 3 )
[yolo]
classes = # of Classes
[convolutional]
filters = ( (# of Classes + 5) * 3 )
[yolo]
classes = # of Classes
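For our three classes, these formulas work out to the following concrete values (for yolov4-tiny-custom there are two [convolutional]/[yolo] pairs to update):

[net]
# max_batches = 3 classes * 2000
max_batches = 6000
# steps = 80% and 90% of max_batches
steps = 4800,5400

# in every [convolutional] immediately before a [yolo] layer:
# filters = (3 + 5) * 3
[convolutional]
filters = 24
[yolo]
classes = 3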

Upload the modified .cfg file to your environment and run the following code:

CUSTOM_CFG_FILE = '/drive/MyDrive/Colab Notebooks/yolo_oid_data/Tacos/yolov4-tiny-custom.cfg'

Let's verify that we have all the files we need:

print(f"./darknet detector train {CONTENT}/objects.txt {CUSTOM_CFG_FILE} {PRE_TRAINED_WEIGHTS} -dont-show -mjpeg_port 8090 -map")

F. Train Model

Execute the following code to start the training process. Note that its duration will vary depending on the model used and the volume of data involved:

!./darknet detector train \
"$CONTENT"/objects.txt \
"$CUSTOM_CFG_FILE" \
"$PRE_TRAINED_WEIGHTS" \
-dont_show \
-map
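Darknet periodically saves checkpoints into the backup directory, named after your .cfg file (suffixed _last and, once mAP is computed, _best). If Colab disconnects, you can resume by pointing the same command at the last checkpoint, for example:

!./darknet detector train \
"$CONTENT"/objects.txt \
"$CUSTOM_CFG_FILE" \
"$BACKUP_DIR"/yolov4-tiny-custom_last.weights \
-dont_show \
-map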

IV. Test and evaluate the model

G. Test Model

To test the model, we will develop a Python script that uses the OpenCV library to process images and perform object detection with the trained YOLOv4-tiny model. The script detects three object classes: Taco, Shrimp, and Vegetable.

  1. The script imports the necessary libraries: os, glob, cv2, and numpy.
  2. The load_images function takes a path as input and returns a list of image file paths with the .jpg extension.
  3. The process_images function processes the list of image files, using the supplied YOLOv4-tiny configuration file (cfg_file) and weights file (weights_file). It reads each image and performs object detection using the YOLOv4-tiny model. The detected objects are drawn as bounding boxes on the image, along with their class names and confidence scores. The modified images are then saved in the specified output folder (output_folder).
  4. The draw_boxes function takes the outputs from the YOLOv4-tiny model and the original image. It iterates through the detected objects, checking whether their class IDs are in the list of desired class IDs and their confidence scores are greater than 0.5. If both conditions are met, it calculates the bounding box coordinates and draws the box on the image using the OpenCV functions cv2.rectangle and cv2.putText.
  5. The script defines the file paths for the YOLOv4-tiny configuration file, weights file, input images folder, and output folder.
  6. Finally, the script calls the load_images function to load the image files and the process_images function to perform object detection and save the processed images in the specified output folder.

In summary, this script performs object detection on a set of images using the YOLOv4-tiny model and saves the images with bounding boxes and labels indicating the detected objects and their confidence scores.

import os
import glob
import cv2
import numpy as np

def load_images(path):
    image_files = glob.glob(os.path.join(path, '*.jpg'))
    return image_files

def process_images(image_files, cfg_file, weights_file, output_folder):
    net = cv2.dnn.readNetFromDarknet(cfg_file, weights_file)
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
    net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

    # Make sure the output folder exists before writing into it
    os.makedirs(output_folder, exist_ok=True)

    for img_path in image_files:
        img = cv2.imread(img_path)
        blob = cv2.dnn.blobFromImage(img, 1/255.0, (416, 416), swapRB=True, crop=False)
        net.setInput(blob)

        layer_names = net.getLayerNames()
        output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers().flatten().tolist()]
        outputs = net.forward(output_layers)

        draw_boxes(outputs, img)

        output_file = os.path.join(output_folder, os.path.basename(img_path))
        cv2.imwrite(output_file, img)


def draw_boxes(outputs, img):
    class_ids = [0, 1, 2]  # Taco, Shrimp, Vegetable
    class_names = ['Taco', 'Shrimp', 'Vegetable']

    for output in outputs:
        for detection in output:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]

            if class_id in class_ids and confidence > 0.5:
                # Detections are relative; scale back to pixel coordinates
                center_x, center_y, w, h = (detection[0:4] * np.array(
                    [img.shape[1], img.shape[0], img.shape[1], img.shape[0]])).astype('int')
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

                # Label each box with the class name and confidence level
                label = f"{class_names[class_id]}: {confidence * 100:.2f}%"
                cv2.putText(img, label, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)


cfg_file = '/drive/MyDrive/Colab Notebooks/yolo_oid_data/Tacos/yolov4-tiny-custom.cfg'
weights_file = '/drive/MyDrive/Training/Tacos/yolov4-tiny-custom_best.weights'
images_path = '/content/multidata/test'
output_folder = '/drive/MyDrive/Training/Tacos/prediction2'

image_files = load_images(images_path)
process_images(image_files, cfg_file, weights_file, output_folder)
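The introduction also promised detection in videos. The same network and draw_boxes function can be reused frame by frame; below is a minimal sketch (the input and output paths are hypothetical, and it assumes the script above has already run so that cfg_file, weights_file, and draw_boxes are defined):

# Minimal video sketch: run the detector on every frame and save an annotated copy
net = cv2.dnn.readNetFromDarknet(cfg_file, weights_file)
layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers().flatten().tolist()]

cap = cv2.VideoCapture('/content/test_video.mp4')  # hypothetical input video
fps = cap.get(cv2.CAP_PROP_FPS) or 30
size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
out = cv2.VideoWriter('/content/annotated.mp4', cv2.VideoWriter_fourcc(*'mp4v'), fps, size)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    blob = cv2.dnn.blobFromImage(frame, 1/255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    draw_boxes(net.forward(output_layers), frame)  # reuse the function above
    out.write(frame)

cap.release()
out.release()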

Results

You can find the notebook for this tutorial and all the related code in the following GitHub repo:

https://github.com/agsmilinas/Training-YOLOv4-with-Custom-Dataset-from-Open-Images-Database-v6-OIDv6


Written by Alejandro Salinas-Medina

Computer Science PhD Student @ McGill University. My goal is to apply artificial intelligence as a tool to improve human daily life.
