Shrihari A

Centre for Intelligent Cyber-Physical Systems, Indian Institute of Technology Guwahati, North Guwahati - 781039.  a.shrihari@iitg.ac.in

I am a Ph.D. scholar at IIT Guwahati working on Underwater Object Detection. I am excited about complex problems that can be tackled with learning-based systems built on deep neural networks. My current research focuses on underwater object detection, and I am interested in understanding how well objects can be detected in challenging underwater environments.


Research

Underwater Object Detection

Divers, Submarines, ROVs, Shipwrecks, etc.

Abstract

Remotely Operated Vehicles (ROVs), together with vision-based underwater object detection techniques, can assist underwater exploration and research by identifying specific objects such as shipwrecks, marine life, and man-made debris. Underwater object detection also plays a crucial role in enhancing maritime security, protecting naval assets, and ensuring the safety of military operations in underwater environments. These detection algorithms require large datasets for training, yet very few datasets of underwater objects are available. In this paper, an Extended Underwater Object Detection dataset with 16 object categories (EUWOD-16) was therefore constructed by building a new annotated dataset consisting of divers, artifacts, and various marine species, and merging it with existing underwater object detection datasets after redefining their annotations. The dataset was then evaluated with a modified YOLOv5n architecture based on GhostNet. This method involves the selective addition of ghost blocks in appropriate places to decrease the number of network parameters and FLOPs without a significant drop in performance, together with a Bi-FPN connection for a refined feature fusion pathway. The proposed model achieved higher accuracy with a comparatively lower number of parameters (and FLOPs) than both YOLOv5n and GhostNet.

Introduction

Underwater exploration, marine ecosystem study, seafloor mapping, investigation of historical wrecks and artifacts, and underwater surveillance provide crucial insights for defense and military purposes such as intrusion detection and target identification. They also help in understanding the impact of human activities on the oceans over time, which in turn supports the development of effective strategies for naval forces.

Underwater object detection is challenging due to factors such as low visibility, high noise, and color deviation, which make it difficult to identify and track objects. A second set of challenges comes from the need for accurate and stable detection models that can operate in real time under resource constraints, since underwater platforms often have limited computational resources and data transmission capabilities. In addition to these factors, the diversity and comprehensiveness of datasets are crucial for building robust object detection models: the ability of a model to accurately classify a wide range of objects in practical scenarios depends largely on the diversity and number of classes present in the dataset. Existing datasets for underwater object detection mostly consist of aquatic creatures such as fish, echinus, holothurians, starfish, and scallops.


Example detection results from the YOLOv5-CSGB model.

Proposed Dataset

A new dataset named UDAD has been proposed, covering divers, underwater artifacts, fish, and other marine species. In addition, several existing datasets, namely DUO, UODD, Trash-ICRA19, and the Brackish Dataset, have been merged with the proposed UDAD dataset to provide a diverse range of objects and underwater environments, forming the comprehensive dataset EUWOD-16. The UDAD dataset contains five distinct classes: ROV, artifact, diver, fish, and other-bio. It comprises a total of 2,697 images collected from three sources. The first source is UIEB, an image enhancement dataset consisting of 890 images depicting various scenes with statues, shipwrecks, and divers; only 671 of these images were selected, and the remaining images containing many clustered fish were discarded, since the focus is on divers and artifacts. The second source is SUIM, an underwater segmentation dataset containing 1,525 training and 110 test images with segmentation labels; from this dataset, a total of 1,533 images were selected after discarding over-crowded fish images. The third source is a YouTube video from the channel Guillaume Nery, from which 493 frames were selected in which a diver was clearly and fully visible. All selected images were annotated using the labelImg software, resulting in 2,697 annotated images. The existing datasets were merged by redefining their labels to form a combined dataset, which was then extended with the UDAD dataset to form EUWOD-16 (Extended Underwater Object Detection) with 16 different classes.
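To illustrate the label redefinition used when merging the datasets, the snippet below is a minimal, hypothetical sketch of remapping YOLO-format class indices from a source dataset into a merged label space. The class-ID mapping, directory paths, and function name are placeholders for illustration only, not the actual EUWOD-16 configuration.

import pathlib

# Hypothetical mapping: class index in a source dataset -> index in the merged set.
CLASS_ID_MAP = {0: 3, 1: 4, 2: 7}

def remap_labels(src_dir: str, dst_dir: str) -> None:
    """Rewrite YOLO .txt annotations (class x y w h per line) with merged class IDs."""
    dst = pathlib.Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for label_file in pathlib.Path(src_dir).glob("*.txt"):
        remapped = []
        for line in label_file.read_text().splitlines():
            parts = line.split()
            if not parts:
                continue
            cls_id = int(parts[0])
            if cls_id not in CLASS_ID_MAP:
                continue  # drop classes not kept in the merged dataset
            remapped.append(" ".join([str(CLASS_ID_MAP[cls_id])] + parts[1:]))
        (dst / label_file.name).write_text("\n".join(remapped) + "\n")

# Usage (hypothetical paths):
# remap_labels("DUO/labels/train", "EUWOD16/labels/train")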


Example images from the EUWOD-16 dataset: the first row shows images from the existing datasets and the second row shows images from the UDAD dataset.

Methods

A modified YOLOv5n (nano) model has been proposed, termed YOLOv5-CSGB (Selective Ghosted Convolution with Bi-FPN). It combines GhostNet convolutions, which create more feature maps with fewer parameters, and a Bi-FPN connection, which aggregates contextual information from multiple levels of the feature pyramid. The backbone of YOLOv5 is CSPDarknet-53, built from repeating Cross Stage Partial (CSP) bottleneck blocks, C3 layers, and standard convolution layers. GhostNet is a cost-effective alternative to traditional convolutional blocks that reduces both the number of weights and floating-point operations (FLOPs). Traditional convolutions have been shown to generate considerable redundancy in their feature maps; GhostNet therefore generates only a fraction of the feature maps with standard convolutions, while the rest are produced by inexpensive linear operations implemented with depthwise convolutions. Instead of converting all standard convolutions to Ghost modules, only specific convolutions were converted, a strategy referred to as Selective Ghosting. The Channel Sharpening Attention Module (CSAM) is a channel-wise attention mechanism introduced to enhance the expressiveness of the YOLOv5 model in object detection tasks, particularly in challenging underwater scenes where objects may appear stacked, blurred, or with unclear edges due to the opacity of the water and the shooting angle. The Bidirectional Feature Pyramid Network (Bi-FPN) is a feature fusion method that combines multi-scale features from different levels of the backbone and neck of the model. In summary, YOLOv5-CSGB uses selective ghosting to generate a larger number of feature maps without increasing model size, Bi-FPN to improve multi-scale detection capability, and CSAM to sharpen the input features.
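The following is a minimal, illustrative PyTorch sketch of two of the building blocks described above: a Ghost-style convolution that produces part of its output channels through a cheap depthwise operation, and a Bi-FPN-style weighted fusion node. It is a simplified approximation under assumed channel ratios, activations, and module names, not the actual YOLOv5-CSGB implementation.

import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Sketch of a Ghost convolution: with ratio=2, half of the output channels
    come from a standard convolution and the other half from a cheap depthwise op."""
    def __init__(self, in_ch, out_ch, kernel_size=1, stride=1, ratio=2):
        super().__init__()
        primary_ch = out_ch // ratio          # channels from the "real" convolution
        cheap_ch = out_ch - primary_ch        # channels from the cheap linear operation
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, primary_ch, kernel_size, stride,
                      kernel_size // 2, bias=False),
            nn.BatchNorm2d(primary_ch),
            nn.SiLU(),
        )
        # Depthwise 3x3 convolution as the inexpensive operation on the primary maps.
        self.cheap = nn.Sequential(
            nn.Conv2d(primary_ch, cheap_ch, 3, 1, 1,
                      groups=primary_ch, bias=False),
            nn.BatchNorm2d(cheap_ch),
            nn.SiLU(),
        )

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

class WeightedFusion(nn.Module):
    """Sketch of a Bi-FPN style fusion node: a learned, normalized weighted sum
    of same-shape feature maps from different pyramid levels."""
    def __init__(self, num_inputs=2, eps=1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, feats):
        w = torch.relu(self.weights)
        w = w / (w.sum() + self.eps)
        return sum(wi * f for wi, f in zip(w, feats))

# Quick shape check with toy tensors.
if __name__ == "__main__":
    x = torch.randn(1, 16, 80, 80)
    print(GhostConv(16, 32)(x).shape)   # torch.Size([1, 32, 80, 80])
    fuse = WeightedFusion(num_inputs=2)
    print(fuse([torch.randn(1, 32, 40, 40),
                torch.randn(1, 32, 40, 40)]).shape)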

Results

The experiments were conducted on a Linux system with an NVIDIA Tesla V100 (32 GB memory) and CUDA 10.2. The programming language and framework used were Python 3.9 and PyTorch 1.8.0, respectively. All models were trained with a batch size of 8 and an initial learning rate of 0.01 for 200 epochs using the SGD optimizer. Quantitative analysis was performed against the existing YOLOv5n and GhostNet models. YOLOv5-CSGB achieves a 4.1% increase in mAP@0.5-0.95 on the EUWOD-16 dataset, and increases of 11.8%, 13%, and 7.9% in mAP@0.5-0.95 for the diver, artifact, and ROV classes, respectively.
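For reference, below is a minimal sketch of how the stated training configuration (batch size 8, initial learning rate 0.01, 200 epochs, SGD) could be set up in PyTorch. The momentum and weight decay values and the cosine schedule are assumptions rather than settings reported above, and the model and dataset objects are placeholders.

import torch

def build_training_setup(model, train_dataset):
    # Batch size 8, as used in the experiments.
    loader = torch.utils.data.DataLoader(train_dataset, batch_size=8,
                                         shuffle=True, num_workers=4)
    # SGD with an initial learning rate of 0.01; momentum and weight decay
    # values here are assumed defaults, not reported settings.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                                momentum=0.937, weight_decay=5e-4)
    # Assumed cosine decay over the 200 training epochs.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)
    return loader, optimizer, scheduler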


Publications

A Novel Network Architecture for Microplankton Classification in Digital Holographic Images
Shrihari A, Prithwijit Guha, Rishikesh Kulkarni,
10th International Conference on Pattern Recognition and Machine Intelligence (PReMI), 2023
Project Page / Paper
@InProceedings{S2023,
  author    = {Shrihari A and Prithwijit Guha and Rishikesh Kulkarni},
  title     = {A Novel Network Architecture for Microplankton Classification in Digital Holographic Images},
  booktitle = {10th International Conference on Pattern Recognition and Machine Intelligence (PReMI)},
  publisher = {Springer},
  year      = {2023}
}
Deep Neural Network for Underwater Microplankton Classification using Holograms
Shrihari A, Prithwijit Guha, Rishikesh Kulkarni,
Optica Imaging Congress, 2023
Project Page / Paper
@InProceedings{Shrihari2023ARXIV,
  author    = {Shrihari A and Prithwijit Guha and Rishikesh Kulkarni},
  title     = {Deep Neural Network for Underwater Microplankton Classification using Holograms},
  booktitle = {Optica Imaging Congress, Boston},
  publisher = {OSA},
  year      = {2023}
}
EUWOD-16: An Extended Dataset for Underwater Object Detection
Vignesh P, Shrihari A, Prithwijit Guha,
8th International Conference on Computer Vision and Image Processing (CVIP), 2023
Project Page / Paper
@InProceedings{Vignesh2023ARXIV,
  author    = {Vignesh P and Shrihari A and Prithwijit Guha},
  title     = {EUWOD-16: An Extended Dataset for Underwater Object Detection},
  booktitle = {8th International Conference on Computer Vision and Image Processing (CVIP)},
  publisher = {Springer Nature},
  year      = {2023}
}

Skills

Programming Languages & Tools
  • Python
  • Keras
  • TensorFlow
  • PyTorch
  • PySpark
  • MATLAB
  • R
  • Hadoop
Expertise
  • Deep Learning and Machine Learning
  • Computer Vision
  • Distributed Computing

Interests

Apart from being a researcher, I spend most of my free time keeping up with recent developments in the field of deep learning. I also enjoy trekking, travelling, and exploring new places.

When forced indoors, I spend a large amount of my free time exploring the spiritual aspect of life.


Awards & Certifications

  • Accelerated Data Science Course Completion
  • Guest speaker at a 10-day summer workshop on "Machine Learning and Deep Learning"