2024
1.
Hasan, Md. Rakibul; Jani, Md Rafsan; Rahman, Mahmudur
Image and Text Feature Based Multimodal Learning for Multi-Label Classification of Radiology Images in Biomedical Literature Presentation
21.02.2024.
Tags: Biomedical Image Annotation, CNN, DistilGPT2, Image retrieval, Multimodal Learning, ResNet50, ViT
@misc{Hasan2024,
title = {Image and Text Feature Based Multimodal Learning for Multi-Label Classification of Radiology Images in Biomedical Literature},
author = {Md. Rakibul Hasan and Md Rafsan Jani and Mahmudur Rahman},
url = {https://www.insticc.org/node/TechnicalProgram/biostec/2024/presentationDetails/124384},
year = {2024},
date = {2024-02-21},
urldate = {2024-02-21},
abstract = {Biomedical images are crucial for diagnosing and planning treatments, as well as for advancing scientific understanding of various ailments. To effectively highlight regions of interest (RoIs) and convey medical concepts, annotation markers such as arrows, letters, or symbols are employed. However, annotating these images with appropriate medical labels poses a significant challenge. In this study, we propose a framework that leverages multimodal input features, including text/label features and visual features, to facilitate accurate annotation of biomedical images with multiple labels. Our approach integrates state-of-the-art models such as ResNet50 and Vision Transformers (ViT) to extract informative features from the images. Additionally, we employ DistilGPT2, a distilled variant of the Transformer-based GPT-2 language model, to extract textual features, leveraging its natural language understanding capabilities. This combination of image and text modalities allows for a more comprehensive representation of the biomedical data, leading to improved annotation accuracy. By combining the features extracted from both modalities, we trained a simplified Convolutional Neural Network (CNN)-based multi-label classifier to learn image-text relations and predict multiple labels for multimodal radiology images. We used the ImageCLEFmedical 2022 and 2023 datasets to demonstrate the effectiveness of our framework; these datasets contain a diverse range of biomedical images, enabling evaluation of the framework's performance under realistic conditions. We achieved promising results with an F1 score of 0.508. Our proposed framework shows strong potential for annotating biomedical images with multiple labels, contributing to improved image understanding and analysis in the medical image processing domain.},
keywords = {Biomedical Image Annotation, CNN, DistilGPT2, Image retrieval, Multimodal Learning, ResNet50, ViT},
pubstate = {published},
tppubtype = {presentation}
}
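The fusion pipeline described in the abstract (ResNet50 and ViT embeddings for the image, DistilGPT2 embeddings for the text, concatenated and fed to a small CNN-based multi-label classifier) can be sketched roughly as follows. This is an illustrative PyTorch reconstruction, not the authors' implementation: the 1-D convolutional head, the mean-pooling of DistilGPT2 hidden states, and the shared image preprocessing for both backbones are assumptions; the paper specifies only a "simplified CNN-based" classifier over the fused features.

# Illustrative sketch only: fuses ResNet50 + ViT image features with DistilGPT2
# text features for multi-label prediction. Head architecture is assumed, not
# taken from the paper.
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights
from transformers import GPT2Model, ViTModel

class MultimodalMultiLabelNet(nn.Module):
    def __init__(self, num_labels: int):
        super().__init__()
        # Image branch 1: ResNet50 without its classification layer -> 2048-d pooled features.
        backbone = resnet50(weights=ResNet50_Weights.DEFAULT)
        self.resnet = nn.Sequential(*list(backbone.children())[:-1])
        # Image branch 2: ViT base -> 768-d [CLS] embedding.
        self.vit = ViTModel.from_pretrained("google/vit-base-patch16-224")
        # Text branch: DistilGPT2 -> 768-d hidden states, mean-pooled in forward().
        self.gpt2 = GPT2Model.from_pretrained("distilgpt2")
        fused_dim = 2048 + 768 + 768
        # Small 1-D CNN head standing in for the paper's "simplified CNN-based"
        # multi-label classifier (exact architecture is an assumption).
        self.head = nn.Sequential(
            nn.Unflatten(1, (1, fused_dim)),            # (B, D) -> (B, 1, D)
            nn.Conv1d(1, 16, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(64),
            nn.Flatten(),                               # (B, 16, 64) -> (B, 1024)
            nn.Linear(16 * 64, num_labels),
        )

    def forward(self, pixel_values, input_ids, attention_mask):
        img_a = self.resnet(pixel_values).flatten(1)                          # (B, 2048)
        img_b = self.vit(pixel_values=pixel_values).last_hidden_state[:, 0]   # (B, 768)
        txt = self.gpt2(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state.mean(dim=1)  # (B, 768)
        fused = torch.cat([img_a, img_b, txt], dim=1)
        return self.head(fused)  # raw logits; train with nn.BCEWithLogitsLoss

At inference time, applying a sigmoid to the logits and thresholding each label independently yields the multi-label prediction set.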
2011
2.
Rahman, M M; Antani, S K; Thoma, G R
A query expansion framework in image retrieval domain based on local and global analysis Journal Article
In: Information Processing & Management, vol. 47, no. 5, pp. 676-691, 2011, ISSN: 0306-4573, (Managing and Mining Multilingual Documents).
Tags: Image retrieval, Query expansion, Relevance feedback, Support vector machine, Vector space model
@article{RAHMAN2011676,
title = {A query expansion framework in image retrieval domain based on local and global analysis},
author = {M M Rahman and S K Antani and G R Thoma},
url = {https://www.sciencedirect.com/science/article/pii/S0306457310001020},
doi = {10.1016/j.ipm.2010.12.001},
issn = {0306-4573},
year = {2011},
date = {2011-01-01},
journal = {Information Processing & Management},
volume = {47},
number = {5},
pages = {676--691},
abstract = {We present an image retrieval framework based on automatic query expansion in a concept feature space by generalizing the vector space model of information retrieval. In this framework, images are represented by vectors of weighted concepts similar to the keyword-based representation used in text retrieval. To generate the concept vocabularies, a statistical model is built by utilizing Support Vector Machine (SVM)-based classification techniques. The images are represented as “bag of concepts” that comprise perceptually and/or semantically distinguishable color and texture patches from local image regions in a multi-dimensional feature space. To explore the correlation between the concepts and overcome the assumption of feature independence in this model, we propose query expansion techniques in the image domain from a new perspective based on both local and global analysis. For the local analysis, the correlations between the concepts based on the co-occurrence pattern, and the metrical constraints based on the neighborhood proximity between the concepts in encoded images, are analyzed by considering local feedback information. We also analyze the concept similarities in the collection as a whole in the form of a similarity thesaurus and propose an efficient query expansion based on the global analysis. The experimental results on a photographic collection of natural scenes and a biomedical database of different imaging modalities demonstrate the effectiveness of the proposed framework in terms of precision and recall.},
note = {Managing and Mining Multilingual Documents},
keywords = {Image retrieval, Query expansion, Relevance feedback, Support vector machine, Vector space model},
pubstate = {published},
tppubtype = {article}
}
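To make the global-analysis step concrete: if each image (and the query) is encoded as a weighted "bag of concepts" vector, a similarity thesaurus can be built once over the whole collection and then used to pull the concepts most correlated with the query's concepts into an expanded query. The NumPy sketch below is a minimal illustration under assumed design choices (cosine similarity between concept occurrence columns, max-over-query-concepts scoring, top-k additive expansion with weight alpha); it is not the paper's exact formulation.

# Minimal sketch of thesaurus-based query expansion over "bag of concepts"
# vectors. Similarity measure and expansion rule are simplifying assumptions.
import numpy as np

def concept_similarity_thesaurus(X: np.ndarray) -> np.ndarray:
    """X: (n_images, n_concepts) concept-weight matrix for the collection.
    Returns (n_concepts, n_concepts) cosine similarities: each concept is
    characterized by the column of images in which it occurs."""
    norms = np.linalg.norm(X, axis=0, keepdims=True) + 1e-12
    Xn = X / norms
    return Xn.T @ Xn

def expand_query(q: np.ndarray, S: np.ndarray, k: int = 5, alpha: float = 0.5) -> np.ndarray:
    """Add the k concepts most similar to any concept already in the query q,
    weighted by alpha, then L2-normalize the expanded query vector."""
    present = q > 0
    if not present.any():
        return q
    scores = S[present].max(axis=0)   # best similarity to any query concept
    scores[present] = 0.0             # skip concepts the query already has
    top = np.argsort(scores)[-k:]
    q_exp = q.astype(float)
    q_exp[top] += alpha * scores[top]
    return q_exp / (np.linalg.norm(q_exp) + 1e-12)

Retrieval then proceeds by ranking the collection vectors against the expanded query, for example by cosine similarity, exactly as in the unexpanded vector space model.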