
Analysis of Vision-Language Models for Underwater Mine Detection and Classification
Ⓒ 2026 Korea Society for Naval Science & Technology
Abstract
Underwater mines play a strategic role in destroying ships and vessels and enforcing maritime blockades, making mine detection research essential. Accordingly, a model is required that can adapt to rapidly changing marine environments, diverse mine types, and operational shifts in wartime. Vision-language models (VLMs) process images and text jointly, making them well suited to the frequently changing underwater environment. This paper analyzes the latest VLMs and explores methods for applying them to underwater mine detection and classification.
Keywords:
Underwater Mine Detection, Vision-Language Model, Multimodal Dataset, Image Classification, Open-World Object Detection
Acknowledgments
This research was supported by the Korea Research Institute for defense Technology planning and advancement (KRIT), funded by the government (Defense Acquisition Program Administration) in 2023 (No. KRIT-CT-23-035-03, AI-based Underwater Mine Detection Technology (Swarm Operation Technology for Mine-Detection Unmanned Underwater Vehicles)).