Optimal Evasion Decision and Strategies of Submarine Using PPO-based Reinforcement Learning under Multiple UUVs Threat Environment

Eonyak Kang¹ ; Wooyoung Hong² ; Gwiyoung Lee³ ; Jongmoo Lee⁴ ; Hyukjae Baek⁵ ; Junho Bae⁶ ; Youngmin Choo⁷,*

¹ M.S. student, Dept. of Ocean Systems Engineering, Sejong University
² Professor, Dept. of Defense Systems Engineering, Sejong University
³ LT, ROK Navy/M.S. student, Dept. of Ocean Systems Engineering, Sejong University
⁴ Research engineer, Maritime R&D Center, LIG Nex1
⁵ Chief research engineer, Maritime R&D Center, LIG Nex1
⁶ LCDR, ROK Navy/Ph.D. student, Dept. of Naval Architecture and Ocean Engineering, Seoul National University
⁷ Associate Professor, Dept. of Naval Architecture and Ocean Engineering, Seoul National University

* Correspondence to: Youngmin Choo, Tel: +82-2-880-8380, E-mail: sonacer@snu.ac.kr

Abstract

This study proposes an evasion strategy based on reinforcement learning with proximal policy optimization (PPO) that integrates an acoustic detection model into the reward function, with the aim of improving submarine survivability against multiple hostile unmanned underwater vehicles (UUVs). The simulation environment comprises motion models of the UUVs and the submarine together with an acoustic detection model. The reward function is designed so that the submarine actively evades both detection and attack by the hostile UUVs. In simulation, the proposed strategy yields substantially higher submarine survivability than conventional fixed-pattern evasion strategies. The agent achieves this by learning evasion maneuvers that minimize the signal detectable under the acoustic detection model.
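As an illustration of how an acoustic detection model might enter a reward function, the following minimal sketch uses the standard passive sonar equation with spherical-spreading transmission loss. It is not the authors' implementation: the speed-dependent source level, the logistic detection curve, every numeric constant, and all function names (`signal_excess_db`, `step_reward`, and so on) are assumptions chosen only for illustration.

```python
import math


def transmission_loss_db(range_m: float) -> float:
    """Spherical spreading only: TL = 20*log10(r), r in metres (clamped to >= 1 m)."""
    return 20.0 * math.log10(max(range_m, 1.0))


def source_level_db(speed_kn: float, base_db: float = 110.0,
                    slope_db_per_kn: float = 1.5) -> float:
    """Hypothetical radiated-noise model: the submarine is louder at higher speed."""
    return base_db + slope_db_per_kn * speed_kn


def signal_excess_db(speed_kn: float, range_m: float, noise_db: float = 60.0,
                     di_db: float = 15.0, dt_db: float = 10.0) -> float:
    """Passive sonar equation: SE = SL - TL - (NL - DI) - DT (all values assumed)."""
    return (source_level_db(speed_kn) - transmission_loss_db(range_m)
            - (noise_db - di_db) - dt_db)


def detection_probability(se_db: float, sigma_db: float = 5.0) -> float:
    """Assumed logistic map from signal excess to [0, 1]; equals 0.5 at SE = 0 dB."""
    return 1.0 / (1.0 + math.exp(-se_db / sigma_db))


def step_reward(speed_kn: float, uuv_ranges_m: list[float], survived: bool = True,
                w_detect: float = 1.0, r_alive: float = 0.1,
                r_hit: float = -10.0) -> float:
    """Per-step reward: small survival bonus minus a penalty for the
    worst-case detection probability over all UUVs; large penalty if hit."""
    if not survived:
        return r_hit
    p_max = max(
        (detection_probability(signal_excess_db(speed_kn, r)) for r in uuv_ranges_m),
        default=0.0,  # no UUVs in the scene means no detection penalty
    )
    return r_alive - w_detect * p_max


if __name__ == "__main__":
    # Quieter (slower) and more distant states receive a higher reward.
    print(step_reward(5.0, [2000.0, 4000.0]))
    print(step_reward(15.0, [2000.0, 4000.0]))
```

Under these assumptions, the reward falls as the submarine speeds up or as any UUV closes range, so a policy-gradient learner such as PPO would be pushed toward maneuvers that keep the detectable signal low. Penalizing the maximum rather than the sum over UUVs is one possible design choice; it focuses the agent on the most dangerous threat at each step.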

Keywords: Unmanned Underwater Vehicle, Submarine, Reinforcement Learning, Evasion Strategy, Acoustic Detection Model

Acknowledgments

This research was supported by LIG Nex1.
