Proximal Policy Optimization for Radiation Source Search
By: Philippe Proctor, Christof Teuscher, Adam Hecht, Marek Osiński
| Format: | Article |
|---|---|
| Published: | MDPI AG, 2021-09-01 |
Description
Rapid search for and localization of nuclear sources can be important for preventing human harm from illicit material in dirty bombs or from contamination. For a single mobile radiation detector, there are numerous challenges to overcome, such as weak source intensity, multiple sources, background radiation, and obstructions, i.e., a non-convex environment. In this work, we investigate the sequential decision-making capability of deep reinforcement learning in the nuclear source search context. We propose a novel neural network architecture (RAD-A2C) based on the advantage actor critic (A2C) framework and a particle filter gated recurrent unit for localization. Performance is studied in a randomized 20 × 20 m convex and non-convex simulation environment across a range of signal-to-noise ratios (SNRs) for a single detector and a single source. RAD-A2C performance is compared both to an information-driven controller that uses a bootstrap particle filter and to a gradient search (GS) algorithm. We find that RAD-A2C has performance comparable to the information-driven controller across SNRs in a convex environment. In the non-convex environment, RAD-A2C far outperforms the GS algorithm, with a median completion rate greater than 95% for up to seven obstructions.
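The baseline controller described above relies on a bootstrap particle filter to estimate where the source is. As a rough illustration of that estimation step only (not the authors' controller, which also decides where the detector moves next), the sketch below localizes a single source in an obstruction-free 20 × 20 m area from Poisson counts taken at given detector positions. The measurement model (inverse-square law with a +1 m² offset, known source intensity and background), the jitter size, the particle count, and the names `expected_counts`, `systematic_resample`, and `bootstrap_pf_localize` are all hypothetical choices made for this example.

```python
import numpy as np

AREA = 20.0  # side of the square search area in metres (20 x 20 m, as in the abstract)


def expected_counts(src_xy, det_xy, intensity, background):
    """Hypothetical mean count for a 1 s dwell: inverse-square law plus background.

    The +1 m^2 offset is an assumption that keeps the rate finite at zero range.
    Obstructions (the non-convex case) are not modelled.
    """
    d2 = np.sum((np.asarray(src_xy) - np.asarray(det_xy)) ** 2, axis=-1)
    return background + intensity / (d2 + 1.0)


def systematic_resample(weights, rng):
    """Return particle indices drawn by systematic resampling."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    cumsum = np.cumsum(weights)
    cumsum[-1] = 1.0  # guard against floating-point round-off
    return np.searchsorted(cumsum, positions, side="right")


def bootstrap_pf_localize(counts, det_positions, intensity, background,
                          n_particles=5000, jitter=0.25, seed=0):
    """Estimate a static source position (x, y) from Poisson counts.

    Source intensity and background are assumed known; only position is estimated.
    """
    rng = np.random.default_rng(seed)
    particles = rng.uniform(0.0, AREA, size=(n_particles, 2))  # uniform prior
    log_w = np.zeros(n_particles)
    for k, det in zip(counts, det_positions):
        lam = expected_counts(particles, det, intensity, background)
        log_w += k * np.log(lam) - lam  # Poisson log-likelihood, constant dropped
        log_w -= log_w.max()            # rescale for numerical stability
        w = np.exp(log_w)
        w /= w.sum()
        if 1.0 / np.sum(w ** 2) < n_particles / 2:  # effective sample size check
            idx = systematic_resample(w, rng)
            # Resample, then add small Gaussian jitter so particles stay diverse.
            particles = particles[idx] + rng.normal(0.0, jitter, size=(n_particles, 2))
            particles = np.clip(particles, 0.0, AREA)
            log_w = np.zeros(n_particles)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return w @ particles  # posterior-mean estimate of (x, y)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    true_src = np.array([14.0, 6.0])
    dets = rng.uniform(0.0, AREA, size=(80, 2))  # random survey points, not a search policy
    counts = rng.poisson(expected_counts(true_src, dets, 2000.0, 5.0))
    est = bootstrap_pf_localize(counts, dets, 2000.0, 5.0)
    print("estimate:", est, "error [m]:", np.linalg.norm(est - true_src))
```

Here the detector positions are drawn at random, so the example shows only the filtering half of the problem. A search controller would instead choose each next position, for example to reduce posterior uncertainty, and a learned policy such as RAD-A2C replaces that hand-designed choice; neither is attempted in this sketch.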