Indexed Open Access Databases
Binary Code Similarity Detection Method Based on Pre-training Assembly Instruction Representation
by: WANG Taiyan, PAN Zulie, YU Lu, SONG Jingbin
| Format: | Article |
|---|---|
| Published: | Editorial office of Computer Science, 2023-04-01 |
Description
Binary code similarity detection has been widely used in recent years in vulnerability search, malware detection, advanced program analysis, and other fields. Because program code resembles natural language to some degree, researchers have begun to apply pre-training and other natural language processing techniques to improve accuracy. A binary code similarity detection method based on pre-trained assembly instruction representation is proposed to address the accuracy bottleneck caused by insufficient consideration of instruction probability features. It includes a tokenization method for multi-architecture assembly instructions and pre-training tasks that consider control flow, data flow, instruction logic, and probability of occurrence, in order to achieve a better vectorized representation of instructions. The downstream binary code similarity detection task is then combined with the pre-training method to improve accuracy. Experiments show that, compared with existing methods, the proposed method improves instruction representation performance by up to 23.7%, and improves block search ability and similarity detection performance by up to 33.97% and 400%, respectively.
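This record does not describe the paper's tokenizer or model, so the following is only a minimal sketch of the general idea of tokenizing assembly instructions and comparing basic blocks. The function names, the `<imm>` placeholder, and the normalization rules are hypothetical, and a plain bag-of-tokens count vector stands in for the learned instruction embeddings that the proposed method would produce.

```python
import math
import re
from collections import Counter

# Hypothetical rule: any hex or decimal literal is treated as an immediate.
_IMMEDIATE = re.compile(r"^(0x[0-9a-f]+|\d+)$")


def tokenize_instruction(instruction):
    """Split one assembly instruction into opcode and operand tokens.

    Immediates are replaced by a shared "<imm>" token so that blocks
    differing only in constants map to the same tokens (an assumption,
    not the paper's documented scheme).
    """
    # Put spaces around punctuation so memory operands split into parts.
    spaced = re.sub(r"([\[\],+\-*#])", r" \1 ", instruction.lower())
    tokens = []
    for tok in spaced.split():
        if tok == ",":
            continue  # operand separators carry no information here
        tokens.append("<imm>" if _IMMEDIATE.match(tok) else tok)
    return tokens


def block_vector(instructions):
    """Bag-of-tokens count vector for a basic block (stand-in for embeddings)."""
    counts = Counter()
    for ins in instructions:
        counts.update(tokenize_instruction(ins))
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    block_a = ["mov eax, [ebp+0x8]", "add eax, 0x10"]
    block_b = ["mov eax, [ebp+0xc]", "add eax, 0x20"]
    print(tokenize_instruction(block_a[0]))
    print(cosine_similarity(block_vector(block_a), block_vector(block_b)))
```

In a pre-training setting, the token sequences would instead be fed to a language model, and block similarity would be computed over the resulting dense vectors rather than over raw token counts.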