K-VQA: A visual question answering method

by: Hongbin GAO, Jinying MAO, Huiyong WANG

Format: Article
Published: Hebei University of Science and Technology 2020-08-01

Description

Questions in visual question answering (VQA) over images and text fall roughly into two types. The first type can be answered directly from the image; the second requires external knowledge. Current VQA methods achieve high accuracy only on the first type, and techniques for answering the second type are not yet mature. To expand the range of questions that can be answered, K-VQA, a visual question answering method assisted by a knowledge graph, was designed. Building on deep learning VQA, it distinguishes question types by querying the knowledge graph, so that each type of question can be answered with the most appropriate method. For questions that require external knowledge, the image and the information in the question are used to determine the entities and attributes needed, and the matching triples are extracted from the knowledge graph to obtain the answer. The results show that different VQA techniques suit different types of questions. K-VQA can answer both simple questions and reasoning questions, with an accuracy of 56.67%. Therefore, as a knowledge-graph-assisted VQA method, K-VQA can answer more types of questions and obtain higher accuracy, which provides an important reference for further study of VQA methods.
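
The routing idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that entities have already been detected in the image and that a relation has already been parsed from the question, and the names `TOY_KG`, `lookup`, and `answer` are hypothetical. The fallback `visual_vqa` callable stands in for any deep learning VQA model.

```python
from typing import Callable, List, Optional, Tuple

# A triple is (subject, relation, object). The contents below are
# illustrative only and are not taken from the article.
Triple = Tuple[str, str, str]

TOY_KG: List[Triple] = [
    ("cat", "is_a", "mammal"),
    ("banana", "rich_in", "potassium"),
]


def lookup(kg: List[Triple], entity: str, relation: str) -> Optional[str]:
    """Return the object of the first triple matching (entity, relation), if any."""
    for subject, rel, obj in kg:
        if subject == entity and rel == relation:
            return obj
    return None


def answer(
    question_relation: Optional[str],
    image_entities: List[str],
    kg: List[Triple],
    visual_vqa: Callable[[], str],
) -> str:
    """Route a question (assumed logic, not the published method).

    If the knowledge graph holds a triple linking a detected image entity
    to the relation asked about, answer from that triple. Otherwise treat
    the question as answerable from the image alone and defer to the
    visual VQA model.
    """
    if question_relation is not None:
        for entity in image_entities:
            obj = lookup(kg, entity, question_relation)
            if obj is not None:
                return obj
    return visual_vqa()
```

For example, `answer("rich_in", ["banana"], TOY_KG, lambda: "yellow")` is answered from the toy graph, while a question with no parsed relation falls through to the visual model.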