A Personalized and Scalable Machine Learning-Based File Management System
by: Bansal Veena, Sati Dhiraj
Format: Article
Published: University North 2022-01-01
Description
In this work, we present a hybrid image and document filing system that we have built. When a user stores a file in the system, the file is processed to generate tags using an appropriate open-source machine learning system. Presently, we use OpenCV and Tesseract OCR for tagging files: OpenCV recognizes objects in images, and Tesseract recognizes text in them. An image file is processed with OpenCV for object recognition and with Tesseract for text and captions, and the results of both are used to tag the file. All other files are processed with Tesseract only. Users can also enter their own tags. A database system has been built that stores the tags and the image path. Every file is stored with its owner's identification and a timestamp. The system has a client-server architecture, can store and retrieve a large number of files, and is highly scalable.
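The tagging and storage flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the table layout, the function names (`open_store`, `store_file`, `find_files`), the extension list, and the use of SQLite are all assumptions made for illustration. The two taggers are passed in as plain callables so that an OpenCV-based object recognizer and a Tesseract-based text recognizer could be plugged in; here they are left hypothetical.

```python
import os
import sqlite3
import time

# Assumed set of extensions treated as images; the abstract does not list them.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".gif", ".tiff"}


def open_store(db_path=":memory:"):
    """Create a hypothetical tag database: one row per file, one row per tag."""
    conn = sqlite3.connect(db_path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS files (
            id        INTEGER PRIMARY KEY,
            path      TEXT NOT NULL,
            owner     TEXT NOT NULL,
            stored_at REAL NOT NULL
        );
        CREATE TABLE IF NOT EXISTS tags (
            file_id INTEGER NOT NULL REFERENCES files(id),
            tag     TEXT NOT NULL,
            UNIQUE (file_id, tag)
        );
        CREATE INDEX IF NOT EXISTS idx_tags_tag ON tags(tag);
        """
    )
    return conn


def store_file(conn, path, owner, text_tagger, object_tagger, user_tags=()):
    """Tag a file and record it with its owner and a timestamp.

    text_tagger and object_tagger are hypothetical callables (path -> iterable
    of strings) standing in for the Tesseract and OpenCV steps.
    """
    tags = set(text_tagger(path))  # every file goes through text recognition
    if os.path.splitext(path)[1].lower() in IMAGE_EXTENSIONS:
        tags |= set(object_tagger(path))  # images also get object tags
    tags |= set(user_tags)  # tags entered by the user
    tags = {t.strip().lower() for t in tags if t.strip()}  # normalize

    cur = conn.execute(
        "INSERT INTO files (path, owner, stored_at) VALUES (?, ?, ?)",
        (path, owner, time.time()),
    )
    file_id = cur.lastrowid
    conn.executemany(
        "INSERT OR IGNORE INTO tags (file_id, tag) VALUES (?, ?)",
        [(file_id, t) for t in sorted(tags)],
    )
    conn.commit()
    return file_id


def find_files(conn, owner, tag):
    """Return the paths of one owner's files that carry the given tag."""
    rows = conn.execute(
        "SELECT f.path FROM files f JOIN tags t ON t.file_id = f.id "
        "WHERE f.owner = ? AND t.tag = ? ORDER BY f.stored_at, f.id",
        (owner, tag.strip().lower()),
    )
    return [row[0] for row in rows]
```

In this sketch, retrieval is a single indexed lookup on the tag column, scoped to the owner. A real deployment would place these calls behind the server side of the client-server architecture.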