KARAJ: An Efficient Adaptive Multi-Processor Tool to Streamline Genomic and Transcriptomic Sequence Data Acquisition

oleh: Mahdieh Labani, Amin Beheshti, Nigel H. Lovell, Hamid Alinejad-Rokny, Ali Afrasiabi

Format: Article
Diterbitkan: MDPI AG 2022-11-01

Deskripsi

Here we developed <i>KARAJ</i>, a fast and flexible Linux command-line tool to automate the end-to-end process of querying and downloading a wide range of genomic and transcriptomic sequence data types. The input to KARAJ is a list of PMCIDs or publication URLs or various types of accession numbers to automate four tasks as follows; firstly, it provides a summary list of accessible datasets generated by or used in these scientific articles, enabling users to select appropriate datasets; secondly, <i>KARAJ</i> calculates the size of files that users want to download and confirms the availability of adequate space on the local disk; thirdly, it generates a metadata table containing sample information and the experimental design of the corresponding study; and lastly, it enables users to download supplementary data tables attached to publications. Further, <i>KARAJ</i> provides a parallel downloading framework powered by <i>Aspera connect</i> which reduces the downloading time significantly.