CuneiML: A Cuneiform Dataset for Machine Learning
by: Danlu Chen, Aditi Agarwal, Taylor Berg-Kirkpatrick, Jacobo Myerston
| Format | Article |
|---|---|
| Published | Ubiquity Press, 2023-12-01 |
Description
The cuneiform writing system holds a vast reservoir of ancient literature, encompassing over 3000 years of history. Originating around the mid-fourth millennium BCE and enduring until the late first millennium BCE, cuneiform writing spans various genres such as administrative, legal, medical, and scientific documents, among others. This article introduces a curated dataset, CuneiML, featuring 38,947 high-resolution 2D photos of Sumerian and Akkadian cuneiform tablets, accompanied by their cuneiform Unicode transcriptions, transliterations, lineart, and metadata. This dataset aims to support the development of machine learning tools for processing and analyzing Sumerian and Akkadian cuneiform artifacts – e.g. for automatically classifying genre, provenance, or period from unannotated tablet images. Thus, CuneiML is designed with consistency of format as a primary concern. Specifically, CuneiML is a result of meticulously preprocessing, segmenting, filtering, and re-transliterating data that is available online in the Cuneiform Digital Library Initiative (CDLI) collection.