Find in Library
Search millions of books, articles, and more
Indexed Open Access Databases
Coresets for the Average Case Error for Finite Query Sets
oleh: Alaa Maalouf, Ibrahim Jubran, Murad Tukan, Dan Feldman
Format: | Article |
---|---|
Diterbitkan: | MDPI AG 2021-10-01 |
Deskripsi
Coreset is usually a small weighted subset of an input set of items, that provably approximates their loss function for a given set of queries (models, classifiers, hypothesis). That is, the maximum (worst-case) error over all queries is bounded. To obtain smaller coresets, we suggest a natural relaxation: coresets whose average error over the given set of queries is bounded. We provide both deterministic and randomized (generic) algorithms for computing such a coreset for any finite set of queries. Unlike most corresponding coresets for the worst-case error, the size of the coreset in this work is independent of both the input size and its Vapnik–Chervonenkis (VC) dimension. The main technique is to reduce the average-case coreset into the vector summarization problem, where the goal is to compute a weighted subset of the <i>n</i> input vectors which approximates their sum. We then suggest the first algorithm for computing this weighted subset in time that is linear in the input size, for <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>n</mi><mo>≫</mo><mn>1</mn><mo>/</mo><mi>ε</mi></mrow></semantics></math></inline-formula>, where <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mi>ε</mi></semantics></math></inline-formula> is the approximation error, improving, e.g., both [ICML’17] and applications for principal component analysis (PCA) [NIPS’16]. Experimental results show significant and consistent improvement also in practice. Open source code is provided.