Coresets for the Average Case Error for Finite Query Sets

oleh: Alaa Maalouf, Ibrahim Jubran, Murad Tukan, Dan Feldman

Format:	Article
Diterbitkan:	MDPI AG 2021-10-01

Deskripsi

Coreset is usually a small weighted subset of an input set of items, that provably approximates their loss function for a given set of queries (models, classifiers, hypothesis). That is, the maximum (worst-case) error over all queries is bounded. To obtain smaller coresets, we suggest a natural relaxation: coresets whose average error over the given set of queries is bounded. We provide both deterministic and randomized (generic) algorithms for computing such a coreset for any finite set of queries. Unlike most corresponding coresets for the worst-case error, the size of the coreset in this work is independent of both the input size and its Vapnik–Chervonenkis (VC) dimension. The main technique is to reduce the average-case coreset into the vector summarization problem, where the goal is to compute a weighted subset of the <i>n</i> input vectors which approximates their sum. We then suggest the first algorithm for computing this weighted subset in time that is linear in the input size, for <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>n</mi><mo>≫</mo><mn>1</mn><mo>/</mo><mi>ε</mi></mrow></semantics></math></inline-formula>, where <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mi>ε</mi></semantics></math></inline-formula> is the approximation error, improving, e.g., both [ICML’17] and applications for principal component analysis (PCA) [NIPS’16]. Experimental results show significant and consistent improvement also in practice. Open source code is provided.

Find in Library

Indexed Open Access Databases

Coresets for the Average Case Error for Finite Query Sets

Deskripsi