TY - JOUR
T1 - A model is worth tens of thousands of examples for estimation and thousands for classification
AU - Dagès, Thomas
AU - Cohen, Laurent D.
AU - Bruckstein, Alfred M.
N1 - Publisher Copyright:
© 2024 Elsevier Ltd
PY - 2025/1
Y1 - 2025/1
N2 - Traditional signal processing methods relying on mathematical data generation models have been cast aside in favour of deep neural networks, which require vast amounts of data. Since the theoretical sample complexity is nearly impossible to evaluate, these amounts of examples are usually estimated with crude rules of thumb. However, these rules only suggest when the networks should work, but do not relate to the traditional methods. In particular, an interesting question is: how much data is required for neural networks to be on par with, or outperform if possible, the traditional model-based methods? In this work, we empirically investigate this question in three simple examples covering estimation and classification, where the data is generated according to precisely defined mathematical models, and where well-understood optimal or state-of-the-art mathematical data-agnostic solutions are known. The first problem is deconvolving one-dimensional Gaussian signals, the second is estimating a circle's radius and location in random grayscale images of disks, and the third both classifies the presence of a line in a binary random dot image and locates it when present. By training various networks, either naive custom-designed ones or well-established architectures, with various amounts of training data, we find that networks require tens of thousands of examples to match the traditional methods for estimation and thousands for classification, whether the networks are trained from scratch or with transfer learning or fine-tuning.
AB - Traditional signal processing methods relying on mathematical data generation models have been cast aside in favour of deep neural networks, which require vast amounts of data. Since the theoretical sample complexity is nearly impossible to evaluate, these amounts of examples are usually estimated with crude rules of thumb. However, these rules only suggest when the networks should work, but do not relate to the traditional methods. In particular, an interesting question is: how much data is required for neural networks to be on par with, or outperform if possible, the traditional model-based methods? In this work, we empirically investigate this question in three simple examples covering estimation and classification, where the data is generated according to precisely defined mathematical models, and where well-understood optimal or state-of-the-art mathematical data-agnostic solutions are known. The first problem is deconvolving one-dimensional Gaussian signals, the second is estimating a circle's radius and location in random grayscale images of disks, and the third both classifies the presence of a line in a binary random dot image and locates it when present. By training various networks, either naive custom-designed ones or well-established architectures, with various amounts of training data, we find that networks require tens of thousands of examples to match the traditional methods for estimation and thousands for classification, whether the networks are trained from scratch or with transfer learning or fine-tuning.
KW - Deep learning
KW - Model-based methods
KW - Sample complexity
UR - http://www.scopus.com/inward/record.url?scp=85202351715&partnerID=8YFLogxK
U2 - 10.1016/j.patcog.2024.110904
DO - 10.1016/j.patcog.2024.110904
M3 - Article
AN - SCOPUS:85202351715
SN - 0031-3203
VL - 157
JO - Pattern Recognition
JF - Pattern Recognition
M1 - 110904
ER -