Monthly
288 pp. per issue, 6 x 9,
illustrated
Founded: 1989
ISSN 0899-7667
E-ISSN 1530-888X
2008 ISI Impact Factor: 2.378

Neural Computation

January 1993, Vol. 5, No. 1, Pages 140-153
Posted Online April 4, 2008.
(doi:10.1162/neco.1993.5.1.140)
© 1993 Massachusetts Institute of Technology
Statistical Theory of Learning Curves under Entropic Loss Criterion

Shun-ichi Amari

Department of Mathematical Engineering and Information Physics, University of Tokyo, Bunkyo-ku, Tokyo 113, Japan

Noboru Murata

Department of Mathematical Engineering and Information Physics, University of Tokyo, Bunkyo-ku, Tokyo 113, Japan


The present paper elucidates a universal property of learning curves: it shows how the generalization error, the training error, and the complexity of the underlying stochastic machine are related, and how the behavior of a stochastic machine improves as the number of training examples increases. The error is measured by the entropic loss. It is proved that the generalization error converges to H0, the entropy of the conditional distribution of the true machine, asymptotically as H0 + m*/(2t), while the training error converges to H0 as H0 - m*/(2t), where t is the number of examples and m* represents the complexity of the network. When the model is faithful, meaning that the true machine is in the model, m* reduces to m, the number of modifiable parameters. This is a universal law because it holds for any regular machine, irrespective of its structure, under the maximum likelihood estimator. Similar relations are obtained for the Bayes and Gibbs learning algorithms. These learning curves show the relation among the accuracy of learning, the complexity of a model, and the number of training examples.
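
The H0 ± m/(2t) law for a faithful model can be checked numerically on a toy case. The following is a minimal sketch, not the paper's own experiment. It assumes a hypothetical model chosen only for illustration: an m-dimensional Gaussian with identity covariance and unknown mean (so there are m modifiable parameters), fitted by maximum likelihood, with the entropic loss taken as the negative log-likelihood. The function name `simulate_learning_curve` and all parameter values are illustrative assumptions.

```python
import math
import random


def simulate_learning_curve(m, t, trials, seed=0):
    """Monte Carlo estimate of training and generalization error (entropic loss).

    Assumed toy model (not from the paper): y ~ N(theta, I_m) with true theta = 0,
    fitted by the maximum likelihood estimator (the sample mean) from t examples.
    Returns (h0, mean_training_error, mean_generalization_error).
    """
    rng = random.Random(seed)
    # Entropy of the true distribution N(0, I_m): H0 = (m/2) * log(2*pi*e).
    h0 = 0.5 * m * math.log(2 * math.pi * math.e)
    # Normalization term of -log p(y | theta) for a unit-variance Gaussian.
    const = 0.5 * m * math.log(2 * math.pi)

    train_sum = 0.0
    gen_sum = 0.0
    for _ in range(trials):
        samples = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(t)]
        # Maximum likelihood estimate of the mean: coordinate-wise sample mean.
        mle = [sum(col) / t for col in zip(*samples)]

        # Training error: average of -log p(y_i | mle) over the training set.
        mean_sq = sum(
            sum((y - mu) ** 2 for y, mu in zip(row, mle)) for row in samples
        ) / t
        train_sum += const + 0.5 * mean_sq

        # Generalization error: expectation of -log p(y | mle) over a fresh
        # y ~ N(0, I_m), in closed form: E||y - mle||^2 = m + ||mle||^2.
        gen_sum += const + 0.5 * (m + sum(mu ** 2 for mu in mle))

    return h0, train_sum / trials, gen_sum / trials


if __name__ == "__main__":
    m, t = 3, 50
    h0, train_err, gen_err = simulate_learning_curve(m, t, trials=2000)
    print("predicted offset m/(2t):", m / (2 * t))
    print("generalization error - H0:", gen_err - h0)
    print("training error - H0:", train_err - h0)
```

In this toy case the two offsets from H0 should come out close to +m/(2t) and -m/(2t), so the gap between generalization and training error is roughly m/t. For an unfaithful model the abstract's m* would take the place of m, which this sketch does not cover.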

