ISSN 0899-7667
E-ISSN 1530-888X
January 1993, Vol. 5, No. 1, Pages 140-153
Posted Online April 4, 2008.
(doi:10.1162/neco.1993.5.1.140)
© 1993 Massachusetts Institute of Technology
Statistical Theory of Learning Curves under Entropic Loss Criterion

Shun-ichi Amari
Department of Mathematical Engineering and Information Physics, University of Tokyo, Bunkyo-ku, Tokyo 113, Japan

Noboru Murata
Department of Mathematical Engineering and Information Physics, University of Tokyo, Bunkyo-ku, Tokyo 113, Japan

The present paper elucidates a universal property of learning curves, which shows how the generalization error, training error, and the complexity of the underlying stochastic machine are related and how the behavior of a stochastic machine is improved as the number of training examples increases. The error is measured by the entropic loss. It is proved that the generalization error converges to H0, the entropy of the conditional distribution of the true machine, as H0 + m*/(2t), while the training error converges as H0 - m*/(2t), where t is the number of examples and m* shows the complexity of the network. When the model is faithful, implying that the true machine is in the model, m* is reduced to m, the number of modifiable parameters. This is a universal law because it holds for any regular machine irrespective of its structure under the maximum likelihood estimator. Similar relations are obtained for the Bayes and Gibbs learning algorithms. These learning curves show the relation among the accuracy of learning, the complexity of a model, and the number of training examples.
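
The learning-curve law stated in the abstract (generalization error H0 + m/(2t), training error H0 - m/(2t) for a faithful model under maximum likelihood) can be checked numerically on a toy case. The sketch below is an illustration only and is not taken from the paper: it assumes a hypothetical faithful model, an m-dimensional Gaussian with identity covariance and unknown mean, for which the entropic loss is the negative log-likelihood and H0 = (m/2) log(2πe). The function names `asymptotic_curves` and `simulate_gaussian_mean` are invented for this example.

```python
import math
import random


def asymptotic_curves(h0, m, t):
    """Leading-order curves from the abstract: (generalization, training)."""
    return h0 + m / (2 * t), h0 - m / (2 * t)


def simulate_gaussian_mean(m, t, trials, seed=0):
    """Monte Carlo estimate of both errors for a toy faithful model.

    Assumption (not from the paper): data are drawn from N(0, I_m) and the
    model is N(theta, I_m) with theta fitted by maximum likelihood, i.e.
    theta is the sample mean, so the model has m modifiable parameters.
    """
    rng = random.Random(seed)
    const = 0.5 * m * math.log(2 * math.pi)  # normalizer of the Gaussian NLL
    gen_sum = 0.0
    train_sum = 0.0
    for _ in range(trials):
        xs = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(t)]
        mean = [sum(x[j] for x in xs) / t for j in range(m)]
        # Training error: average negative log-likelihood on the t examples.
        sq = sum((x[j] - mean[j]) ** 2 for x in xs for j in range(m)) / t
        train_sum += const + 0.5 * sq
        # Generalization error: expected NLL on a fresh x ~ N(0, I_m),
        # using the closed form E||x - mean||^2 = m + ||mean||^2.
        gen_sum += const + 0.5 * (m + sum(c * c for c in mean))
    return gen_sum / trials, train_sum / trials


if __name__ == "__main__":
    m, t = 3, 20
    h0 = 0.5 * m * math.log(2 * math.pi * math.e)  # entropy of N(0, I_m)
    print("simulated :", simulate_gaussian_mean(m, t, trials=2000))
    print("asymptotic:", asymptotic_curves(h0, m, t))
```

For this particular toy model the two relations hold exactly in expectation, not only asymptotically, so the simulated averages should agree with the formula up to Monte Carlo noise. The gap between the two curves is m/t, which is the quantity that ties model complexity to the number of training examples.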