Activate Activate Activate
contact  
Hello. Sign in to personalize your visit. New user? Register now.  

In
By author

Monthly
288 pp. per issue, 6 x 9,
illustrated
Founded: 1989
ISSN 0899-7667
E-ISSN 1530-888X
2008 ISI Impact Factor: 2.378

Neural Computation

August 2003, Vol. 15, No. 8, Pages 1897-1929
Posted Online March 13, 2006.
(doi:10.1162/08997660360675080)
© 2003 Massachusetts Institute of Technology
Recurrent Neural Networks with Small Weights Implement Definite Memory Machines

Barbara Hammer

Department of Mathematics Computer Science, University of Osnabrück, D-49069, Osnabrück, Germany,

Peter Tiňo

School of Computer Science, University of Birmingham, Edgbaston, Birmingham B15 2TT, U.K.,

PDF (366.107 KB) PDF Plus (404.067 KB)

Recent experimental studies indicate that recurrent neural networks initialized with “small” weights are inherently biased toward definite memory machines (Tiňno, Čerňanský, & Beňušková, 2002a, 2002b). This article establishes a theoretical counterpart: transition function of recurrent network with small weights and squashing activation function is a contraction. We prove that recurrent networks with contractive transition function can be approximated arbitrarily well on input sequences of unbounded length by a definite memory machine. Conversely, every definite memory machine can be simulated by a recurrent network with contractive transition function. Hence, initialization with small weights induces an architectural bias into learning with recurrent neural networks. This bias might have benefits from the point of view of statistical learning theory: it emphasizes one possible region of the weight space where generalization ability can be formally proved. It is well known that standard recurrent neural networks are not distribution independent learnable in the probably approximately correct (PAC) sense if arbitrary precision and inputs are considered. We prove that recurrent networks with contractive transition function with a fixed contraction parameter fulfill the so-called distribution independent uniform convergence of empirical distances property and hence, unlike general recurrent networks, are distribution independent PAC learnable.

Cited by

André Grüning. (2007) Elman Backpropagation as Reinforcement for Simple Recurrent Networks. Neural Computation 19:11, 3108-3131
Online publication date: 1-Nov-2007.
Abstract | PDF (677 KB) | PDF Plus (688 KB) 
Peter Tiňo, Igor Farkaš, Jort van Mourik. (2006) Dynamics and Topographic Organization of Recursive Self-Organizing Maps. Neural Computation 18:10, 2529-2567
Online publication date: 1-Oct-2006.
Abstract | PDF (834 KB) | PDF Plus (567 KB) 
Henrik Jacobsson. (2006) The Crystallizing Substochastic Sequential Machine Extractor: CrySSMEx. Neural Computation 18:9, 2211-2255
Online publication date: 1-Sep-2006.
Abstract | PDF (329 KB) | PDF Plus (341 KB) 
Henrik Jacobsson. (2005) Rule Extraction from Recurrent Neural Networks: ATaxonomy and Review. Neural Computation 17:6, 1223-1263
Online publication date: 1-Jun-2005.
Abstract | PDF (349 KB) | PDF Plus (271 KB) 

Technology Partner - Atypon Systems, Inc.
  CrossRef member COUNTER member