A Generalized Learning Approach to Deep Neural Networks
DOI: https://doi.org/10.26636/jtit.2024.3.1454
Keywords: deep neural networks, machine learning, optimization
Abstract
Optimization of machine learning architectures is essential in determining the efficacy and applicability of any neural architecture to real-world problems. In this work, a generalized Newton's method (GNM) is presented as a powerful approach to learning in deep neural networks (DNN). The technique was compared with two popular first-order approaches, namely stochastic gradient descent (SGD) and the Adam algorithm, on two well-known classification tasks. The performance of the proposed approach confirms it as an attractive alternative to state-of-the-art first-order solutions.
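For context, the standard contrast between the two families of methods can be written as follows (a textbook formulation; the specific GNM update derived in the paper is not reproduced here). A first-order method such as SGD moves along the negative gradient of the error E(w), while a Newton-type method rescales that step by the inverse Hessian, i.e. by local curvature information:

    w_{k+1} = w_k - \eta \, \nabla E(w_k)                      (SGD)
    w_{k+1} = w_k - \eta \, H^{-1}(w_k) \, \nabla E(w_k)       (Newton-type)

where \eta is the learning rate and H(w_k) = \nabla^2 E(w_k). The practical difficulty is that forming and inverting H is prohibitive for networks with many parameters, which is why generalized and quasi-Newton schemes replace H^{-1} with a cheaper, structured approximation.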
Motivated by the good results obtained with shallow DNNs, the last part of the article presents a hybrid optimization method. It combines two optimization algorithms, i.e. GNM and Adam or GNM and SGD, during the training phase, assigning them to different layers of the neural network. This configuration aims to benefit from the strengths of both first- and second-order algorithms. In this case, a convolutional neural network is considered, with the parameters of each layer updated by a different optimization algorithm. Here too, the hybrid approach yields the best performance when compared with the purely first-order algorithms.
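To make the hybrid idea concrete, the sketch below routes the gradients of different layers to different optimizers in TensorFlow. This is a minimal illustration under assumptions: the model, layer split, and learning rates are invented for the example, and plain SGD stands in for the GNM update, whose implementation is given only in the paper.

    import tensorflow as tf

    # Minimal sketch of per-layer hybrid optimization (hypothetical model;
    # SGD stands in here for the paper's GNM update).
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10),
    ])

    opt_conv = tf.keras.optimizers.Adam(1e-3)  # first-order update for the conv block
    opt_head = tf.keras.optimizers.SGD(1e-2)   # placeholder for the second-order (GNM) update
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

    conv_vars = model.layers[0].trainable_variables
    head_vars = model.layers[3].trainable_variables

    @tf.function
    def train_step(x, y):
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(x, training=True))
        grads = tape.gradient(loss, conv_vars + head_vars)
        # Split the gradient list and let each optimizer update its own layers.
        opt_conv.apply_gradients(zip(grads[:len(conv_vars)], conv_vars))
        opt_head.apply_gradients(zip(grads[len(conv_vars):], head_vars))
        return loss

Each optimizer keeps its own state (e.g. Adam's moment estimates) for the variables it owns, so the two update rules coexist within a single training loop.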
License
Copyright (c) 2024 Francesca Ponti, Raffaele Parisi, Fabrizio Frezza, Patrizio Simeoni

This work is licensed under a Creative Commons Attribution 4.0 International License.