Summary (1137107), page 4

File No. 1137107, Summary (Probabilistic Method for Adaptive Computation Time in Neural Networks), page 4. 2019-05-20. StudIzba

Probabilistic Method for Adaptive Computation Time in Neural Networks


Text from the file (page 4)

The iterations can be, for example, layers of a neural network. It is assumed that the outputs of the iterations of a block have the same dimensions. Depending on the specific type of the latent variables, the block can be discrete, thresholded or relaxed. Blocks of different types are compatible, meaning that the parameters of a model trained with one block type can be evaluated with another. After each iteration the halting probability, a number in the [0, 1] range, is computed and used as the parameter of a latent variable. In the discrete block a Bernoulli random variable is generated after each iteration. This variable is called a halting indicator. If it is equal to one, the computation is stopped.
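The discrete block described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `run_discrete_block`, `iterations` and `halting_prob` are hypothetical and do not come from the thesis, and the forced stop at the last iteration is one plausible way to respect a maximum number of iterations.

```python
import random


def run_discrete_block(x, iterations, halting_prob, rng):
    """Apply iterations in order; stop when a sampled halting indicator is 1.

    iterations   -- list of functions, each mapping a state to a same-shaped state
    halting_prob -- hypothetical function mapping a state to a number in [0, 1]
    rng          -- random.Random instance used to draw the Bernoulli indicators
    Returns the final state and the number of iterations executed.
    """
    for n, step in enumerate(iterations, start=1):
        x = step(x)
        p = halting_prob(x)
        halt = 1 if rng.random() < p else 0  # Bernoulli(p) halting indicator
        # Assumption: computation is forced to stop at the last iteration.
        if halt == 1 or n == len(iterations):
            return x, n
    return x, 0  # only reached when `iterations` is empty


# Usage: five identical toy iterations and a constant halting probability.
steps = [lambda v: v + 1.0] * 5
state, used = run_discrete_block(0.0, steps, lambda v: 0.3, random.Random(0))
```

With a halting probability of 1 the block always stops after the first iteration, and with a halting probability of 0 it always runs all of them.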

In the thresholded block the computation is terminated when the halting probability exceeds 0.5. The relaxed block is obtained from the discrete one by replacing the Bernoulli distribution with the relaxed Gumbel-Softmax distribution (fig. 5). In this case, the halting indicator takes values from the [0, 1] range. The output of the relaxed block is a weighted average of the iteration outputs, where the weights are obtained by a stick-breaking process over the halting indicators. The model with the relaxed block can be trained using stochastic gradient descent by applying the reparameterization trick.

Assume that a neural network contains several adaptive computation blocks with a latent variable encoding the number of iterations per block.
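The relaxed block described above (relaxed halting indicators, stick-breaking weights, weighted average of iteration outputs) might look roughly as follows. This is a sketch, not the thesis implementation: the function names are hypothetical, the relaxed indicator uses the standard binary form of the Gumbel-Softmax (Concrete) relaxation with logistic noise, and the convention that the last iteration receives the leftover stick mass is an assumption made so that the weights sum to one.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def relaxed_bernoulli(p, temperature, rng):
    """Draw a relaxed halting indicator in the [0, 1] range.

    The noise does not depend on p, which is what makes the
    reparameterization trick applicable.
    """
    eps = 1e-6
    p = min(max(p, eps), 1.0 - eps)
    u = min(max(rng.random(), eps), 1.0 - eps)
    logistic_noise = math.log(u) - math.log(1.0 - u)
    logit = math.log(p) - math.log(1.0 - p)
    return sigmoid((logit + logistic_noise) / temperature)


def stick_breaking_weights(indicators):
    """Weight of iteration n is h_n * prod_{k<n} (1 - h_k)."""
    weights, remaining = [], 1.0
    for h in indicators[:-1]:
        weights.append(h * remaining)
        remaining *= 1.0 - h
    weights.append(remaining)  # assumption: last iteration takes the remainder
    return weights


def relaxed_block_output(outputs, indicators):
    """Weighted average of per-iteration output vectors of equal length."""
    weights = stick_breaking_weights(indicators)
    dim = len(outputs[0])
    return [sum(w * out[i] for w, out in zip(weights, outputs)) for i in range(dim)]


# Usage: three iterations with two-dimensional outputs.
rng = random.Random(0)
h = [relaxed_bernoulli(p, 0.5, rng) for p in (0.1, 0.6, 0.9)]
y = relaxed_block_output([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]], h)
```

When the indicators are exactly 0 or 1, the weights become one-hot and the output reduces to that of the iteration at which the discrete block would have stopped, which is consistent with the compatibility of block types mentioned in the text.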

For each latent variable the prior distribution is chosen to be a truncated geometric distribution (the truncation is performed at the maximum number of iterations). Then, stochastic MAP inference is performed for the number of iterations, where the auxiliary distribution is defined by the halting indicators. The final objective has two terms: the log-likelihood of the correct answer averaged over the auxiliary distribution and a linear penalty for the expected number of iterations. This objective is analogous to the one obtained in the ACT model, but with the heuristic ponder cost replaced by the expected number of iterations.

At the end of the section several examples of applying the proposed method to neural network architectures are described.
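One way to see where the linear penalty in the objective above may come from: the log-probability of a geometric prior is linear in the number of iterations, so its expectation under the auxiliary distribution is linear in the expected number of iterations. The sketch below assumes a renormalized truncated geometric prior and hypothetical names (`truncated_geometric_log_pmf`, `objective`, `penalty`); the exact truncation and parameterization used in the thesis are not given in this summary.

```python
import math


def truncated_geometric_log_pmf(n, p, max_iters):
    """log P(n) for n = 1..max_iters with P(n) proportional to (1 - p)**(n - 1) * p."""
    # Total mass of the untruncated geometric distribution on 1..max_iters.
    log_norm = math.log(1.0 - (1.0 - p) ** max_iters)
    return (n - 1) * math.log(1.0 - p) + math.log(p) - log_norm


def objective(log_liks, q, penalty):
    """Expected log-likelihood under q minus a linear penalty on expected iterations.

    log_liks[n-1] -- log-likelihood of the correct answer when stopping after n iterations
    q[n-1]        -- auxiliary probability of stopping after n iterations
    penalty       -- non-negative coefficient; for the prior above it would be
                     -log(1 - p), up to an additive constant in the objective
    """
    expected_log_lik = sum(qn * ll for qn, ll in zip(q, log_liks))
    expected_iters = sum(n * qn for n, qn in enumerate(q, start=1))
    return expected_log_lik - penalty * expected_iters


# Usage: each extra iteration lowers the log-prior by the same amount, -log(1 - p).
p, max_iters = 0.3, 6
step = truncated_geometric_log_pmf(2, p, max_iters) - truncated_geometric_log_pmf(1, p, max_iters)
value = objective([-1.0, -0.5], [0.25, 0.75], penalty=-math.log(1.0 - p))
```

The objective is to be maximized; a larger `penalty` (a larger prior parameter `p`) favours stopping earlier.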

For residual networks a spatially adaptive version is introduced. Each spatial position of a residual network's block is assigned an adaptive computation block. The adaptive computation iterations correspond to residual units. The obtained method is a probabilistic counterpart of SACT. For recurrent networks, the construction is similar to the ACT method [24]: an adaptive computation block is used on every timestep to choose the number of network updates.

The experimental validation of the methods is performed on ResNet-32 and ResNet-110 models for the CIFAR-10 classification problem. First, it is demonstrated that the parameters of the relaxed model (the model using relaxed adaptive computation blocks) are compatible with discrete and thresholded models. To this end, during training of the relaxed model its parameters are tested in discrete and thresholded models.

The loss function, accuracy and the number of operations stay close across models. Then, training of the relaxed model is compared to training of the discrete model using the REINFORCE method. The number of latent variables is varied by combining the spatial positions into groups and assigning a single latent variable to each group.
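The number of latent variables matters for REINFORCE because the variance of a score-function gradient estimate tends to grow with it. The toy sketch below is not the thesis experiment: it uses independent Bernoulli variables and the hypothetical objective f(z) = sum(z), for which the true gradient with respect to the first probability is always 1.

```python
import random


def reinforce_grad_samples(num_vars, num_samples, rng, p=0.5):
    """Score-function estimates of d/dp_1 E[f(z)] for f(z) = sum(z), z_k ~ Bernoulli(p)."""
    samples = []
    for _ in range(num_samples):
        z = [1 if rng.random() < p else 0 for _ in range(num_vars)]
        score = z[0] / p - (1 - z[0]) / (1.0 - p)  # d/dp_1 log P(z_1)
        samples.append(sum(z) * score)
    return samples


def mean_and_variance(values):
    """Sample mean and (biased) sample variance of a list of numbers."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)


# Usage: the mean stays near the true gradient while the variance grows.
rng = random.Random(0)
for k in (1, 10, 100):
    mean, var = mean_and_variance(reinforce_grad_samples(k, 5000, rng))
    print(k, round(mean, 2), round(var, 1))
```

For this toy problem with p = 0.5 the exact variance of a single estimate is K^2 + K - 1 for K variables, so it grows quadratically while the quantity being estimated does not change.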

It is shown that both training methods perform comparably when the number of latent variables is under one hundred. However, when the number of latent variables is increased, REINFORCE fails to train the model successfully due to the high variance of the gradients. Training with relaxation allows using up to 1344 latent variables. The relaxed model and the SACT method have a similar speed-quality trade-off. An advantage of the probabilistic approach is that testing can be performed in a thresholded mode that has an extremely simple implementation without a loss of quality.

The main results of the work are presented in the conclusion:

1. A new method of convolutional neural network acceleration is developed. It is based on a perforated convolutional layer that allows the amount of computation to vary spatially. It is demonstrated that the perforated convolutional layer can be efficiently implemented on both CPU and GPU. Several types of input-independent perforation masks are proposed and experimentally compared. The developed method makes the AlexNet and VGG-16 convolutional networks several times faster. Reducing the spatial redundancy of a convolutional neural network's intermediate representations improves the speed-quality trade-off.

2. The adaptive computation time method, which has previously been used for recurrent neural networks, is applied to residual networks. The obtained method allows the number of layers in residual networks to vary depending on the input object. A spatially adaptive computation time method is developed that allows the number of layers to be chosen per spatial position. It is proved that this method generalizes the previous one. A perforated convolutional layer with an input-dependent perforation mask is used for an efficient implementation of the method. The spatially adaptive version is empirically shown to improve the speed-quality trade-off of residual networks. The best results are obtained when processing high-resolution images. It is also shown that the computation time map can be used as a human saliency model.

3. A probabilistic model of adaptive computation time is proposed. This model allows adapting the number of layers in deep learning models such as convolutional neural networks. A training method for this model is developed. It uses stochastic variational optimization and the Gumbel-Softmax relaxation of discrete random variables. The original adaptive computation time method is a heuristic relaxation of the proposed model. It is shown that the proposed method achieves results similar to those of the adaptive computation time method, but has a simpler implementation. Thus, it is proved that probabilistic models can be used for varying the depth of convolutional neural networks.

Bibliography

1. Bengio Y., Courville A., Vincent P. Representation learning: A review and new perspectives // IEEE transactions on pattern analysis and machine intelligence. 2013. Vol. 35, no. 8. P. 1798–1828.
2. Lowe D. G. Object recognition from local scale-invariant features // Conference on Computer Vision and Pattern Recognition. 1999. Vol. 2. P. 1150–1157.
3. Dalal N., Triggs B. Histograms of oriented gradients for human detection // Conference on Computer Vision and Pattern Recognition. 2005. Vol. 1. P. 886–893.
4. Murty K. S. R., Yegnanarayana B. Combining evidence from residual phase and MFCC features for speaker recognition // IEEE signal processing letters. 2006. Vol. 13, no. 1. P. 52–55.
5. Furui S. 50 years of progress in speech and speaker recognition research // ECTI Transactions on Computer and Information Technology (ECTI-CIT). 2005. Vol. 1, no. 2. P. 64–74.
6. ImageNet Large Scale Visual Recognition Challenge 2016 (ILSVRC2016) Results / http://image-net.org/challenges/LSVRC/2016/results. 2016.
7. LeCun Y., Bengio Y., Hinton G. Deep learning // Nature. 2015. Vol. 521, no. 7553. P. 436–444.
8. LeCun Y., Boser B., Denker J. S., Henderson D., Howard R. E., Hubbard W., Jackel L. D. Backpropagation applied to handwritten zip code recognition // Neural computation. 1989. Vol. 1, no. 4. P. 541–551.
9. Hochreiter S., Schmidhuber J. Long short-term memory // Neural computation. 1997. Vol. 9, no. 8. P. 1735–1780.
10. Shazeer N., Mirhoseini A., Maziarz K., Davis A., Le Q., Hinton G., Dean J. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer // International Conference on Learning Representations. 2017.
11. Krizhevsky A., Sutskever I., Hinton G. E. Imagenet classification with deep convolutional neural networks // Advances in Neural Information Processing Systems. 2012.
12. He K., Zhang X., Ren S., Sun J. Deep Residual Learning for Image Recognition // Conference on Computer Vision and Pattern Recognition. 2016.
13. Yosinski J., Clune J., Nguyen A., Fuchs T., Lipson H. Understanding neural networks through deep visualization // ICML Deep Learning Workshop. 2015.
14. Nguyen A., Dosovitskiy A., Yosinski J., Brox T., Clune J. Synthesizing the preferred inputs for neurons in neural networks via deep generator networks // Advances in Neural Information Processing Systems. 2016. P. 3387–3395.
15. Rensink R. A. The dynamic representation of scenes // Visual cognition. 2000. Vol. 7, no. 1–3.
16. Larochelle H., Hinton G. E. Learning to combine foveal glimpses with a third-order Boltzmann machine // Advances in Neural Information Processing Systems. 2010.
17. Mnih V., Heess N., Graves A., [et al.]. Recurrent models of visual attention // Advances in Neural Information Processing Systems. 2014.
18. Ba J., Mnih V., Kavukcuoglu K. Multiple object recognition with visual attention // International Conference on Learning Representations. 2015.
19. Jaderberg M., Simonyan K., Zisserman A., Kavukcuoglu K. Spatial transformer networks // Advances in Neural Information Processing Systems. 2015.
20. Xu K., Ba J., Kiros R., Cho K., Courville A., Salakhutdinov R., Zemel R. S., Bengio Y. Show, attend and tell: Neural image caption generation with visual attention // International Conference on Machine Learning. 2015.
21. Sharma S., Kiros R., Salakhutdinov R. Action Recognition using Visual Attention // International Conference on Learning Representations Workshop. 2016.
22. Bengio E., Bacon P.-L., Pineau J., Precup D. Conditional Computation in Neural Networks for faster models // International Conference on Learning Representations Workshop. 2016.
23. Williams R. J. Simple statistical gradient-following algorithms for connectionist reinforcement learning // Machine learning. 1992.
24. Graves A. Adaptive Computation Time for Recurrent Neural Networks // arXiv. 2016.
25. Graham B. Fractional Max-Pooling // arXiv. 2014.
26. Ren S., He K., Girshick R., Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks // Advances in Neural Information Processing Systems. 2015.
27. Staines J., Barber D. Variational Optimization // arXiv. 2012.
28. Staines J., Barber D. Optimization by Variational Bounding // ESANN. 2013.

Characteristics

File type: PDF file
Size: 620.07 KB
Material: Probabilistic Method for Adaptive Computation Time in Neural Networks
Material type: Candidate of Sciences (PhD) dissertation
Subject: Technical sciences
Institution: HSE University (НИУ ВШЭ)
