Summary (1137107), page 3

File No. 1137107: Summary (Probabilistic Method for Adaptive Computation Time in Neural Networks), page 3. 2019-05-20, StudIzba

Probabilistic Method for Adaptive Computation Time in Neural Networks


Text from the file (page 3)

First, the proposed perforation mask types are compared on the problem of accelerating a single convolutional layer of the AlexNet network (ImageNet dataset). The best-performing masks are grid and impact. Then, using the Network in Network architecture (CIFAR-10 dataset), it is shown that perforation outperforms simple baselines, such as increasing the convolutional layer stride and decreasing the input image resolution, in terms of the speed-quality trade-off. Finally, perforation is used to accelerate the whole AlexNet and VGG-16 networks on the ImageNet dataset, illustrating the applicability of perforation to the acceleration of large CNNs.

The considered networks may be accelerated by a factor of two on CPU and GPU with an error increase of at most 2.6%. The theoretical speedups are usually close to the empirical ones, which demonstrates the effectiveness of the proposed implementation of the perforated convolutional layer.

The spatially adaptive computation time (SACT) method for residual networks is introduced in the third chapter.

This method makes it possible to focus the computation on important regions of an image (fig. 3).

[Figure 4: Spatially adaptive computation time method for a residual network ResNet-101. Labels: image (224x224x3), conv + pool, residual units, block 1 to block 4 with feature maps 56x56x256, 28x28x512, 14x14x1024 and 7x7x2048, pool + fc.]

First, the adaptive computation time (ACT) method [24], previously proposed for RNNs, is applied to residual networks, a popular CNN architecture.

A residual network consists of residual units, functions of the form $U_l = U_{l-1} + f(U_{l-1})$, where $f(U_{l-1})$ is a convolutional neural subnetwork called the residual function. A sequence of residual units with equal output dimensions is called a residual block. In the ACT method, each residual unit additionally outputs a halting probability, a number in the range [0, 1]. Residual units and their probabilities are computed sequentially.

As soon as the cumulative sum of the halting probabilities reaches one, all the following residual units in the current block are skipped. The halting probability distribution is defined as the computed halting probabilities, where the last value is replaced by the remainder. The value of the remainder is determined from the normalization condition of the probability distribution. The output of the block is defined as a weighted average of the outputs of the residual units, where the weights are the corresponding halting probabilities.

Finally, the ponder cost is the sum of the number of computed residual units and the remainder. Minimizing the ponder cost increases the halting probability of all the units except the last one, which leads to earlier stopping. The ponder cost is used as a regularizer for the original loss function.
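The halting rule, the halting distribution with its remainder, the weighted block output and the ponder cost described above can be sketched in a few lines. This is a minimal illustration rather than the thesis implementation: the function name `act_block` and its arguments are hypothetical, the unit outputs and halting probabilities are passed in as plain arrays instead of being produced by a network, and the stopping test follows the text literally (stop once the cumulative sum reaches one).

```python
import numpy as np

def act_block(unit_outputs, halting_probs):
    """Combine the residual-unit outputs of one block with ACT.

    unit_outputs: list of arrays, the output of each residual unit in order.
    halting_probs: list of floats in [0, 1], one per residual unit.
    Returns (block_output, ponder_cost, halting_distribution).
    """
    n_units = len(unit_outputs)
    cumulative = 0.0
    weights = []
    for l, h in enumerate(halting_probs):
        is_last = l == n_units - 1
        if cumulative + h >= 1.0 or is_last:
            # Stop here: the last weight is the remainder, so weights sum to one.
            remainder = 1.0 - cumulative
            weights.append(remainder)
            break
        weights.append(h)
        cumulative += h
    n_computed = len(weights)
    # Weighted average of the outputs of the units that were actually computed.
    output = sum(w * u for w, u in zip(weights, unit_outputs[:n_computed]))
    # Ponder cost: number of computed units plus the remainder.
    ponder_cost = n_computed + remainder
    return output, ponder_cost, np.array(weights)

# Four units with constant outputs 1, 2, 3, 4; the third unit triggers halting.
outputs = [np.full(2, float(i)) for i in range(1, 5)]
block_output, cost, distribution = act_block(outputs, [0.25, 0.5, 0.5, 0.25])
# distribution is [0.25, 0.5, 0.25], cost is 3.25, block_output is [2.0, 2.0]
```

In the example the fourth unit is never evaluated, and lowering the ponder cost would require the earlier units to emit larger halting probabilities.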

The described method is applied to every residual block of the network independently, with the ponder costs added up. Thus, in every block only the first several residual units are executed (fig. 4). We prove that the ACT method generalizes the residual network model.

The idea of the proposed SACT method is that ACT is applied to every spatial position of a block.

Then, every position has its own halting probability. The output of the block in a spatial position is defined as a weighted average of the outputs of the block's residual units in this position. A spatial position of a residual unit is called active if its cumulative halting probability does not exceed one. It is clear that only the values in the active positions affect the outputs of the block, so it is reasonable to compute only these values.
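The per-position version of the same rule can be sketched with array masks. This is a hedged illustration under simplifying assumptions: `sact_block`, `unit_outputs` and `halting_maps` are hypothetical names, the unit outputs are taken as precomputed arrays (a real network would evaluate each unit only in the active positions), and a position counts as active at a unit if it has not halted at an earlier unit.

```python
import numpy as np

def sact_block(unit_outputs, halting_maps):
    """Apply ACT independently in every spatial position of one block.

    unit_outputs: array of shape (L, H, W, C), outputs of the L residual units.
    halting_maps: array of shape (L, H, W), halting probabilities in [0, 1].
    Returns (block_output of shape (H, W, C), units_evaluated of shape (H, W)).
    """
    n_units, height, width, _ = unit_outputs.shape
    cumulative = np.zeros((height, width))
    active = np.ones((height, width), dtype=bool)
    output = np.zeros(unit_outputs.shape[1:])
    units_evaluated = np.zeros((height, width), dtype=int)
    for l in range(n_units):
        h = halting_maps[l]
        is_last = l == n_units - 1
        # Positions that halt at this unit get the remainder as their weight.
        halts_now = active & ((cumulative + h >= 1.0) | is_last)
        continues = active & ~halts_now
        weight = np.where(halts_now, 1.0 - cumulative,
                          np.where(continues, h, 0.0))
        output += weight[..., None] * unit_outputs[l]
        units_evaluated += active  # a simple computation time map
        cumulative += np.where(continues, h, 0.0)
        active = continues
    return output, units_evaluated

# Two positions, three units with constant outputs 1, 2, 3.
units = np.stack([np.full((1, 2, 1), float(l + 1)) for l in range(3)])
halting = np.array([[[0.5, 0.25]]] * 3)
block_output, time_map = sact_block(units, halting)
```

In this example the first position halts after two units and the second uses all three, so the returned map differs between positions in the way a computation time map would.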

The values of the residual function in the inactive positions are imputed by zero, which is equivalent to copying the previous values. A residual unit that is only evaluated in the active positions can be implemented efficiently using the perforated convolutional layer proposed in the second chapter, with the missing values imputed by zeros. We show that the SACT method generalizes the ACT method and therefore the residual network.

The ACT and SACT methods are experimentally validated by applying them to the residual network ResNet-101. The dead residual unit problem occurring in ACT and SACT is described: if the model is initialized “incorrectly”, the last residual units of the blocks are never used. Several initialization heuristics are proposed to solve this problem.

First, the image classification problem is considered (ImageNet dataset). A non-adaptive residual network with a comparable number of floating point operations is used as a baseline for the ACT and SACT methods.

When the test-time resolution is increased, which is a standard practice, SACT outperforms ACT and the baselines in terms of the trade-off between quality and the number of operations. It is demonstrated that the advantage of SACT persists if the training-time resolution is increased.

Then, the object detection problem is considered for the COCO dataset. In this case, high-resolution images are used, such as 1000 × 600, which is significantly larger than the standard 224 × 224 for ImageNet classification. The SACT mechanism makes it possible to reduce the computation time for the low-information background. The Faster R-CNN [26] object detection method is used, with the feature extraction residual network replaced by SACT.

This approach outperforms the baseline of using a non-adaptive ResNet for feature extraction in terms of the trade-off between speed and mean average precision (mAP). An example of the resulting detections and computation time map is presented in fig. 3.

Finally, experiments are presented showing that the computation time maps of SACT correlate well with visual saliency. This is done on the large cat2000 dataset, which is obtained by showing images to people and measuring their eye fixation positions. The target map is a smoothed histogram of the positions. SACT models pre-trained on ImageNet and COCO are used, and no fine-tuning on the saliency prediction problem is performed.

However, a parametric post-processing of the computation time maps is done that smooths them and accounts for the center bias of the cat2000 target maps. The post-processed maps outperform a baseline, a centered Gaussian. Thus, SACT automatically learns to focus the computation on the regions that people consider important, therefore improving the interpretability of CNNs.

The probabilistic adaptive computation time method is proposed in the fourth chapter.

It is based on a probabilistic model where discrete latent variables define the number of executed iterations. The ACT method is a heuristic relaxation of the proposed probabilistic model that has the significant downside of a discontinuous loss function. This means that the ACT method cannot be used jointly with the reparameterization trick, which requires a smooth loss function.

At the beginning of the chapter, a mathematical framework for stochastic MAP inference in discriminative probabilistic models is developed.

First, we describe the variational optimization method [27; 28] for maximization of a function f(z) of a discrete or continuous variable z. This method is based on the variational bound

$$L(\phi) = \mathbb{E}_{q(z|\phi)} f(z) \le \mathbb{E}_{q(z|\phi)} \max_z f(z) = \max_z f(z), \qquad (1)$$

which is valid for any auxiliary distribution q(z|ϕ). The inequality becomes tight when the auxiliary distribution is a delta-function at the argmax of f(z). Assume that the quantity L(ϕ) can be computed at an acceptable computational cost. Then, one can maximize L(ϕ) using gradient-based optimization methods.

For an analytically intractable or too computationally expensive function L(ϕ), we propose a new method of stochastic variational optimization.
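The bound (1) can be checked numerically on a small discrete example. The sketch below is an illustration under stated assumptions, not the thesis code: q(z|ϕ) is taken to be a categorical distribution with softmax parameters, f is a hypothetical table of three values, and all function names are made up for the example. It computes L(ϕ) exactly, maximizes it by gradient ascent, and also shows a generic score-function (REINFORCE) estimate of the same gradient, which is one standard way to obtain a stochastic gradient when the expectation cannot be computed in closed form.

```python
import numpy as np

def softmax(phi):
    e = np.exp(phi - phi.max())
    return e / e.sum()

def bound(phi, f_values):
    """L(phi) = E_{q(z|phi)} f(z) for a categorical q = softmax(phi)."""
    return float(softmax(phi) @ f_values)

def bound_gradient(phi, f_values):
    """Exact gradient: dL/dphi_k = q_k * (f_k - L)."""
    q = softmax(phi)
    return q * (f_values - q @ f_values)

def reinforce_gradient(phi, f_values, n_samples, rng):
    """Monte Carlo score-function estimate of the gradient of L(phi)."""
    q = softmax(phi)
    z = rng.choice(len(q), size=n_samples, p=q)
    # For a softmax parameterization, grad_phi log q(z) = onehot(z) - q.
    score = np.eye(len(q))[z] - q
    return (f_values[z][:, None] * score).mean(axis=0)

def maximize(f_values, steps=2000, lr=0.5):
    """Plain gradient ascent on L(phi), starting from a uniform q."""
    phi = np.zeros(len(f_values))
    for _ in range(steps):
        phi += lr * bound_gradient(phi, f_values)
    return phi

f_values = np.array([1.0, 3.0, 2.0])   # hypothetical objective table
phi_opt = maximize(f_values)
# bound(phi_opt, f_values) approaches max(f_values) from below,
# and softmax(phi_opt) concentrates on the argmax of f.
```

As ϕ is optimized, q(z|ϕ) approaches a delta-function at the argmax, which is the case where the inequality in (1) becomes tight.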

For a reparameterizable distribution q(z|ϕ), we propose to use the reparameterization trick. For a discrete distribution q(z|ϕ), two options are proposed: using the REINFORCE method, or applying the Gumbel-Softmax relaxation and training with the reparameterization trick. In any of these cases it becomes possible to compute a stochastic gradient of the loss function.

Consider a discriminative probabilistic model p(y, z|x) = p(y|x, z)p(z), where x is the object, y is the target label and z is the latent variable.

Here p(y|x, z) denotes the likelihood of the target label given the object and the latent variable, which can be defined, for instance, with a neural network. The MAP inference problem consists of finding a value of the latent variable z∗ that maximizes the posterior distribution $p(z|x, y) = \frac{p(y, z|x)}{p(y|x)}$. To solve this problem one can use variational optimization with an auxiliary distribution q(z|x, ϕ) that does not depend on the true label, which allows it to be used at test time.

Then, a probabilistic method for adaptive computation time is proposed. An adaptive computation block is a computational module that chooses the number of iterations depending on the input.

[Figure 5: Relaxed adaptive computation block. Labels: output of the block, latent variable, halting probability, iteration output.]
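For the MAP inference problem stated earlier in this section, the connection to the bound (1) can be written out as a short worked step. Since p(y|x) does not depend on z, maximizing the posterior over z is equivalent to maximizing the joint p(y, z|x). Taking the logarithm as the objective f(z) is an assumption made here for illustration; the text above does not state which form of the objective is used.

```latex
\begin{aligned}
z^* &= \arg\max_z p(z|x, y)
     = \arg\max_z \frac{p(y, z|x)}{p(y|x)}
     = \arg\max_z \bigl[\log p(y|x, z) + \log p(z)\bigr], \\
L(\phi) &= \mathbb{E}_{q(z|x,\phi)} \bigl[\log p(y|x, z) + \log p(z)\bigr]
     \le \max_z \bigl[\log p(y|x, z) + \log p(z)\bigr].
\end{aligned}
```

Under this assumption, maximizing L(ϕ) over ϕ pushes q(z|x, ϕ) toward the MAP value of z, and because q depends only on x it can still be evaluated when the label y is unknown.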

Characteristics

File type: PDF file
Size: 620.07 KB
Material: Probabilistic Method for Adaptive Computation Time in Neural Networks
Material type: Candidate of Sciences dissertation
Subject: Technical sciences
Institution: НИУ ВШЭ (HSE University)
