
Probabilistic method for adaptive computation time in neural networks


As a manuscript

Mikhail Figurnov

PROBABILISTIC METHOD OF ADAPTIVE COMPUTATION TIME IN NEURAL NETWORKS

SUMMARY of a dissertation to obtain the degree of Doctor of Philosophy in Computer Science HSE

Moscow, 2019

The PhD Dissertation was prepared at National Research University Higher School of Economics.

Academic Supervisor: Dmitry P. Vetrov, Candidate of Sciences, Research Professor, National Research University Higher School of Economics.

Topic of the thesis

In this work we present a probabilistic method for spatial adaptation of computation time in convolutional neural networks, a popular computer vision model. This method improves computational efficiency and interpretability.

Relevance of the work. In recent years the amount of data collected worldwide has been growing rapidly. This increases the importance of machine learning methods, which allow patterns to be extracted from data automatically. In machine learning tasks it is assumed that real-world objects are described by features. There is also a training set obtained from the general set of objects. In the supervised learning problem the true labels of the training set objects are also known, and the problem is to recover the dependence of the labels on the features. The quality of the obtained solution is usually estimated by accuracy, the fraction of correctly determined labels on the test set.
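As a small illustration of the accuracy measure defined above, the following sketch computes the fraction of correctly determined labels. The function name `accuracy` and the toy labels are assumptions made for this example and do not come from the thesis.

```python
def accuracy(predicted_labels, true_labels):
    """Fraction of objects whose predicted label equals the true label."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("accuracy is undefined for an empty test set")
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)


# Hypothetical test set of four objects; three labels are predicted correctly.
true_labels = ["cat", "dog", "dog", "cat"]
predicted_labels = ["cat", "dog", "cat", "cat"]
print(accuracy(predicted_labels, true_labels))  # 0.75
```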

An alternative to supervised learning is unsupervised learning, where the training set consists only of the object features. The goal of unsupervised learning is to obtain a compact and informative description of objects that can later be used, for instance, in supervised learning on a smaller labeled set [1].

A common way of solving the aforementioned machine learning problems is probabilistic modeling. In the supervised case, the probabilistic model defines a distribution of the labels given the observed features. In the unsupervised case, a common approach is to introduce latent (unobserved) variables that define the factors of variation of the data. The parameters of the probabilistic model are fitted to the training set by maximum likelihood with gradient-based optimization methods. In many latent variable models the likelihood cannot be computed analytically; in this case variational methods are often employed.

The success of machine learning methods crucially depends on the informativeness of the feature representation. Among the hardest objects to represent by features are high-dimensional unstructured objects: images, sounds, texts, graphs, etc. The volume of such data is growing rapidly due to the spread of the internet and social networks. By the beginning of the 2010s, methods based on expert knowledge of subject areas had been developed for such data. For example, SIFT [2] and HOG [3] features were widely used in image processing, and MFCC [4] features were popular in sound processing. Unfortunately, the informativeness of such features remained unsatisfactory for solving real-world problems. The lack of obvious ways to improve those features led to stagnation in the quality of the methods [5; 6].

In the last five years deep learning has become the most effective way of working with high-dimensional unstructured data [7]. Deep learning suggests using multi-layer (deep) feature representations of objects defined by neural networks with dozens or even hundreds of layers. The architecture of the neural network is determined by the properties of the data. For example, convolutional neural networks (CNNs) [8] are often used for image processing, while recurrent neural networks (RNNs) [9] are widely employed for sound and text recognition. The last layer of the neural network usually corresponds to an answer to the problem being solved, e.g. a probability distribution over the labels. The parameters of the model, numbering up to billions [10], are trained using stochastic gradient-based optimization methods that maximize the likelihood of the probabilistic model. Thus, deep learning considers parametric models chosen based on the properties of the data and relatively simple training methods. The key reasons for the success of deep learning are the creation of very large labeled training sets such as ImageNet [6] and the development of new computing technology, most notably graphics processing units (GPUs).

In 2012 a team from Toronto successfully trained a CNN for the image classification problem [11]. They dramatically improved the quality of the solution compared to all previous approaches, none of which used neural networks. After that, CNNs became a crucial element of computer vision systems. The adoption of CNNs significantly advanced scene understanding (pattern recognition) problems such as image classification, object identification, object detection and semantic segmentation. Furthermore, it turned out that quality can be improved by increasing the amount of computation, first of all by increasing the depth (number of layers) of the CNN. For instance, the abovementioned CNN from 2012 consists of 8 layers, while a residual network proposed in 2015 has 152 layers [12].

Despite the breakthrough in the quality of the solution, the CNN model has several downsides:

1. CNNs have a huge computational cost, which is mostly caused by convolutional layers that take up over 80% of the computation time. Modern CNNs use tens of billions of floating point operations to process a single image. Such computational demands limit the applicability of CNNs in many scenarios, including real-time video processing, as well as deployment on devices without powerful GPUs and on devices where the power supply is limited.
2. CNNs are hard to interpret. The complex structure of the models and the large number of parameters and computations mean that classical model analysis techniques are not applicable to CNNs. For this reason, it is problematic to use CNNs in scenarios with a high cost of error, where the decisions of the system need to be validated by an expert. There exist several methods for interpreting previously trained CNNs [13; 14]. An important problem is the development of more interpretable CNNs.

In the thesis we approach these problems using the hypothesis that CNNs are spatially redundant, meaning that applying some of the layers in some of the spatial positions is not necessary to obtain a high-quality solution.
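To make the spatial redundancy hypothesis more concrete, here is a minimal sketch of a convolution that is evaluated only at a chosen subset of spatial positions. It is an illustration under stated assumptions, not the implementation from the thesis: the function names `conv_at` and `perforated_conv`, the single-channel input, the zero padding, and the nearest-position fill for skipped outputs are all choices made for this example.

```python
def conv_at(image, kernel, row, col):
    """Zero-padded cross-correlation of `image` with `kernel` at one output position."""
    kernel_height, kernel_width = len(kernel), len(kernel[0])
    total = 0.0
    for i in range(kernel_height):
        for j in range(kernel_width):
            r = row + i - kernel_height // 2
            c = col + j - kernel_width // 2
            if 0 <= r < len(image) and 0 <= c < len(image[0]):
                total += kernel[i][j] * image[r][c]
    return total


def perforated_conv(image, kernel, keep):
    """Evaluate the convolution only at positions in `keep`.

    Skipped positions copy the value of the nearest evaluated position
    (an assumption of this sketch). Returns the output map and the number
    of positions that were actually evaluated.
    """
    positions = sorted(set(keep))
    if not positions:
        raise ValueError("at least one position must be evaluated")
    computed = {pos: conv_at(image, kernel, pos[0], pos[1]) for pos in positions}
    output = []
    for r in range(len(image)):
        row_values = []
        for c in range(len(image[0])):
            nearest = min(positions, key=lambda p: (p[0] - r) ** 2 + (p[1] - c) ** 2)
            row_values.append(computed[nearest])
        output.append(row_values)
    return output, len(computed)


# Hypothetical 4x4 input; only every other position in each dimension is evaluated.
image = [[1.0] * 4 for _ in range(4)]
kernel = [[1.0] * 3 for _ in range(3)]
keep = [(0, 0), (0, 2), (2, 0), (2, 2)]
output, evaluated = perforated_conv(image, kernel, keep)
print(evaluated, "of", 16, "positions evaluated")  # 4 of 16 positions evaluated
```

In this toy setting the convolution itself is evaluated at a quarter of the positions. The nearest-position search here is written for clarity rather than speed, and whether the resulting approximation is acceptable depends on the data and the network.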

A method that allows skipping convolutional layers in some spatial positions would therefore improve the speed-quality trade-off for CNNs. Moreover, if the skipped spatial positions are chosen based on the object, the resulting computation time maps improve the interpretability of CNNs: the regions that receive more computation are more important for the problem at hand. This mechanism is similar to how biological vision systems spend more time on the important regions of the presented image [15].

The spatial adaptivity of computation time can be seen as an attention model. Existing attention models for CNNs have significant drawbacks. Glimpse-based attention models [16–19] cannot be applied to many classes of problems (object detection, image segmentation, image generation); soft spatial attention models [20; 21] do not reduce the amount of computation; hard attention models [20; 22] are trained with the REINFORCE method [23], which makes training significantly harder.

The goal of this work is to develop a method that improves the speed-quality trade-off in CNNs.

To achieve this goal the following problems are solved in this thesis:

1. A perforated convolutional layer is proposed that makes it possible to spatially adjust and lower the amount of computation.
2. The adaptive computation time method [24], previously proposed for RNNs, is applied to spatial adjustment of the depth (number of layers) of a CNN for a given object.
3. A probabilistic model that adjusts the spatial depth of CNNs is proposed, as well as a method to train this model.

Key results and conclusions

The novelty of this work is that the following points are shown for the first time:

1. Reducing the spatial redundancy of the intermediate representations of a network increases the speed of a CNN.
2. Spatial adaptation of the depth (number of layers) of a CNN based on the object improves the speed-quality trade-off of the CNN and also improves the interpretability of the model.
3. The depth of a CNN can be varied by a probabilistic model with latent variables.

Theoretical and practical significance. The obtained results widen the scope of applicability of CNNs by improving the speed-quality trade-off and the interpretability.

Methodology and research methods. This work uses the methodology of deep learning; the toolkit of probabilistic modeling; the Python, CUDA and MATLAB programming languages; and the NumPy, MatConvNet and TensorFlow frameworks.

Reliability of the obtained results is ensured by a detailed description of the methods and algorithms, proofs of theorems, as well as a description of the experiments and the release of the source code, which facilitates reproducibility.

Main provisions for defense:

1. A method for perforation of convolutional networks that makes it possible to spatially adjust the amount of computation in a CNN.
2. A method for spatially adaptive computation time that adjusts the depth (number of layers) of a CNN based on the object and the spatial position.
3. A probabilistic latent variable model for adaptation of CNN depth, as well as a method of stochastic variational optimization for training this model.
4. Experimental evaluation of the proposed methods, including a comparison with analogous solutions.

Personal contribution to the main provisions for defense. The results were obtained personally by the author. In the works on the topic of the thesis, the author proposed the key scientific ideas, implemented and performed the experiments, and wrote the papers. The results of subsection 4.4 of the paper “PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions” (NIPS 2016) were obtained by Aizhan Ibraimova and are not included in the thesis.
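The idea of adjusting the depth of a CNN per spatial position, named in the provisions for defense, can be illustrated with a small sketch of a computation time map. It follows the general cumulative halting rule of adaptive computation time [24]: each layer emits a halting score per position, and a position stops once its accumulated score reaches a threshold. The function name `computation_time_map`, the `epsilon` value and the toy scores are assumptions for this example. In the actual method the scores would be produced by the network, which this sketch does not model.

```python
def computation_time_map(halting_scores, epsilon=0.01):
    """Number of layers executed at each spatial position.

    halting_scores[layer][row][col] is the halting score (between 0 and 1)
    emitted by `layer` at position (row, col). A position stops as soon as
    its cumulative score reaches 1 - epsilon, or after the last layer.
    """
    num_layers = len(halting_scores)
    height = len(halting_scores[0])
    width = len(halting_scores[0][0])
    time_map = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            cumulative = 0.0
            for layer in range(num_layers):
                time_map[r][c] = layer + 1
                cumulative += halting_scores[layer][r][c]
                if cumulative >= 1.0 - epsilon:
                    break
    return time_map


# Hypothetical scores for 3 layers on a 1x3 feature map.
scores = [
    [[1.0, 0.5, 0.1]],  # layer 1
    [[0.0, 0.5, 0.1]],  # layer 2
    [[0.0, 0.0, 0.1]],  # layer 3
]
print(computation_time_map(scores))  # [[1, 2, 3]]
```

Positions with larger values in the map received more layers of computation, which is the sense in which such maps can be read as an indication of where the model spends its effort.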
