The contribution of the other coauthors consists of reviewing the code of the experiments, technical help with the setup of experiments, discussion of the obtained results, editing of the text of the papers, problem formulation, and general supervision of the research.

Publications and approbation of the work

The candidate is the main author of all papers on the topic of the thesis.

First-tier publications.
1. Figurnov M., Ibraimova A., Vetrov D. P., Kohli P. PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions // Advances in Neural Information Processing Systems 29. 2016. P. 947–955. Rank A* conference, indexed by SCOPUS.
2. Figurnov M., Collins M. D., Zhu Y., Zhang L., Huang J., Vetrov D., Salakhutdinov R. Spatially Adaptive Computation Time for Residual Networks // The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017. P. 1039–1048. Rank A* conference, indexed by SCOPUS.
3. Figurnov M., Sobolev A., Vetrov D. Probabilistic adaptive computation time // Bulletin of the Polish Academy of Sciences: Technical Sciences. 2018. Vol. 66, no. 6. P. 811–820. Journal indexed by Web of Science (Q2) and SCOPUS (Q3).

Other publications.
1. Figurnov M., Vetrov D. P., Kohli P. PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions // International Conference on Learning Representations (ICLR) Workshop. 2016.

Reports at conferences and seminars.
1. Seminar of the Bayesian methods research group, Moscow, 20 February 2015. Topic: “Acceleration of convolutional neural networks”.
2. Christmas colloquium on computer vision, Skoltech, Moscow, 28 February 2015. Topic: “PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions”.
3. Seminar “Structure models and deep learning” of the Institute of Information Transmission Problems of the Russian Academy of Sciences, Moscow, 21 March 2016. Topic: “Acceleration of Convolutional Neural Networks through Elimination of Redundant Convolutions”.
4. “International Conference on Learning Representations 2016”, workshop, San Juan, Puerto Rico, USA, 3 May 2016. Topic: “PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions”.
5. “Conference on Neural Information Processing Systems 2016”, main section, Barcelona, Spain, 7 December 2016. Topic: “PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions”.
6. Seminar of OpenAI, San Francisco, California, USA, 1 March 2017. Topic: “Spatially Adaptive Computation Time for Residual Networks”.
7. Seminar of the Bayesian methods research group, Moscow, 10 March 2017. Topic: “Spatially Adaptive Computation Time for Residual Networks”.
8. International summit “Machines Can See”, Moscow, 9 June 2017. Topic: “Spatially Adaptive Computation Time for Residual Networks”.
9. “IEEE Conference on Computer Vision and Pattern Recognition 2017”, main section, Honolulu, Hawaii, USA, 22 July 2017. Topic: “Spatially Adaptive Computation Time for Residual Networks”.
10. Christmas colloquium on computer vision, Skoltech, Moscow, 26 December 2017. Topic: “Spatially Adaptive Computation Time for Residual Networks”.

Volume and structure of the work. The thesis contains an introduction, four chapters, and a conclusion. The full volume of the thesis is 116 pages, including 30 figures and 7 tables. The list of references contains 167 items.

Contents of the work

The introduction substantiates the relevance of the research presented in the thesis, formulates its goals and tasks, and describes the scientific novelty of the work and the main provisions for defense.

The first chapter is an overview that consists of two parts.
The first part presents deep learning methods, in particular convolutional neural networks (CNNs). The problems solved by CNNs are described, including image classification, image segmentation, and object detection. The supervised learning problem, for which deep learning methods are currently most effective, is formulated. Stochastic optimization methods and the backpropagation algorithm, which are used to tune the parameters of (train) neural networks, are presented.
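As an illustration of this training procedure (a minimal PyTorch sketch added for this summary, not code from the thesis; the model, data, and hyperparameters are placeholders), one stochastic gradient step with backpropagation can be written as:

    # Minimal sketch of stochastic optimization with backpropagation (PyTorch).
    # The model, data, and learning rate below are illustrative placeholders.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # toy classifier
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()

    def training_step(images, labels):
        optimizer.zero_grad()           # clear gradients from the previous step
        logits = model(images)          # forward pass
        loss = loss_fn(logits, labels)  # scalar objective on the mini-batch
        loss.backward()                 # backpropagation: compute gradients
        optimizer.step()                # stochastic gradient update of parameters
        return loss.item()

    # One step on a random mini-batch (stand-in for real data).
    loss = training_step(torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,)))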
Parameter initialization methods are also described, since the initialization significantly affects the final results due to the non-convexity of the objective. Then, the most commonly used neural network layers are described: the fully-connected layer, various activation functions, the dropout layer, and softmax. Next, CNN-specific layers are considered: the convolutional layer, the pooling layer, batch normalization, etc. A historical reference on the ImageNet image classification contest is provided, as well as examples of convolutional architectures that show the best performance in practice: AlexNet, VGG-16, and residual networks (ResNet).
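To make this inventory of layers concrete, a toy network combining them could look as follows (an illustrative sketch added for this summary; it is not one of the architectures discussed in the thesis):

    # Toy CNN using the layers listed above (convolution, pooling, batch
    # normalization, dropout, fully-connected layer, softmax). Illustrative only.
    import torch
    import torch.nn as nn

    toy_cnn = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer
        nn.BatchNorm2d(16),                          # batch normalization
        nn.ReLU(),                                   # activation function
        nn.MaxPool2d(kernel_size=2, stride=2),       # pooling layer
        nn.Flatten(),
        nn.Dropout(p=0.5),                           # dropout layer
        nn.Linear(16 * 16 * 16, 10),                 # fully-connected layer
    )

    images = torch.randn(8, 3, 32, 32)                     # random stand-in images
    probabilities = torch.softmax(toy_cnn(images), dim=1)  # softmax over classes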
The second part of the chapter considers methods for training the parameters of random variables. These methods are required for training neural networks with stochastic variables. Several methods are described: the REINFORCE method, which is applicable to a wide range of probability distributions but suffers from high variance of the gradients; the reparameterization trick, which is applicable only to a small set of continuous random variables but has low gradient variance; and the Gumbel-Softmax relaxation, which allows training the parameters of discrete random variables using reparameterization.
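The difference between the first two estimators can be sketched on a toy problem (illustrative code added for this summary; the distribution and objective are chosen here only for exposition): estimating the gradient of E_{z ~ N(mu, 1)}[z^2] with respect to mu, whose true value is 2 * mu.

    # Two gradient estimators for d/dmu E_{z ~ N(mu, 1)}[f(z)], with f(z) = z**2.
    import torch

    mu = torch.tensor(1.5, requires_grad=True)
    n = 100_000  # number of Monte Carlo samples

    # REINFORCE (score-function) estimator: average of f(z) * d/dmu log N(z | mu, 1).
    z = mu.detach() + torch.randn(n)            # samples z ~ N(mu, 1), no grad through z
    log_prob = -0.5 * (z - mu) ** 2             # mu-dependent part of the log-density
    reinforce_grad = torch.autograd.grad((z ** 2 * log_prob).mean(), mu)[0]

    # Reparameterization: z = mu + eps, eps ~ N(0, 1); differentiate through z itself.
    eps = torch.randn(n)
    reparam_grad = torch.autograd.grad(((mu + eps) ** 2).mean(), mu)[0]

    # Both estimates approach the true gradient 2 * mu = 3.0, but for the same
    # number of samples the REINFORCE estimate has a much larger variance.
    print(reinforce_grad.item(), reparam_grad.item())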
The second chapter proposes the CNN perforation method, which accelerates CNNs by reducing their spatial redundancy. The name of the method comes from the loop perforation method, which accelerates programs by skipping some loop iterations.

First, the perforated convolutional layer is described. It is a modification of the standard convolutional layer with the same input, output, and weight tensor dimensions. The key hyperparameter of the perforated convolutional layer is the perforation mask, a subset of the spatial positions of the convolutional layer's output. The perforation rate is the fraction of spatial positions that are not in the perforation mask. The output values in the spatial positions of the perforation mask are computed exactly, i.e., they equal the corresponding values of the standard convolutional layer. The values in the remaining spatial positions are interpolated using the value from the nearest exactly computed position. Other interpolation methods are possible, such as replacing the missing values with zeros.
The perforated convolutional layer is a generalization of the standard convolutional layer: the two are equivalent when the perforation mask contains all spatial positions.
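The semantics of the layer can be sketched as follows (an illustrative reimplementation assumed for this summary, not the thesis code; for clarity it computes the full convolution first, whereas the actual method skips the non-mask positions entirely):

    # Semantics of a perforated convolutional layer: exact convolution outputs at
    # mask positions, nearest-neighbor interpolation elsewhere.
    import torch
    import torch.nn.functional as F

    def perforated_conv2d(x, weight, mask):
        """x: (N, C, H, W); weight: (K, C, kh, kw); mask: (H_out, W_out) bool."""
        exact = F.conv2d(x, weight, padding=1)   # reference values (for clarity only)
        h_out, w_out = exact.shape[2:]

        # For every output position, find the nearest position inside the mask.
        ys, xs = torch.meshgrid(torch.arange(h_out), torch.arange(w_out), indexing="ij")
        grid = torch.stack([ys, xs], dim=-1).reshape(-1, 2).float()  # all positions
        mask_pos = torch.nonzero(mask).float()                       # mask positions
        nearest = torch.cdist(grid, mask_pos).argmin(dim=1)
        src = mask_pos[nearest].long()                               # positions to copy from

        # Mask positions copy themselves (distance 0); the rest copy their neighbor.
        out = exact[:, :, src[:, 0], src[:, 1]].reshape(*exact.shape[:2], h_out, w_out)
        return out

    # Toy usage with a 50% random perforation mask.
    x = torch.randn(1, 3, 8, 8)
    weight = torch.randn(4, 3, 3, 3)
    mask = torch.rand(8, 8) < 0.5
    y = perforated_conv2d(x, weight, mask)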
Several methods for generating the perforation mask (choosing the subset of output spatial positions) are presented, see fig. 1. The uniform mask is obtained by an equiprobable choice of positions without replacement. Its disadvantage is that the chosen points often form compact groups, which increases the average distance to the perforation mask points. The grid mask is the Cartesian product of subsets of positions for each coordinate, chosen using a pseudorandom integer sequence generation scheme [25]. If the number of positions evenly divides the size of the dimension, the mask forms a regular grid; otherwise, the grid contains irregularities.

Figure 1: Examples of perforation masks: (a) uniform, (b) grid, (c) structure, (d) impact.
Figure 2: Reduction of convolutional layer computation to matrix multiplication (input tensor U, data matrix M, kernels K, output tensor V; im2row operation).
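Before turning to the two data-dependent masks described next, the two data-independent masks can be sketched as follows (illustrative code added for this summary; a simple random choice stands in for the pseudorandom scheme of [25]):

    # Illustrative generation of the uniform and grid masks for a convolutional
    # output of spatial size 8 x 8 and perforation rate 0.75.
    import numpy as np

    rng = np.random.default_rng(0)
    h = w = 8
    rate = 0.75                          # fraction of positions that are NOT computed
    n_keep = round((1 - rate) * h * w)   # positions kept in the perforation mask

    # Uniform mask: equiprobable choice of positions without replacement.
    uniform = np.zeros((h, w), dtype=bool)
    flat = rng.choice(h * w, size=n_keep, replace=False)
    uniform[np.unravel_index(flat, (h, w))] = True

    # Grid mask: Cartesian product of position subsets chosen per coordinate.
    n_side = max(1, round(np.sqrt(n_keep)))
    rows = rng.choice(h, size=n_side, replace=False)
    cols = rng.choice(w, size=n_side, replace=False)
    grid = np.zeros((h, w), dtype=bool)
    grid[np.ix_(rows, cols)] = True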
The structure perforation mask contains the positions that are used most often by the following pooling layer. It is based on the observation that for some pooling layer settings, e.g. kernel size 3 × 3 with stride 2 (such settings are used in the Network in Network and AlexNet models), different outputs of the convolutional layer are used a different number of times. Finally, the impact perforation mask takes into account the relative contribution of the spatial positions to the loss function. The impact of a position of a convolutional layer's output for a given image is defined as a first-order Taylor approximation of the absolute change of the loss function when the true value in this position is replaced with zero.
The impacts of all output positions can be computed efficiently using the backpropagation algorithm. The impact of a spatial position is defined as the sum of impacts across channels, averaged over the objects of the training set. The impact mask contains the spatial positions with the largest impact values. For the ImageNet dataset, the impact mask has a center bias, since the classified object is usually centered. In addition, a grid similar to the one obtained in the structure mask is automatically inferred.
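Written out (with notation introduced here rather than taken from the thesis), the first-order Taylor approximation gives, for an output value V_{c,x,y} of channel c at spatial position (x, y) and loss L:

    % Zeroing one output value changes the loss by approximately its value
    % times the corresponding gradient (first-order Taylor expansion):
    \[
      \bigl|\Delta L_{c,x,y}\bigr| \approx
        \Bigl|\, V_{c,x,y}\,\frac{\partial L}{\partial V_{c,x,y}} \Bigr|,
      \qquad
      \mathrm{impact}(x, y) = \frac{1}{N}\sum_{n=1}^{N}\sum_{c}
        \Bigl|\, V^{(n)}_{c,x,y}\,\frac{\partial L^{(n)}}{\partial V^{(n)}_{c,x,y}} \Bigr|,
    \]
    % where n indexes the N training objects; the gradients for all positions
    % are obtained with a single backpropagation pass per object.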
Figure 3: Object detections (left) and the computation time map of the proposed SACT method (right) for a validation image of the COCO dataset. SACT uses more computation for object-like regions of the image.

An advantage of the perforated convolutional layer is that it can be implemented efficiently: the reduction in computation time can be close to the reduction in the number of operations. To achieve this, the evaluation of the layer is reduced to matrix multiplication. Specifically, the subtensors of the input tensor corresponding to the output values in the perforation mask are extracted and placed into the rows of the data matrix, fig. 2. Multiplying the data matrix by the kernel matrix yields the exact values of the convolutional layer in the spatial positions of the perforation mask. Interpolation of the missing values is performed implicitly, by changing the indexing of the read operations of the following layer; because of this, the method also reduces memory consumption.

The method is experimentally validated on the image classification problem.
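Returning to the implementation described above, the reduction to matrix multiplication can be sketched as follows (illustrative code added for this summary, using an im2row-style gather; names and shapes are chosen here, and the interpolation-by-indexing step is omitted):

    # Compute convolution outputs only at perforation-mask positions by building
    # a reduced data matrix and multiplying it by the kernel matrix.
    import torch
    import torch.nn.functional as F

    x = torch.randn(1, 3, 8, 8)              # input tensor U: (N, C, H, W)
    weight = torch.randn(4, 3, 3, 3)         # K kernels of size C x 3 x 3
    mask = torch.rand(8, 8) < 0.25           # perforation mask over output positions

    # im2row: every column of `patches` is one receptive field (C * kh * kw values).
    patches = F.unfold(x, kernel_size=3, padding=1)       # (1, 27, 64)
    cols = torch.nonzero(mask.reshape(-1)).squeeze(1)     # indices of masked positions
    data_matrix = patches[0, :, cols].t()                 # (n_masked, 27)

    kernel_matrix = weight.reshape(4, -1).t()             # (27, K)
    exact_outputs = data_matrix @ kernel_matrix           # (n_masked, K): exact values

    # A full (non-perforated) convolution agrees at the masked positions.
    full = F.conv2d(x, weight, padding=1).reshape(1, 4, -1)
    assert torch.allclose(exact_outputs, full[0, :, cols].t(), atol=1e-4)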