Bishop C.M. Pattern Recognition and Machine Learning (2006) (811375), page 100


The initial choices for centres µ_1 and µ_2 are shown by the red and blue crosses, respectively. (b) In the initial E step, each data point is assigned either to the red cluster or to the blue cluster, according to which cluster centre is nearer. This is equivalent to classifying the points according to which side of the perpendicular bisector of the two cluster centres, shown by the magenta line, they lie on. (c) In the subsequent M step, each cluster centre is re-computed to be the mean of the points assigned to the corresponding cluster. (d)–(i) show successive E and M steps through to final convergence of the algorithm.

9.1. K-means Clustering

Figure 9.2: Plot of the cost function J given by (9.1) after each E step (blue points) and M step (red points) of the K-means algorithm for the example shown in Figure 9.1. The algorithm has converged after the third M step, and the final EM cycle produces no changes in either the assignments or the prototype vectors.

Margin references on this page: Section 9.2.2, Section 2.3.5, Exercise 9.2.

[…] case, the assignment of each data point to the nearest cluster centre is equivalent to a classification of the data points according to which side they lie of the perpendicular bisector of the two cluster centres. A plot of the cost function J given by (9.1) for the Old Faithful example is shown in Figure 9.2.

Note that we have deliberately chosen poor initial values for the cluster centres so that the algorithm takes several steps before convergence.
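The alternating E and M steps, and the behaviour of the cost J after each of them, can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names (`squared_distance`, `cost`, `kmeans`), the `history` list, and the rule that an empty cluster keeps its previous centre are assumptions made here, not part of the text.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def cost(points, centres, assignments):
    """Cost J: sum of squared distances from each point to its assigned centre."""
    return sum(squared_distance(p, centres[a]) for p, a in zip(points, assignments))


def kmeans(points, initial_centres, max_iters=100):
    """Batch K-means from given initial centres (hypothetical helper).

    Returns the final centres, the assignments, and a history of
    (step, J) pairs recorded after every E step and every M step.
    """
    centres = [list(c) for c in initial_centres]
    assignments = None
    history = []
    for _ in range(max_iters):
        # E step: assign each point to its nearest centre.
        new_assignments = [
            min(range(len(centres)), key=lambda k: squared_distance(p, centres[k]))
            for p in points
        ]
        history.append(("E", cost(points, centres, new_assignments)))
        converged = new_assignments == assignments
        assignments = new_assignments
        # M step: move each centre to the mean of the points assigned to it.
        for k in range(len(centres)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # assumption: an empty cluster keeps its old centre
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
        history.append(("M", cost(points, centres, assignments)))
        if converged:  # the last full cycle changed nothing
            break
    return centres, assignments, history
```

Passing the initial centres explicitly makes it easy to reproduce the effect described above: a deliberately poor starting point simply produces a longer `history` before the assignments stop changing.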

In practice, a better initialization procedure would be to choose the cluster centres µ_k to be equal to a random subset of K data points. It is also worth noting that the K-means algorithm itself is often used to initialize the parameters in a Gaussian mixture model before applying the EM algorithm.

A direct implementation of the K-means algorithm as discussed here can be relatively slow, because in each E step it is necessary to compute the Euclidean distance between every prototype vector and every data point. Various schemes have been proposed for speeding up the K-means algorithm, some of which are based on precomputing a data structure such as a tree such that nearby points are in the same subtree (Ramasubramanian and Paliwal, 1990; Moore, 2000). Other approaches make use of the triangle inequality for distances, thereby avoiding unnecessary distance calculations (Hodgson, 1998; Elkan, 2003).

So far, we have considered a batch version of K-means in which the whole data set is used together to update the prototype vectors.

We can also derive an on-line stochastic algorithm (MacQueen, 1967) by applying the Robbins-Monro procedure to the problem of finding the roots of the regression function given by the derivatives of J in (9.1) with respect to µ_k. This leads to a sequential update in which, for each data point x_n in turn, we update the nearest prototype µ_k using

$$\boldsymbol{\mu}_k^{\text{new}} = \boldsymbol{\mu}_k^{\text{old}} + \eta_n \left( \mathbf{x}_n - \boldsymbol{\mu}_k^{\text{old}} \right) \qquad (9.5)$$

where η_n is the learning rate parameter, which is typically made to decrease monotonically as more data points are considered.

The K-means algorithm is based on the use of squared Euclidean distance as the measure of dissimilarity between a data point and a prototype vector. Not only does this limit the type of data variables that can be considered (it would be inappropriate for cases where some or all of the variables represent categorical labels, for instance), but it can also make the determination of the cluster means nonrobust to outliers (margin reference: Section 2.3.7). We can generalize the K-means algorithm by introducing a more general dissimilarity measure V(x, x′) between two vectors x and x′ and then minimizing the following distortion measure

$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \mathcal{V}(\mathbf{x}_n, \boldsymbol{\mu}_k) \qquad (9.6)$$

which gives the K-medoids algorithm. The E step again involves, for given cluster prototypes µ_k, assigning each data point to the cluster for which the dissimilarity to the corresponding prototype is smallest.

The computational cost of this is O(KN), as is the case for the standard K-means algorithm. For a general choice of dissimilarity measure, the M step is potentially more complex than for K-means, and so it is common to restrict each cluster prototype to be equal to one of the data vectors assigned to that cluster, as this allows the algorithm to be implemented for any choice of dissimilarity measure V(·, ·) so long as it can be readily evaluated. Thus the M step involves, for each cluster k, a discrete search over the N_k points assigned to that cluster, which requires O(N_k^2) evaluations of V(·, ·).

One notable feature of the K-means algorithm is that at each iteration, every data point is assigned uniquely to one, and only one, of the clusters.
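The K-medoids procedure with prototypes restricted to data vectors can be sketched as follows. This is a minimal illustration under stated assumptions: the names `assign` and `kmedoids`, the caller-supplied `dissimilarity` function, and the rule that an empty cluster keeps its previous medoid are all choices made here for the example, not details from the text.

```python
def assign(points, medoids, dissimilarity):
    """E step: index of the least-dissimilar medoid for each point, O(KN) evaluations."""
    return [
        min(range(len(medoids)), key=lambda k: dissimilarity(p, medoids[k]))
        for p in points
    ]


def kmedoids(points, initial_medoids, dissimilarity, max_iters=100):
    """K-medoids with each prototype restricted to a data point (hypothetical helper)."""
    medoids = list(initial_medoids)
    for _ in range(max_iters):
        labels = assign(points, medoids, dissimilarity)
        new_medoids = []
        for k in range(len(medoids)):
            members = [p for p, a in zip(points, labels) if a == k]
            if not members:
                # Assumption: an empty cluster keeps its previous medoid.
                new_medoids.append(medoids[k])
                continue
            # M step: discrete search over the N_k members, O(N_k^2) evaluations.
            best = min(members, key=lambda c: sum(dissimilarity(p, c) for p in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids, dissimilarity)
```

For instance, with one-dimensional points `[1, 2, 3, 20, 21, 22, 23, 100]`, initial medoids `[1, 23]` and absolute difference as the dissimilarity, the outlier 100 joins the second cluster but does not drag its prototype far from the bulk of the points, whereas the mean of that cluster would be 37.2.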

Whereas some data points will be much closer to a particular centre µ_k than to any other centre, there may be other data points that lie roughly midway between cluster centres. In the latter case, it is not clear that the hard assignment to the nearest cluster is the most appropriate. We shall see in the next section that by adopting a probabilistic approach, we obtain 'soft' assignments of data points to clusters in a way that reflects the level of uncertainty over the most appropriate assignment. This probabilistic formulation brings with it numerous benefits.

9.1.1 Image segmentation and compression

As an illustration of the application of the K-means algorithm, we consider the related problems of image segmentation and image compression. The goal of segmentation is to partition an image into regions each of which has a reasonably homogeneous visual appearance or which corresponds to objects or parts of objects (Forsyth and Ponce, 2003). Each pixel in an image is a point in a 3-dimensional space comprising the intensities of the red, blue, and green channels, and our segmentation algorithm simply treats each pixel in the image as a separate data point.

Note that strictly this space is not Euclidean because the channel intensities are bounded by the interval [0, 1]. Nevertheless, we can apply the K-means algorithm without difficulty. We illustrate the result of running K-means to convergence, for any particular value of K, by re-drawing the image replacing each pixel vector with the {R, G, B} intensity triplet given by the centre µ_k to which that pixel has been assigned. Results for various values of K are shown in Figure 9.3. We see that for a given value of K, the algorithm is representing the image using a palette of only K colours. It should be emphasized that this use of K-means is not a particularly sophisticated approach to image segmentation, not least because it takes no account of the spatial proximity of different pixels. The image segmentation problem is in general extremely difficult and remains the subject of active research and is introduced here simply to illustrate the behaviour of the K-means algorithm.

Figure 9.3: Two examples of the application of the K-means clustering algorithm to image segmentation showing the initial images together with their K-means segmentations obtained using various values of K. This also illustrates the use of vector quantization for data compression, in which smaller values of K give higher compression at the expense of poorer image quality. (Panels: K = 2, K = 3, K = 10, Original image.)

We can also use the result of a clustering algorithm to perform data compression. It is important to distinguish between lossless data compression, in which the goal is to be able to reconstruct the original data exactly from the compressed representation, and lossy data compression, in which we accept some errors in the reconstruction in return for higher levels of compression than can be achieved in the lossless case.

We can apply the K-means algorithm to the problem of lossy data compression as follows. For each of the N data points, we store only the identity k of the cluster to which it is assigned. We also store the values of the K cluster centres µ_k, which typically requires significantly less data, provided we choose K ≪ N. Each data point is then approximated by its nearest centre µ_k. New data points can similarly be compressed by first finding the nearest µ_k and then storing the label k instead of the original data vector.

This framework is often called vector quantization, and the vectors µ_k are called code-book vectors.

The image segmentation problem discussed above also provides an illustration of the use of clustering for data compression. Suppose the original image has N pixels comprising {R, G, B} values each of which is stored with 8 bits of precision. Then to transmit the whole image directly would cost 24N bits. Now suppose we first run K-means on the image data, and then instead of transmitting the original pixel intensity vectors we transmit the identity of the nearest vector µ_k. Because there are K such vectors, this requires log_2 K bits per pixel. We must also transmit the K code book vectors µ_k, which requires 24K bits, and so the total number of bits required to transmit the image is 24K + N log_2 K (rounding up to the nearest integer).
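The bit count 24K + N log_2 K is easy to check numerically. The sketch below assumes that the rounding is applied to log_2 K, so that every pixel index uses a whole number of bits; the function name `transmission_bits` and its parameters are hypothetical names chosen for this example.

```python
def transmission_bits(n_pixels, k, bits_per_channel=8, channels=3):
    """Bits needed to send K code-book vectors plus one cluster index per pixel."""
    codebook_bits = bits_per_channel * channels * k  # 24K for 8-bit R, G, B
    # (k - 1).bit_length() equals ceil(log2(k)) for k >= 1, using integers only.
    index_bits_per_pixel = (k - 1).bit_length()
    return codebook_bits + n_pixels * index_bits_per_pixel


def compression_ratio(n_pixels, k, bits_per_channel=8, channels=3):
    """Compressed size as a fraction of the directly transmitted size."""
    raw_bits = bits_per_channel * channels * n_pixels
    return transmission_bits(n_pixels, k, bits_per_channel, channels) / raw_bits
```

Under this reading of the rounding, a 43,200-pixel image needs 43,248 bits for K = 2, 86,472 bits for K = 3, and 173,040 bits for K = 10.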

The original image shown in Figure 9.3 has 240 × 180 = 43,200 pixels and so requires 24 × 43,200 = 1,036,800 bits to transmit directly. By comparison, the compressed images require 43,248 bits (K = 2), 86,472 bits (K = 3), and 173,040 bits (K = 10), respectively, to transmit. These represent compression ratios compared to the original image of 4.2%, 8.3%, and 16.7%, respectively. We see that there is a trade-off between degree of compression and image quality. Note that our aim in this example is to illustrate the K-means algorithm.

If we had been aiming to produce a good image compressor, then it would be more fruitful to consider small blocks of adjacent pixels, for instance 5 × 5, and thereby exploit the correlations that exist in natural images between nearby pixels.

9.2. Mixtures of Gaussians

In Section 2.3.9 we motivated the Gaussian mixture model as a simple linear superposition of Gaussian components, aimed at providing a richer class of density models than the single Gaussian. We now turn to a formulation of Gaussian mixtures in terms of discrete latent variables.

This will provide us with a deeper insight into this important distribution, and will also serve to motivate the expectation-maximization algorithm.

Recall from (2.188) that the Gaussian mixture distribution can be written as a linear superposition of Gaussians in the form

$$p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k). \qquad (9.7)$$

Let us introduce a K-dimensional binary random variable z having a 1-of-K representation in which a particular element z_k is equal to 1 and all other elements are equal to 0.
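As a concrete reading of the superposition (9.7), the sketch below evaluates a mixture density in the one-dimensional case, where each covariance Σ_k reduces to a scalar variance. The restriction to one dimension and the names `gaussian_pdf` and `mixture_pdf` are assumptions made for this illustration.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of a one-dimensional Gaussian N(x | mean, variance)."""
    return math.exp(-0.5 * (x - mean) ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)


def mixture_pdf(x, weights, means, variances):
    """One-dimensional form of (9.7): sum over k of pi_k * N(x | mu_k, sigma_k^2)."""
    return sum(
        w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)
    )
```

When the mixing coefficients are non-negative and sum to one, the result is itself a normalized density, which can be checked by summing `mixture_pdf` over a fine grid.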
