Building machine learning systems with Python (779436), page 38


Since each of these components is an 8-bit value, the total is about 17 million different colors. We are going to reduce this number to only 64 colors by grouping colors into bins. We will write a function to encapsulate this algorithm as follows:

def chist(im):

To bin the colors, we first divide the image by 64, rounding down the pixel values as follows:

    im = im // 64

This makes the pixel values range from 0 to 3, which gives a total of 64 different colors. Separate the red, green, and blue channels as follows:

    r,g,b = im.transpose((2,0,1))
    # combine the three 2-bit channel values into one bin index (0 to 63)
    pixels = 1 * r + 4 * b + 16 * g
    # count how many pixels fall into each of the 64 bins
    hist = np.bincount(pixels.ravel(), minlength=64)
    hist = hist.astype(float)

Convert to log scale, as seen in the following code snippet.

This is not strictly necessary, but makes for better features. We use np.log1p, which computes log(h+1). This ensures that zero values are kept as zero values (mathematically, the logarithm of zero is not defined, and NumPy prints a warning if you attempt to compute it).

    hist = np.log1p(hist)
    return hist

We can adapt the previous processing code to use the function we wrote very easily:

>>> features = []
>>> for im in images:
...     image = mh.imread(im)
...     features.append(chist(image))

Using the same cross-validation code we used earlier, we obtain 90 percent accuracy. The best results, however, come from combining all the features, which we can implement as follows:

>>> features = []
>>> for im in images:
...     imcolor = mh.imread(im)
...     im = mh.colors.rgb2gray(imcolor, dtype=np.uint8)
...     features.append(np.concatenate([
...         mh.features.haralick(im).ravel(),
...         chist(imcolor),
...     ]))

By using all of these features, we get 95.6 percent accuracy, as shown in the following code snippet:

>>> scores = cross_validation.cross_val_score(
...     clf, features, labels, cv=cv)
>>> print('Accuracy: {:.1%}'.format(scores.mean()))
Accuracy: 95.6%

This is a perfect illustration of the principle that good algorithms are the easy part.
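As a quick sanity check of the color histogram described above, the following minimal sketch assembles the function in one piece and runs it on a tiny synthetic image. The test image is a hypothetical stand-in (it is not part of the book's dataset), and only NumPy is assumed.

```python
import numpy as np

def chist(im):
    """Return a 64-bin, log-scaled color histogram of an RGB uint8 image."""
    im = im // 64                      # each channel now takes values 0..3
    r, g, b = im.transpose((2, 0, 1))  # split into three 2-D channel arrays
    pixels = 1 * r + 4 * b + 16 * g    # one bin index in 0..63 per pixel
    hist = np.bincount(pixels.ravel(), minlength=64).astype(float)
    return np.log1p(hist)              # log(h + 1) keeps empty bins at zero

# Hypothetical test image: left half pure red, right half pure green
im = np.zeros((4, 6, 3), dtype=np.uint8)
im[:, :3, 0] = 255
im[:, 3:, 1] = 255

features = chist(im)
# Undo the log scaling to recover the raw per-bin pixel counts
counts = np.rint(np.expm1(features)).astype(int)
print(features.shape, np.flatnonzero(counts))
```

Only two bins are non-empty here, one per color, and every other bin stays at exactly zero after the log transform, which is the property that np.log1p is chosen for.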

You can always use an implementation of state-of-the-art classification from scikit-learn. The real secret and added value often comes in feature design and engineering. This is where knowledge of your dataset is valuable.

Using features to find similar images

The basic concept of representing an image by a relatively small number of features can be used for more than just classification.

For example, we can also use it to find images similar to a given query image (as we did before with text documents). We will compute the same features as before, with one important difference: we will ignore the bordering area of the picture. The reason is that, due to the amateur nature of the compositions, the edges of the picture often contain irrelevant elements. When the features are computed over the whole image, these elements are taken into account. By simply ignoring them, we get slightly better features.

In the supervised example, it is not as important, as the learning algorithm will then learn which features are more informative and weigh them accordingly. When working in an unsupervised fashion, we need to be more careful to ensure that our features are capturing important elements of the data. This is implemented in the loop as follows:

>>> features = []
>>> for im in images:
...     imcolor = mh.imread(im)
...     # ignore everything in the 200 pixels closest to the borders
...     imcolor = imcolor[200:-200, 200:-200]
...     im = mh.colors.rgb2gray(imcolor, dtype=np.uint8)
...     features.append(np.concatenate([
...         mh.features.haralick(im).ravel(),
...         chist(imcolor),
...     ]))

We now normalize the features and compute the distance matrix as follows:

>>> sc = StandardScaler()
>>> features = sc.fit_transform(features)
>>> from scipy.spatial import distance
>>> dists = distance.squareform(distance.pdist(features))

We will plot just a subset of the data (every 10th element) so that the query will be on top and the returned "nearest neighbor" at the bottom, as shown in the following:

>>> fig, axes = plt.subplots(2, 9)
>>> for ci,i in enumerate(range(0,90,10)):
...     left = images[i]
...     dists_left = dists[i]
...     right = dists_left.argsort()
...     # right[0] is the query image itself (index i), so pick the next closest
...     right = right[1]
...     right = images[right]
...     left = mh.imread(left)
...     right = mh.imread(right)
...     axes[0, ci].imshow(left)
...     axes[1, ci].imshow(right)

The result is shown in the following screenshot:

It is clear that the system is not perfect, but can find images that are at least visually similar to the queries.
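The normalize-then-search step can also be sketched with NumPy alone, which may help when checking the logic on small inputs. This is an illustrative stand-in for the StandardScaler and scipy.spatial.distance calls, not a replacement for them; the function name nearest_neighbors and the toy feature vectors are hypothetical.

```python
import numpy as np

def nearest_neighbors(features):
    """For each row, return the index of the closest other row."""
    features = np.asarray(features, dtype=float)
    # Standardize each column to zero mean and unit variance
    std = features.std(axis=0)
    std[std == 0] = 1.0          # avoid dividing by zero for constant columns
    z = (features - features.mean(axis=0)) / std
    # Pairwise Euclidean distances via broadcasting: shape (n, n)
    diffs = z[:, None, :] - z[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Exclude each row itself by making its self-distance infinite
    np.fill_diagonal(dists, np.inf)
    return dists.argmin(axis=1)

# Hypothetical feature vectors: two tight clusters of two points each
toy = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.2]]
print(nearest_neighbors(toy))
```

Masking the diagonal is a design choice made for this sketch: it avoids relying on the query sorting first when two images have identical features, a case where taking the second element of argsort could return the query itself.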

In all but one case, the image found comes from the same class as the query.

Classifying a harder dataset

The previous dataset was an easy dataset for classification using texture features. In fact, many of the problems that are interesting from a business point of view are relatively easy. However, sometimes we may be faced with a tougher problem and need better and more modern techniques to get good results.

We will now test a public dataset, which has the same structure: several photographs split into a small number of classes.

The classes are animals, cars, transportation, and natural scenes.

When compared to the three-class problem we discussed previously, these classes are harder to tell apart. Natural scenes, buildings, and texts have very different textures. In this dataset, however, texture and color are not as clear a marker of the image class.

The following is one example from the animal class:

And here is another example from the car class:

Both objects are against natural backgrounds, and with large smooth areas inside the objects.

This is a harder problem than the simple dataset, so we will need to use more advanced methods. The first improvement will be to use a slightly more powerful classifier. The logistic regression that scikit-learn provides is a penalized form of logistic regression, which contains an adjustable parameter, C. By default, C = 1.0, but this may not be optimal. We can use grid search to find a good value for this parameter as follows:

>>> # in scikit-learn 0.18 and later, GridSearchCV lives in sklearn.model_selection
>>> from sklearn.grid_search import GridSearchCV
>>> C_range = 10.0 ** np.arange(-4, 3)
>>> grid = GridSearchCV(LogisticRegression(), param_grid={'C' : C_range})
>>> clf = Pipeline([('preproc', StandardScaler()),
...                 ('classifier', grid)])

The data is not organized in a random order inside the dataset: similar images are close together.

Thus, we use a cross-validation scheme that shuffles the data so that each fold has a more representative training set, as shown in the following:

>>> # in scikit-learn 0.18 and later: model_selection.KFold(n_splits=5, shuffle=True, random_state=123)
>>> cv = cross_validation.KFold(len(features), 5,
...                             shuffle=True, random_state=123)
>>> scores = cross_validation.cross_val_score(
...     clf, features, labels, cv=cv)
>>> print('Accuracy: {:.1%}'.format(scores.mean()))
Accuracy: 72.1%

This is not so bad for four classes, but we will now see if we can do better by using a different set of features.
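To see why shuffling matters when the data is stored class by class, here is a minimal NumPy sketch of a shuffled K-fold split. It is not scikit-learn's implementation; the helper shuffled_kfold and the toy label array are hypothetical and only illustrate the idea.

```python
import numpy as np

def shuffled_kfold(n_samples, n_folds, seed=0):
    """Yield (train_idx, test_idx) pairs over a shuffled index range."""
    rng = np.random.RandomState(seed)
    indices = rng.permutation(n_samples)
    for test_idx in np.array_split(indices, n_folds):
        # Everything that is not in the test fold is used for training
        train_idx = np.setdiff1d(indices, test_idx)
        yield train_idx, test_idx

# Hypothetical labels stored class by class, as in an ordered dataset
labels = np.repeat([0, 1, 2, 3], 25)

folds = list(shuffled_kfold(len(labels), 5, seed=123))
for train_idx, test_idx in folds:
    # Report fold sizes and which classes appear in the test fold
    print(len(train_idx), len(test_idx), np.unique(labels[test_idx]))
```

Without the permutation, consecutive blocks of 20 samples would make each test fold consist almost entirely of a single class, so the per-fold scores would say little about performance on the full mix of classes.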

In fact, we will see that we need to combine these features with other methods to get the best possible results.

Local feature representations

A relatively recent development in the computer vision world has been the development of local-feature based methods. Local features are computed on a small region of the image, unlike the previous features we considered, which had been computed on the whole image. Mahotas supports computing a type of these features, Speeded Up Robust Features (SURF). There are several others, the most well-known being the original proposal of SIFT.

These features are designed to be robust against rotational or illumination changes (that is, they only change their value slightly when illumination changes).

When using these features, we have to decide where to compute them. There are three possibilities that are commonly used:

• Randomly
• In a grid
• Detecting interesting areas of the image (a technique known as keypoint detection or interest point detection)

All of these are valid and will, under the right circumstances, give good results. Mahotas supports all three. Using interest point detection works best if you have a reason to expect that your interest point will correspond to areas of importance in the image.

We will be using the interest point method.

Computing the features with mahotas is easy: import the right submodule and call the surf.surf function as follows:

>>> from mahotas.features import surf
>>> image = mh.demos.load('lena')
>>> image = mh.colors.rgb2gray(image, dtype=np.uint8)
>>> descriptors = surf.surf(image, descriptor_only=True)

The descriptor_only=True flag means that we are only interested in the descriptors themselves, and not in their pixel location, size, or orientation. Alternatively, we could have used the dense sampling method, using the surf.dense function as follows:

>>> from mahotas.features import surf
>>> descriptors = surf.dense(image, spacing=16)

This returns the value of the descriptors computed on points that are at a distance of 16 pixels from each other.

Since the position of the points is fixed, the metainformation on the interest points is not very interesting and is not returned by default. In either case, the result (descriptors) is an n-times-64 array, where n is the number of points sampled. The number of points depends on the size of your images, their content, and the parameters you pass to the functions. In this example, we are using the default settings, and we obtain a few hundred descriptors per image.

We cannot directly feed these descriptors to a support vector machine, logistic regressor, or similar classification system. In order to use the descriptors from the images, there are several solutions.

We could just average them, but the results of doing so are not very good, as averaging throws away all location-specific information. In that case, we would have just another global feature set based on edge measurements.

The solution we will use here is the bag of words model, which is a very recent idea. It was published in this form first in 2004. This is one of those obvious-in-hindsight ideas: it is very simple to implement and achieves very good results.

It may seem strange to speak of words when dealing with images. It may be easier to understand if you think that you have not written words, which are easy to distinguish from each other, but orally spoken audio. Now, each time a word is spoken, it will sound slightly different, and different speakers will have their own pronunciation.
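For comparison, the averaging option mentioned above can be sketched in a few lines. The descriptor arrays below are random stand-ins with the n-times-64 shape described in the text (they are not real SURF output), and the helper average_descriptors is a hypothetical name.

```python
import numpy as np

def average_descriptors(descriptors):
    """Collapse an (n, 64) descriptor array into one 64-element vector."""
    descriptors = np.asarray(descriptors, dtype=float)
    return descriptors.mean(axis=0)

# Hypothetical descriptors for two images with different numbers of points
rng = np.random.RandomState(0)
image_a = rng.rand(300, 64)
image_b = rng.rand(450, 64)

# Every image ends up with the same feature length, whatever n was
features = np.array([average_descriptors(d) for d in (image_a, image_b)])
print(features.shape)

# The mean ignores which point each descriptor came from
same = np.allclose(average_descriptors(image_a[::-1]),
                   average_descriptors(image_a))
print(same)
```

The sketch shows both sides of the trade-off: averaging gives a fixed-length vector that a classifier can consume, but reordering the points leaves the result unchanged, so nothing about individual regions survives.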

