Take a look at the following plot: Canadian examples are shown as diamonds, Kama seeds as circles, and Rosa seeds as triangles. Their respective areas are shown as white, black, and grey. You might be wondering why the regions are so horizontal, almost weirdly so. The problem is that the x axis (area) ranges from 10 to 22, while the y axis (compactness) ranges from 0.75 to 1.0. This means that a small change in x is actually much larger than a small change in y. So, when we compute the distance between points, we are, for the most part, only taking the x axis into account. This is also a good example of why it is a good idea to visualize our data and look for red flags or surprises.

If you studied physics (and you remember your lessons), you might have already noticed that we have been summing up lengths, areas, and dimensionless quantities, mixing up our units (which is something you never want to do in a physical system). We need to normalize all of the features to a common scale. There are many solutions to this problem; a simple one is to normalize to z-scores.
The z-score of a value is how far away from the mean it is, in units of standard deviation. It comes down to this operation:

f′ = (f − µ) / σ

In this formula, f is the old feature value, f′ is the normalized feature value, µ is the mean of the feature, and σ is the standard deviation. Both µ and σ are estimated from training data.
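Spelled out in code, this is just a couple of array operations. The following is a minimal sketch with NumPy (not code from the book); the names features and test_features are placeholders for an array of training examples (one row per example) and an array of new examples:

>>> import numpy as np
>>> mu = features.mean(axis=0)       # per-feature mean, estimated on training data
>>> sigma = features.std(axis=0)     # per-feature standard deviation
>>> features_z = (features - mu) / sigma
>>> # New data must be scaled with the training mu and sigma, not its own:
>>> test_z = (test_features - mu) / sigma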
Independent of what the original values were, after z-scoring, a value of zero corresponds to the training mean, positive values are above the mean, and negative values are below it.

The scikit-learn module makes it very easy to use this normalization as a preprocessing step. We are going to use a pipeline of transformations: the first element will do the transformation and the second element will do the classification. We start by importing both the pipeline and the feature scaling classes as follows:

>>> from sklearn.pipeline import Pipeline
>>> from sklearn.preprocessing import StandardScaler

Now, we can combine them:

>>> classifier = KNeighborsClassifier(n_neighbors=1)
>>> classifier = Pipeline([('norm', StandardScaler()),
...                        ('knn', classifier)])

The Pipeline constructor takes a list of pairs (str, clf). Each pair corresponds to a step in the pipeline: the first element is a string naming the step, while the second element is the object that performs it. Advanced usage of the object uses these names to refer to different steps.
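As a small illustration of that (not code from the book), the step names let you reach into the pipeline and address the parameters of individual steps with scikit-learn's step__parameter convention:

>>> classifier.named_steps['knn']                # the KNeighborsClassifier inside the pipeline
>>> classifier.get_params()['knn__n_neighbors']  # step parameters are addressed as <step>__<parameter>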
After normalization, every feature is in the same units (technically, every feature is now dimensionless; it has no units) and we can more confidently mix dimensions. In fact, if we now run our nearest neighbor classifier, we obtain 93 percent accuracy, estimated with the same five-fold cross-validation code shown previously!

Look at the decision space again in two dimensions: the boundaries are now different and you can see that both dimensions make a difference for the outcome. In the full dataset, everything is happening in a seven-dimensional space, which is very hard to visualize, but the same principle applies: while a few dimensions are dominant in the original data, after normalization, they are all given the same importance.
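For reference, an estimate like the 93 percent figure above can also be obtained with scikit-learn's cross_val_score helper instead of hand-written folds. This is only a sketch, not the book's own code: it assumes a recent scikit-learn (where the helper lives in sklearn.model_selection) and that features and labels hold the seeds data, and the exact number may differ slightly:

>>> from sklearn.model_selection import cross_val_score
>>> scores = cross_val_score(classifier, features, labels, cv=5)  # five-fold cross-validation
>>> print('Mean accuracy: {:.1%}'.format(scores.mean()))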
Binary and multiclass classification

The first classifier we used, the threshold classifier, was a simple binary classifier. Its result is either one class or the other, as a point is either above the threshold value or it is not. The second classifier we used, the nearest neighbor classifier, was a natural multiclass classifier: its output can be one of several classes.

It is often simpler to define a binary method than one that works on multiclass problems. However, we can reduce any multiclass problem to a series of binary decisions. This is what we did earlier in the Iris dataset, in a haphazard way: we observed that it was easy to separate one of the initial classes and focused on the other two, reducing the problem to two binary decisions:

1. Is it an Iris Setosa (yes or no)?
2. If not, check whether it is an Iris Virginica (yes or no).

Of course, we want to leave this sort of reasoning to the computer. As usual, there are several solutions to this multiclass reduction.

The simplest is to use a series of one versus the rest classifiers. For each possible label ℓ, we build a classifier of the type "is this ℓ or something else?" When applying the rule, exactly one of the classifiers will say yes and we will have our solution. Unfortunately, this does not always happen, so we have to decide how to deal with either multiple positive answers or no positive answers.

Alternatively, we can build a classification tree.
Split the possible labels into two, and build a classifier that asks, "Should this example go in the left or the right bin?" We can perform this splitting recursively until we obtain a single label. The preceding diagram depicts the tree of reasoning for the Iris dataset. Each diamond is a single binary classifier. It is easy to imagine that we could make this tree larger and encompass more decisions. This means that any classifier that can be used for binary classification can also be adapted to handle any number of classes in a simple way.

There are many other possible ways of turning a binary method into a multiclass one. There is no single method that is clearly better in all cases. The scikit-learn module implements several of these methods in the sklearn.multiclass submodule.
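As a quick illustration (a sketch, not the book's code), the one versus the rest scheme can be applied by wrapping any estimator, such as the normalized nearest neighbor pipeline from before, in OneVsRestClassifier; features and labels are again assumed to hold the dataset:

>>> from sklearn.multiclass import OneVsRestClassifier
>>> ovr = OneVsRestClassifier(classifier)   # builds one "this label or not?" classifier per class
>>> ovr.fit(features, labels)
>>> print(ovr.predict(features[:3]))        # predictions for the first three examples

For a method that is already multiclass, such as nearest neighbors, the wrapper is redundant; it is useful when the underlying method is intrinsically binary.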
Some classifiers are binary systems, while many real-life problems are naturally multiclass. Several simple protocols reduce a multiclass problem to a series of binary decisions and allow us to apply the binary models to our multiclass problem. This means methods that are apparently only for binary data can be applied to multiclass data with little extra effort.

Summary

Classification means generalizing from examples to build a model (that is, a rule that can automatically be applied to new, unclassified objects). It is one of the fundamental tools in machine learning and we will see many more examples of this in the forthcoming chapters.

In a sense, this was a very theoretical chapter, as we introduced generic concepts with simple examples. We went over a few operations with the Iris dataset.
This is a small dataset. However, it has the advantage that we were able to plot it out and see what we were doing in detail. This is something that will be lost when we move on to problems with many dimensions and many thousands of examples. The intuitions we gained here will all still be valid.

You also learned that the training error is a misleading, over-optimistic estimate of how well the model does. We must, instead, evaluate it on testing data that has not been used for training. In order to not waste too many examples in testing, a cross-validation schedule can get us the best of both worlds (at the cost of more computation).

We also had a look at the problem of feature engineering.
Features are not predefined for you, but choosing and designing features is an integral part of designing a machine learning pipeline. In fact, it is often the area where you can get the most improvements in accuracy, as better data beats fancier methods. The chapters on text-based classification, music genre recognition, and computer vision will provide examples for these specific settings.

The next chapter looks at how to proceed when your data does not have predefined classes for classification.

Clustering – Finding Related Posts

In the previous chapter, you learned how to find the classes or categories of individual datapoints. With a handful of training data items that were paired with their respective classes, you learned a model, which we can now use to classify future data items.
We called this supervised learning because the learning was guided by a teacher; in our case, the teacher had the form of correct classifications.

Let's now imagine that we do not possess those labels by which we can learn the classification model. This could be, for example, because they were too expensive to collect. Just imagine the cost if the only way to obtain millions of labels would be to ask humans to classify them manually. What could we do in that case? Well, of course, we would not be able to learn a classification model. Still, we could find some pattern within the data itself.