Whatever numerically heavy algorithm you take from current books on numerical recipes, you will most likely find it supported in SciPy in one way or another. Whether it is matrix manipulation, linear algebra, optimization, clustering, spatial operations, or even fast Fourier transformation, the toolbox is readily filled. Therefore, it is a good habit to always inspect the scipy module before you start implementing a numerical algorithm.

For convenience, the complete namespace of NumPy is also accessible via SciPy.
So, from now on, we will use NumPy's machinery via the SciPy namespace. You can check this easily by comparing the function references of any base function, such as:

>>> import scipy, numpy
>>> scipy.version.full_version
'0.14.0'
>>> scipy.dot is numpy.dot
True

The diverse algorithms are grouped into the following toolboxes:

SciPy packages   Functionalities
cluster          Hierarchical clustering (cluster.hierarchy); vector quantization / k-means (cluster.vq)
constants        Physical and mathematical constants; conversion methods
fftpack          Discrete Fourier transform algorithms
integrate        Integration routines
interpolate      Interpolation (linear, cubic, and so on)
io               Data input and output
linalg           Linear algebra routines using the optimized BLAS and LAPACK libraries
ndimage          n-dimensional image package
odr              Orthogonal distance regression
optimize         Optimization (finding minima and roots)
signal           Signal processing
sparse           Sparse matrices
spatial          Spatial data structures and algorithms
special          Special mathematical functions such as Bessel or Jacobian
stats            Statistics toolkit

The toolboxes most interesting to our endeavor are scipy.stats, scipy.interpolate, scipy.cluster, and scipy.signal.
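As a quick taste of the stats package, which we will rely on most, here is a minimal sketch; the sample values are made up purely for illustration, while scoreatpercentile() and the norm distribution object are regular parts of scipy.stats:

>>> from scipy import stats
>>> sample = numpy.array([1.2, 2.3, 1.9, 3.1, 2.8])
>>> stats.scoreatpercentile(sample, 50)   # the median of our made-up sample
2.3
>>> stats.norm.cdf(0.0)                   # P(X <= 0) for a standard normal
0.5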
For the sake of brevity, we only dip into the stats package here and leave the other toolboxes to be explained when they show up in the individual chapters.

Our first (tiny) application of machine learning

Let's get our hands dirty and take a look at our hypothetical web start-up, MLaaS, which sells the service of providing machine learning algorithms via HTTP. With the increasing success of our company, the demand for better infrastructure grows so that we can serve all incoming web requests successfully. We don't want to allocate too many resources, as that would be too costly.
On the other hand, we will lose money if we have not reserved enough resources to serve all incoming requests. Now, the question is: when will we hit the limit of our current infrastructure, which we estimated to be at 100,000 requests per hour? We would like to know in advance when we have to request additional servers in the cloud to serve all the incoming requests successfully, without paying for unused ones.

Reading in the data

We have collected the web stats for the last month and aggregated them in ch01/data/web_traffic.tsv (.tsv because it contains tab-separated values). They are stored as the number of hits per hour.
Each line contains the hour (numbered consecutively) and the number of web hits in that hour. The first few lines look like the following (the original shows a screenshot of the raw file; the values here are reconstructed from the parsed data below, with the two columns separated by a tab):

1	2272
2	nan
3	1386
4	1365
5	1488

Using SciPy's genfromtxt(), we can easily read in the data using the following code:

>>> import scipy as sp
>>> data = sp.genfromtxt("web_traffic.tsv", delimiter="\t")

We have to specify tab as the delimiter so that the columns are correctly determined.

A quick check shows that we have correctly read in the data:

>>> print(data[:10])
[[  1.00000000e+00   2.27200000e+03]
 [  2.00000000e+00              nan]
 [  3.00000000e+00   1.38600000e+03]
 [  4.00000000e+00   1.36500000e+03]
 [  5.00000000e+00   1.48800000e+03]
 [  6.00000000e+00   1.33700000e+03]
 [  7.00000000e+00   1.88300000e+03]
 [  8.00000000e+00   2.28300000e+03]
 [  9.00000000e+00   1.33500000e+03]
 [  1.00000000e+01   1.02500000e+03]]
>>> print(data.shape)
(743, 2)

As you can see, we have 743 data points with two dimensions.
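Two details of genfromtxt() are worth a quick check here: it parses numeric columns as float64 and turns fields it cannot parse into nan, which is why the missing measurement above shows up as nan. It also accepts unpack=True to hand back the columns separately, an alternative to the index-based splitting we use in the next section (an illustrative session; x_alt and y_alt are our own names):

>>> print(data.dtype)
float64
>>> x_alt, y_alt = sp.genfromtxt("web_traffic.tsv", delimiter="\t", unpack=True)
>>> x_alt.shape
(743,)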
Preprocessing and cleaning the data

It is more convenient for SciPy to separate the dimensions into two vectors, each of size 743. The first vector, x, will contain the hours, and the other, y, will contain the web hits in that particular hour. This splitting is done using the special index notation of SciPy, by which we can choose the columns individually:

x = data[:,0]
y = data[:,1]

There are many more ways in which data can be selected from a SciPy array. Check out http://www.scipy.org/Tentative_NumPy_Tutorial for more details on indexing, slicing, and iterating.

One caveat remains: some values in y are invalid, namely nan. The question is what we can do with them. Let's check how many hours contain invalid data by running the following code:

>>> sp.sum(sp.isnan(y))
8

As you can see, we are missing only 8 out of 743 entries, so we can afford to remove them. Remember that we can index a SciPy array with another array. sp.isnan(y) returns an array of Booleans indicating which entries are not valid numbers. Using ~, we logically negate that array so that we choose only those elements from x and y where y contains valid numbers:

>>> x = x[~sp.isnan(y)]
>>> y = y[~sp.isnan(y)]
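To make this boolean-mask indexing concrete, here is a tiny standalone example with toy values, not the web stats (we use NumPy directly, which behaves the same as the sp aliases above, since SciPy re-exports these functions in the version used here):

>>> import numpy as np
>>> a = np.array([10., np.nan, 30., np.nan, 50.])
>>> mask = np.isnan(a)
>>> print(mask)
[False  True False  True False]
>>> print(a[~mask])
[ 10.  30.  50.]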
To get a first impression of our data, let's plot the data in a scatter plot using matplotlib. matplotlib contains the pyplot package, which tries to mimic MATLAB's interface, a very convenient and easy-to-use one, as you can see in the following code:

>>> import matplotlib.pyplot as plt
>>> # plot the (x,y) points with dots of size 10
>>> plt.scatter(x, y, s=10)
>>> plt.title("Web traffic over the last month")
>>> plt.xlabel("Time")
>>> plt.ylabel("Hits/hour")
>>> plt.xticks([w*7*24 for w in range(10)],
...     ['week %i' % w for w in range(10)])
>>> plt.autoscale(tight=True)
>>> # draw a slightly opaque grid
>>> plt.grid(True, linestyle='-', color='0.75')
>>> plt.show()

You can find more tutorials on plotting at http://matplotlib.org/users/pyplot_tutorial.html.

In the resulting chart, we can see that while in the first weeks the traffic stayed more or less the same, the last week shows a steep increase.
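Since we will want to redraw this chart as our analysis evolves, it can help to wrap the steps above in a small reusable function. This refactoring, including the name plot_web_traffic and its optional fname parameter, is our own sketch and not from the original code:

def plot_web_traffic(x, y, fname=None):
    # Reusable version of the plotting steps above
    plt.figure(figsize=(8, 6))
    plt.scatter(x, y, s=10)
    plt.title("Web traffic over the last month")
    plt.xlabel("Time")
    plt.ylabel("Hits/hour")
    plt.xticks([w * 7 * 24 for w in range(10)],
               ['week %i' % w for w in range(10)])
    plt.autoscale(tight=True)
    plt.grid(True, linestyle='-', color='0.75')
    if fname:
        plt.savefig(fname)  # write the chart to disk instead of showing it
    else:
        plt.show()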
Choosing the right model and learning algorithm

Now that we have a first impression of the data, we return to the initial question: how long will our server handle the incoming web traffic? To answer this, we have to do the following:

1. Find the real model behind the noisy data points.
2. Following this, use the model to extrapolate into the future to find the point in time where our infrastructure has to be extended.

Before building our first model…

When we talk about models, you can think of them as simplified theoretical approximations of complex reality. As such, there is always some inferiority involved, also called the approximation error. This error will guide us in choosing the right model among the myriad of choices we have.
And this error will be calculated as the squared distance of the model's prediction to the real data; for example, for a learned model function f, the error is calculated as follows:

def error(f, x, y):
    return sp.sum((f(x)-y)**2)

The vectors x and y contain the web stats data that we have extracted earlier. It is the beauty of SciPy's vectorized functions that we exploit here with f(x): the trained model is assumed to take a vector and return the results again as a vector of the same size, so that we can use it to calculate the difference to y.
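As a quick sanity check, error() can be exercised on made-up values; the toy vectors and the identity model below are ours, purely for illustration:

>>> x_toy = sp.array([1.0, 2.0, 3.0])
>>> y_toy = sp.array([1.25, 1.75, 3.5])
>>> f_identity = lambda v: v          # hypothetical model predicting y = x
>>> error(f_identity, x_toy, y_toy)   # 0.0625 + 0.0625 + 0.25
0.375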
Starting with a simple straight line

Let's assume for a second that the underlying model is a straight line. Then the challenge is how to best put that line into the chart so that it results in the smallest approximation error. SciPy's polyfit() function does exactly that. Given data x and y and the desired order of the polynomial (a straight line has order 1), it finds the model function that minimizes the error function defined earlier:

fp1, residuals, rank, sv, rcond = sp.polyfit(x, y, 1, full=True)

The polyfit() function returns the parameters of the fitted model function, fp1. And by setting full=True, we also get additional background information on the fitting process.
Of this, only residuals are of interest, which is exactly the error of the approximation:

>>> print("Model parameters: %s" % fp1)
Model parameters: [   2.59619213  989.02487106]
>>> print(residuals)
[  3.17389767e+08]

This means the best straight line fit is the following function:

f(x) = 2.59619213 * x + 989.02487106

We then use poly1d() to create a model function from the model parameters:

>>> f1 = sp.poly1d(fp1)
>>> print(error(f1, x, y))
317389767.34

We have used full=True to retrieve more details on the fitting process. Normally, we would not need it, in which case only the model parameters would be returned. We can now use f1() to plot our first trained model.
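Here is a minimal sketch of what that plot could look like; it reuses x, y, and f1 from above, while the 1,000-point grid and the styling are our own choices:

>>> fx = sp.linspace(0, x[-1], 1000)   # evenly spaced x-values for drawing the line
>>> plt.scatter(x, y, s=10)            # the raw data, as before
>>> plt.plot(fx, f1(fx), linewidth=4)  # the fitted straight line on top
>>> plt.legend(["d=%i" % f1.order], loc="upper left")
>>> plt.show()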