Building Machine Learning Systems with Python


File description

A PDF file from the archive "Building machine learning systems with Python". It is filed under the subject "Automatic Control Systems (ACS) (MT-11)", 8th semester, in the file archive of Bauman Moscow State Technical University. The archive is also listed under "Books and study guides" for the subject "Automatic Control Systems (ACS)" in the general files.


Text from the PDF

Building Machine Learning Systems with Python
Second Edition

Get more from your data through creating practical machine learning systems with Python

Luis Pedro Coelho
Willi Richert

BIRMINGHAM - MUMBAI

Copyright © 2015 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied.

Neither the authors, nor Packt Publishing, nor its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: July 2013
Second edition: March 2015
Production reference: 1230315

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.

ISBN 978-1-78439-277-2

www.packtpub.com

Credits

Authors: Luis Pedro Coelho, Willi Richert
Reviewers: Matthieu Brucher, Maurice HT Ling, Radim Řehůřek
Commissioning Editor: Kartikey Pandey
Acquisition Editors: Greg Wild, Richard Harvey, Kartikey Pandey
Content Development Editor: Arun Nadar
Technical Editor: Pankaj Kadam
Copy Editors: Relin Hedly, Sameen Siddiqui, Laxmi Subramanian
Project Coordinator: Nikhil Nair
Proofreaders: Simran Bhogal, Lawrence A. Herman, Linda Morris, Paul Hindle
Indexer: Hemangini Bari
Graphics: Sheetal Aute, Abhinash Sahu
Production Coordinator: Arvindkumar Gupta
Cover Work: Arvindkumar Gupta

About the Authors

Luis Pedro Coelho is a computational biologist: someone who uses computers as a tool to understand biological systems.

In particular, Luis analyzes DNA from microbial communities to characterize their behavior. Luis has also worked extensively in bioimage informatics, that is, the application of machine learning techniques to the analysis of images of biological specimens. His main focus is on the processing and integration of large-scale datasets.

Luis has a PhD from Carnegie Mellon University, one of the leading universities in the world in the area of machine learning. He is the author of several scientific publications.

Luis started developing open source software in 1998 as a way to apply real code to what he was learning in his computer science courses at the Technical University of Lisbon. In 2004, he started developing in Python and has contributed to several open source libraries in this language. He is the lead developer of mahotas, a popular computer vision package for Python, and has contributed to several machine learning codes.

Luis currently divides his time between Luxembourg and Heidelberg.

I thank my wife, Rita, for all her love and support, and my daughter, Anna, for being the best thing ever.

Willi Richert has a PhD in machine learning/robotics, for which he used reinforcement learning, hidden Markov models, and Bayesian networks to let heterogeneous robots learn by imitation.

Currently, he works for Microsoft in the Core Relevance Team of Bing, where he is involved in a variety of ML areas such as active learning, statistical machine translation, and growing decision trees.

This book would not have been possible without the support of my wife, Natalie, and my sons, Linus and Moritz. I am especially grateful for the many fruitful discussions with my current or previous managers, Andreas Bode, Clemens Marschner, Hongyan Zhou, and Eric Crestan, as well as my colleagues and friends, Tomasz Marciniak, Cristian Eigel, Oliver Niehoerster, and Philipp Adelt. The interesting ideas are most likely from them; the bugs belong to me.

About the Reviewers

Matthieu Brucher holds an engineering degree from the Ecole Supérieure d'Electricité (Information, Signals, Measures), France, and has a PhD in unsupervised manifold learning from the Université de Strasbourg, France.

He currently holds an HPC software developer position in an oil company and is working on next-generation reservoir simulation.

Maurice HT Ling has been programming in Python since 2003. Having completed his PhD in Bioinformatics and BSc (Hons.) in Molecular and Cell Biology at The University of Melbourne, he is currently a Research Fellow at Nanyang Technological University, Singapore, and an Honorary Fellow at The University of Melbourne, Australia. Maurice is the Chief Editor for Computational and Mathematical Biology, and co-editor for The Python Papers. Recently, Maurice cofounded the first synthetic biology start-up in Singapore, AdvanceSyn Pte. Ltd., as the Director and Chief Technology Officer. His research interests lie in life (biological life, artificial life, and artificial intelligence), using computer science and statistics as tools to understand life and its numerous aspects. In his free time, Maurice likes to read, enjoy a cup of coffee, write his personal journal, or philosophize on various aspects of life. His website and LinkedIn profile are http://maurice.vodien.com and http://www.linkedin.com/in/mauriceling, respectively.

Radim Řehůřek is a tech geek and developer at heart.

He founded and led the research department at Seznam.cz, a major search engine company in central Europe. After finishing his PhD, he decided to move on and spread the machine learning love, starting his own privately owned R&D company, RaRe Consulting Ltd. RaRe specializes in made-to-measure data mining solutions, delivering cutting-edge systems for clients ranging from large multinationals to nascent start-ups.

Radim is also the author of a number of popular open source projects, including gensim and smart_open.

A big fan of experiencing different cultures, Radim has lived around the globe with his wife for the past decade, with his next steps leading to South Korea. No matter where he stays, Radim and his team always try to evangelize data-driven solutions and help companies worldwide make the most of their machine learning opportunities.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?
• Fully searchable across every book published by Packt
• Copy and paste, print, and bookmark content
• On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books.

Simply use your login credentials for immediate access.

Table of Contents

Preface

Chapter 1: Getting Started with Python Machine Learning
Machine learning and Python – a dream team
What the book will teach you (and what it will not)
What to do when you are stuck
Getting started
Introduction to NumPy, SciPy, and matplotlib
Installing Python
Chewing data efficiently with NumPy and intelligently with SciPy
Learning NumPy
Indexing
Handling nonexisting values
Comparing the runtime
Learning SciPy
Our first (tiny) application of machine learning
Reading in the data
Preprocessing and cleaning the data
Choosing the right model and learning algorithm
Before building our first model…
Starting with a simple straight line
Towards some advanced stuff
Stepping back to go forward – another look at our data
Training and testing
Answering our initial question
Summary

Chapter 2: Classifying with Real-world Examples
The Iris dataset
Visualization is a good first step
Building our first classification model
Evaluation – holding out data and cross-validation
Building more complex classifiers
A more complex dataset and a more complex classifier
Learning about the Seeds dataset
Features and feature engineering
Nearest neighbor classification
Classifying with scikit-learn
Looking at the decision boundaries
Binary and multiclass classification
Summary

Chapter 3: Clustering – Finding Related Posts
Measuring the relatedness of posts
How not to do it
How to do it
Preprocessing – similarity measured as a similar number of common words
Converting raw text into a bag of words
Counting words
Normalizing word count vectors
Removing less important words
Stemming
Stop words on steroids
Our achievements and goals
Clustering
K-means
Getting test data to evaluate our ideas on
Clustering posts
Solving our initial challenge
Another look at noise
Tweaking the parameters
Summary

Chapter 4: Topic Modeling
Latent Dirichlet allocation
Building a topic model
Comparing documents by topics
Modeling the whole of Wikipedia
Choosing the number of topics
Summary

Chapter 5: Classification – Detecting Poor Answers
Sketching our roadmap
Learning to classify classy answers
Tuning the instance
Tuning the classifier
Fetching the data
Slimming the data down to chewable chunks
Preselection and processing of attributes
Defining what is a good answer
Creating our first classifier
Starting with kNN
Engineering the features
Training the classifier
Measuring the classifier's performance
Designing more features
Deciding how to improve
Bias-variance and their tradeoff
Fixing high bias
Fixing high variance
High bias or low bias
Using logistic regression
A bit of math with a small example
Applying logistic regression to our post classification problem
Looking behind accuracy – precision and recall
Slimming the classifier
Ship it!
Summary

Chapter 6: Classification II – Sentiment Analysis
Sketching our roadmap
Fetching the Twitter data
Introducing the Naïve Bayes classifier
Getting to know the Bayes' theorem
Being naïve
Using Naïve Bayes to classify
Accounting for unseen words and other oddities
Accounting for arithmetic underflows
Creating our first classifier and tuning it
Solving an easy problem first
Using all classes
Tuning the classifier's parameters
Cleaning tweets
Taking the word types into account
Determining the word types
Successfully cheating using SentiWordNet
Our first estimator
Putting everything together
Summary

Chapter 7: Regression
Predicting house prices with regression
Multidimensional regression
Cross-validation for regression
Penalized or regularized regression
L1 and L2 penalties
Using Lasso or ElasticNet in scikit-learn
Visualizing the Lasso path
P-greater-than-N scenarios
An example based on text documents
Setting hyperparameters in a principled way
Summary

Chapter 8: Recommendations
Rating predictions and recommendations
Splitting into training and testing
Normalizing the training data
A neighborhood approach to recommendations
A regression approach to recommendations
Combining multiple methods
Basket analysis
Obtaining useful predictions
Analyzing supermarket shopping baskets
Association rule mining
More advanced basket analysis
Summary

Chapter 9: Classification – Music Genre Classification
Sketching our roadmap
Fetching the music data
Converting into a WAV format
Looking at music
Decomposing music into sine wave components
Using FFT to build our first classifier
Increasing experimentation agility
Training the classifier
Using a confusion matrix to measure accuracy in multiclass problems
An alternative way to measure classifier performance using receiver-operator characteristics
Improving classification performance with Mel Frequency Cepstral Coefficients
Summary

Chapter 10: Computer Vision
Introducing image processing
Loading and displaying images
Thresholding
Gaussian blurring
Putting the center in focus
Basic image classification
Computing features from images
Writing your own features
Using features to find similar images
Classifying a harder dataset
Local feature representations
Summary

Chapter 11: Dimensionality Reduction
Sketching our roadmap
Selecting features
Detecting redundant features using filters
Correlation
Mutual information
Asking the model about the features using wrappers
Other feature selection methods
Feature extraction
About principal component analysis
Sketching PCA
Applying PCA
Limitations of PCA and how LDA can help
Multidimensional scaling
Summary

Chapter 12: Bigger Data
Learning about big data
Using jug to break up your pipeline into tasks
An introduction to tasks in jug
Looking under the hood
Using jug for data analysis
Reusing partial results
Using Amazon Web Services
Creating your first virtual machines
Installing Python packages on Amazon Linux
Running jug on our cloud machine
Automating the generation of clusters with StarCluster
Summary

Appendix: Where to Learn More Machine Learning
Online courses
Books
Question and answer sites
Blogs
Data sources
Getting competitive
All that was left out
Summary

Index

Preface

One could argue that it is a fortunate coincidence that you are holding this book in your hands (or have it on your eBook reader).
