Building machine learning systems with Python (779436), page 2
After all, there are millions of books printed every year, which are read by millions of readers. And then there is this book read by you. One could also argue that a couple of machine learning algorithms played their role in leading you to this book, or this book to you. And we, the authors, are happy that you want to understand more about the hows and whys.

Most of the book will cover the how. How does data have to be processed so that machine learning algorithms can make the most of it? How should one choose the right algorithm for the problem at hand?

Occasionally, we will also cover the why. Why is it important to measure correctly? Why does one algorithm outperform another in a given scenario?

We know that there is much more to learn to be an expert in the field. After all, we only covered some hows and just a tiny fraction of the whys. But in the end, we hope that this mixture will help you to get up and running as quickly as possible.

What this book covers

Chapter 1, Getting Started with Python Machine Learning, introduces the basic idea of machine learning with a very simple example.
Despite its simplicity, it will challenge us with the risk of overfitting.

Chapter 2, Classifying with Real-world Examples, uses real data to learn about classification, whereby we train a computer to be able to distinguish different classes of flowers.

Chapter 3, Clustering – Finding Related Posts, teaches how powerful the bag of words approach is when we apply it to finding similar posts without really "understanding" them.

Chapter 4, Topic Modeling, moves beyond assigning each post to a single cluster and assigns them to several topics, as a real text can deal with multiple topics.

Chapter 5, Classification – Detecting Poor Answers, teaches how to use the bias-variance trade-off to debug machine learning models, though this chapter is mainly about using logistic regression to find whether a user's answer to a question is good or bad.

Chapter 6, Classification II – Sentiment Analysis, explains how Naïve Bayes works, and how to use it to classify tweets to see whether they are positive or negative.

Chapter 7, Regression, explains how to use the classical topic, regression, in handling data, which is still relevant today.
You will also learn about advanced regression techniques such as the Lasso and ElasticNets.

Chapter 8, Recommendations, builds recommendation systems based on customer product ratings. We will also see how to build recommendations just from shopping data without the need for ratings data (which users do not always provide).

Chapter 9, Classification – Music Genre Classification, makes us pretend that someone has scrambled our huge music collection, and our only hope to create order is to let a machine learner classify our songs. It will turn out that it is sometimes better to trust someone else's expertise than to create features ourselves.

Chapter 10, Computer Vision, teaches how to apply classification in the specific context of handling images by extracting features from data. We will also see how these methods can be adapted to find similar images in a collection.

Chapter 11, Dimensionality Reduction, teaches us what other methods exist that can help us in downsizing data so that it is chewable by our machine learning algorithms.

Chapter 12, Bigger Data, explores some approaches to deal with larger data by taking advantage of multiple cores or computing clusters.
We also have an introduction to using cloud computing (using Amazon Web Services as our cloud provider).

Appendix, Where to Learn More Machine Learning, lists many wonderful resources available to learn more about machine learning.

What you need for this book

This book assumes you know Python and how to install a library using easy_install or pip. We do not rely on any advanced mathematics such as calculus or matrix algebra.

We are using the following versions throughout the book, but you should be fine with any more recent ones:

• Python 2.7 (all the code is compatible with version 3.3 and 3.4 as well)
• NumPy 1.8.1
• SciPy 0.13
• scikit-learn 0.14.0

Who this book is for

This book is for Python programmers who want to learn how to perform machine learning using open source libraries.
We will walk through the basic modes of machine learning based on realistic examples.

This book is also for machine learners who want to start using Python to build their systems. Python is a flexible language for rapid prototyping, while the underlying algorithms are all written in optimized C or C++. Thus the resulting code is fast and robust enough to be used in production as well.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "We then use poly1d() to create a model function from the model parameters."

A block of code is set as follows:

[aws info]
AWS_ACCESS_KEY_ID = AAKIIT7HHF6IUSN3OCAA
AWS_SECRET_ACCESS_KEY = <your secret key>

Any command-line input or output is written as follows:

>>> import numpy
>>> numpy.version.full_version
1.8.1

New terms and important words are shown in bold.
Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "Once the machine is stopped, the Change instance type option becomes available."

Warnings or important notes appear in a box like this.

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome.
Let us know what you think about this book: what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to feedback@packtpub.com, and mention the book title in the subject of your message. If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

The code for this book is also available on GitHub at https://github.com/luispedro/BuildingMachineLearningSystemsWithPython.
This repository is kept up-to-date so that it will incorporate both errata and any necessary updates for newer versions of Python or of the packages we use in the book.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books, whether in the text or the code, we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book.
If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field.
The required information will appear under the Errata section.

Another excellent option is to visit www.TwoToReal.com, where the authors try to provide support and answer all your questions.

Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously.
If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at copyright@packtpub.com with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

You can contact us at questions@packtpub.com if you are having a problem with any aspect of the book, and we will do our best to address it.

Getting Started with Python Machine Learning

Machine learning teaches machines to learn to carry out tasks by themselves.
It is that simple. The complexity comes with the details, and that is most likely the reason you are reading this book.

Maybe you have too much data and too little insight. You hoped that machine learning algorithms could solve this challenge, so you started digging into the algorithms. But after some time you were puzzled: Which of the myriad of algorithms should you actually choose?

Alternatively, maybe you are interested in machine learning in general and have been reading blogs and articles about it for some time.
Everything seemed to be magic and cool, so you started your exploration and fed some toy data into a decision tree or a support vector machine. However, after you successfully applied it to some other data, you wondered: Was the whole setting right? Did you get the optimal results? And how do you know whether there are no better algorithms? Or whether your data was the right one?

Welcome to the club! Both of us (authors) were at those stages, looking for information that tells the stories behind the theoretical textbooks about machine learning. It turned out that much of that information was "black art" not usually taught in standard textbooks.
So in a sense, we wrote this book to our younger selves. A book that not only gives a quick introduction into machine learning, but also teaches lessons we learned along the way. We hope that it will also give you a smoother entry to one of the most exciting fields in Computer Science.

Machine learning and Python – a dream team

The goal of machine learning is to teach machines (software) to carry out tasks by providing them with a couple of examples (how to do or not do the task). Let's assume that each morning when you turn on your computer, you do the same task of moving e-mails around so that only e-mails belonging to the same topic end up in the same folder. After some time, you might feel bored and think of automating this chore.
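
To make the idea of learning from a couple of examples concrete, here is a minimal sketch using only the Python standard library. It is not the method used later in the book: the folder names, the example e-mails, and the helper functions train() and predict() are all hypothetical, and the word-overlap scoring is a deliberately naive stand-in for a real learning algorithm.

```python
from collections import Counter, defaultdict


def train(examples):
    """Count how often each word appears in the example e-mails of each folder."""
    word_counts = defaultdict(Counter)
    for text, folder in examples:
        word_counts[folder].update(text.lower().split())
    return word_counts


def predict(word_counts, text):
    """Return the folder whose example e-mails share the most words with the text."""
    words = text.lower().split()
    scores = {
        folder: sum(counts[word] for word in words)
        for folder, counts in word_counts.items()
    }
    return max(scores, key=scores.get)


# Hypothetical training data: (e-mail text, folder we moved it to by hand).
examples = [
    ("invoice for your order", "billing"),
    ("payment received for invoice", "billing"),
    ("team meeting moved to friday", "work"),
    ("agenda for the project meeting", "work"),
]

model = train(examples)
print(predict(model, "please send the invoice"))
```

Nothing in the code states a rule such as "invoices go to billing"; that association comes entirely from the hand-labeled examples, which is the point the paragraph above makes.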