Building Machine Learning Systems with Python (page 36)
Classical and Metal are even at almost 1.0 AUC. And indeed, the confusion matrix in the following plot also looks much better now. We can clearly see the diagonal, showing that the classifier manages to classify the genres correctly in most cases. This classifier is actually quite usable for solving our initial task.

If we wanted to improve on this, the confusion matrix quickly tells us what to focus on: the non-white spots in the non-diagonal places.
For instance, we have a darker spot where we mislabel Rock songs as Jazz with considerable probability. To fix this, we would probably need to dive deeper into the songs and extract things such as drum patterns and similar genre-specific characteristics. And then, while glancing over the ISMIR papers, we have also read about the so-called Auditory Filterbank Temporal Envelope (AFTE) features, which seem to outperform MFCC features in certain situations. Maybe we should have a look at them as well?

The nice thing is that, equipped only with ROC curves and confusion matrices, we are free to pull in other experts' knowledge in the form of feature extractors, without requiring ourselves to fully understand their inner workings. Our measurement tools will always tell us when the direction is right and when to change it.
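As a quick reminder of the tooling involved, scikit-learn computes a confusion matrix directly from true and predicted labels. The genre labels below are made up purely for illustration:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for six songs (0=Classical, 1=Jazz, 2=Rock).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 1]  # one Jazz and one Rock song confused

cm = confusion_matrix(y_true, y_pred)
print(cm)
# Rows are true genres, columns are predictions; the off-diagonal
# entries are exactly the "non-white spots" to focus on.
```

Normalizing each row by its sum gives the per-genre accuracies that the plot visualizes.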
Of course, being machine learners who are eager to learn, we will always have the dim feeling that there is an exciting algorithm buried somewhere in the black box of our feature extractors, which is just waiting for us to understand it.

Classification – Music Genre Classification

Summary

In this chapter, we totally stepped out of our comfort zone when we built a music genre classifier.
Not having a deep understanding of music theory, we at first failed to train a classifier that predicts the music genre of songs with reasonable accuracy using FFT features. But then we created a classifier that showed really usable performance using MFC features.

In both cases, we used features that we understood only enough to know how and where to put them into our classifier setup. One failed, the other succeeded. The difference between them is that in the second case we relied on features that were created by experts in the field.

And that is totally OK. If we are mainly interested in the result, we sometimes simply have to take shortcuts; we just have to make sure to take these shortcuts from experts in the specific domains.
And because we had learned how to correctly measure performance in this new multiclass classification problem, we took these shortcuts with confidence.

In the next chapter, we will look at how to apply the techniques you have learned in the rest of this book to a new type of data: images. We will learn how to use the mahotas computer vision package to preprocess images using traditional image processing functions.

Computer Vision

Image analysis and computer vision have always been important in industrial and scientific applications. With the popularization of cell phones with powerful cameras and Internet connections, images are now increasingly generated by consumers.
Therefore, there are opportunities to make use of computer vision to provide a better user experience in new contexts.

In this chapter, we will look at how to apply techniques you have learned in the rest of this book to this specific type of data. In particular, we will learn how to use the mahotas computer vision package to extract features from images. These features can be used as input to the same classification methods we studied in other chapters. We will apply these techniques to publicly available datasets of photographs.
We will also see how the same features can be used on another problem, that is, the problem of finding similar-looking images.

Finally, at the end of this chapter, we will learn about using local features. These are relatively new methods (the first of these methods to achieve state-of-the-art performance, the scale-invariant feature transform (SIFT), was introduced in 1999) and achieve very good results in many tasks.

Introducing image processing

From the point of view of the computer, an image is a large rectangular array of pixel values.
Our goal is to process this image and to arrive at a decision for our application. The first step will be to load the image from disk, where it is typically stored in an image-specific format such as PNG or JPEG, the former being a lossless compression format and the latter a lossy one that is optimized for visual assessment of photographs. Then, we may wish to perform preprocessing on the images (for example, normalizing them for illumination variations).

We will have a classification problem as a driver for this chapter. We want to be able to train a support vector machine (or other) classifier from images.
Therefore, we will use an intermediate representation, extracting numeric features from the images before applying machine learning.

Loading and displaying images

In order to manipulate images, we will use a package called mahotas. You can obtain mahotas from https://pypi.python.org/pypi/mahotas and read its manual at http://mahotas.readthedocs.org. Mahotas is an open source package (MIT license, so it can be used in any project) that was developed by one of the authors of this book. Fortunately, it is based on NumPy. The NumPy knowledge you have acquired so far can be used for image processing. There are other image packages, such as scikit-image (skimage), the ndimage (n-dimensional image) module in SciPy, and the Python bindings for OpenCV.
All of these work natively with NumPy arrays, so you can even mix and match functionality from different packages to build a combined pipeline.

We start by importing mahotas, with the mh abbreviation, which we will use throughout this chapter, as follows:

>>> import mahotas as mh

Now, we can load an image file using imread as follows:

>>> image = mh.imread('scene00.jpg')

The scene00.jpg file (this file is contained in the dataset available on this book's companion code repository) is a color image of height h and width w; the image will be an array of shape (h, w, 3). The first dimension is the height, the second is the width, and the third is red/green/blue. Other systems put the width in the first dimension, but this is the convention that is used by all NumPy-based packages.
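If you do not have scene00.jpg at hand, you can verify the height-first convention on a small synthetic array; the dimensions below are arbitrary, chosen only for illustration:

```python
import numpy as np

# A synthetic 4x6 RGB "image": height first, then width, then channel,
# mirroring the (h, w, 3) shape that mh.imread returns.
h, w = 4, 6
image = np.zeros((h, w, 3), dtype=np.uint8)
image[:, :, 0] = 255          # set the red channel of every pixel

red_channel = image[:, :, 0]  # indexing the last axis selects one color

print(image.shape)        # (4, 6, 3)
print(red_channel.shape)  # (4, 6)
```

The same slicing works unchanged on a real image loaded with mh.imread, since it is an ordinary NumPy array.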
The type of the array will typically be np.uint8 (an unsigned 8-bit integer). These are the images that your camera takes and that your monitor can fully display.

Some specialized equipment, used in scientific and technical applications, can take images with higher bit resolution (that is, with more sensitivity to small variations in brightness).
Twelve or sixteen bits are common in this type of equipment. Mahotas can deal with all these types, including floating point images. In many computations, even if the original data is composed of unsigned integers, it is advantageous to convert to floating point numbers in order to simplify handling of rounding and overflow issues.

Mahotas can use a variety of different input/output backends. Unfortunately, none of them can load all image formats that exist (there are hundreds, with several variations of each).
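To see why converting to floating point helps, consider what happens when arithmetic on uint8 values overflows. This is a minimal NumPy sketch with made-up pixel values:

```python
import numpy as np

bright = np.array([200, 250], dtype=np.uint8)

# uint8 arithmetic wraps around at 256, silently corrupting the result:
wrapped = bright + bright  # 400 % 256 = 144, 500 % 256 = 244

# Converting to float first preserves the true values, which we can
# then rescale or clip explicitly before converting back:
safe = bright.astype(np.float64) + bright.astype(np.float64)

print(wrapped)  # [144 244]
print(safe)     # [400. 500.]
```

Averaging or brightening images in uint8 can therefore produce visibly wrong pixels, while the float version only needs an explicit clip at the end.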
However, loading PNG and JPEG images is supported by all of them. We will focus on these common formats and refer you to the mahotas documentation on how to read uncommon formats.

We can display the image on screen using matplotlib, the plotting library we have already used several times, as follows:

>>> from matplotlib import pyplot as plt
>>> plt.imshow(image)
>>> plt.show()

As shown in the following figure, this code shows the image using the convention that the first dimension is the height and the second the width. It correctly handles color images as well. When using Python for numerical computation, we benefit from the whole ecosystem working well together: mahotas works with NumPy arrays, which can be displayed with matplotlib; later we will compute features from images to use with scikit-learn.

Thresholding

Thresholding is a very simple operation: we transform all pixel values above a certain threshold to 1 and all those below it to 0 (or, using Booleans, to True and False).
The important question in thresholding is how to select a good value to use as the threshold limit. Mahotas implements a few methods for choosing a threshold value from the image. One is called Otsu, after its inventor. The first necessary step is to convert the image to grayscale, with rgb2grey in the mahotas.colors submodule.

Instead of rgb2grey, we could also have used just the mean value of the red, green, and blue channels, by calling image.mean(2). The result, however, would not be the same, as rgb2grey uses different weights for the different colors to give a subjectively more pleasing result. Our eyes are not equally sensitive to the three basic colors.

>>> image = mh.colors.rgb2grey(image, dtype=np.uint8)
>>> plt.imshow(image) # Display the image

By default, matplotlib will display this single-channel image as a false color image, using red for high values and blue for low values.
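To make Otsu's idea concrete, here is a pure-NumPy sketch that searches exhaustively for the threshold maximizing the between-class variance of the two pixel groups, applied to a tiny synthetic grayscale array. In practice you would call mahotas directly (mh.thresholding.otsu); this hand-rolled version exists only to show what that call computes:

```python
import numpy as np

def otsu_threshold(gray):
    """Exhaustive-search Otsu: pick the threshold that maximizes the
    between-class variance of the foreground/background pixel groups."""
    pixels = gray.ravel().astype(np.float64)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        fg = pixels[pixels >= t]
        bg = pixels[pixels < t]
        if fg.size == 0 or bg.size == 0:
            continue  # skip thresholds that leave one group empty
        w_fg = fg.size / pixels.size
        w_bg = bg.size / pixels.size
        var_between = w_fg * w_bg * (fg.mean() - bg.mean()) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# A tiny synthetic "image": dark background with a bright square.
gray = np.full((8, 8), 30, dtype=np.uint8)
gray[2:6, 2:6] = 220

t = otsu_threshold(gray)
binary = gray > t  # thresholding itself is a single comparison
print(t)
print(binary.sum())  # the 16 bright pixels come out True
```

The comparison gray > t is all the thresholding step needs once the value is chosen; the resulting Boolean array can be used directly as a mask.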