Building Machine Learning Systems with Python (page 37)
For natural images, grayscale is more appropriate. You can select it with:

>>> plt.gray()

Now the image is shown in grayscale. Note that only the way in which the pixel values are interpreted and shown has changed; the image data itself is untouched. We can continue our processing by computing the threshold value:

>>> thresh = mh.thresholding.otsu(image)
>>> print('Otsu threshold is {}.'.format(thresh))
Otsu threshold is 138.
>>> plt.imshow(image > thresh)

When applied to the previous screenshot, this method finds the threshold to be 138, which separates the ground from the sky above, as shown in the following image.

Gaussian blurring

Blurring your image may seem odd, but it often serves to reduce noise, which helps with further processing.
With mahotas, it is just a function call:

>>> im16 = mh.gaussian_filter(image, 16)

Notice that we did not convert the grayscale image to unsigned integers: we just made use of the floating point result as it is. The second argument to the gaussian_filter function is the size of the filter (the standard deviation of the Gaussian). Larger values result in more blurring, as shown in the following screenshot. We can take the blurred image and threshold it with Otsu (using the same code as before).
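For reference, Otsu's method itself can be sketched in pure NumPy. This is a simplified illustration of the idea behind mh.thresholding.otsu (not the mahotas implementation): the method picks the threshold that maximizes the between-class variance of the gray-level histogram. The toy image below is made up for the example:

```python
import numpy as np

def otsu_threshold(img):
    """Return Otsu's threshold for a uint8 image (pure NumPy sketch)."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()              # probability of each gray level
    levels = np.arange(256)
    omega = np.cumsum(p)               # cumulative class probability
    mu = np.cumsum(p * levels)         # cumulative mean
    mu_total = mu[-1]
    # Between-class variance for every candidate threshold
    with np.errstate(divide='ignore', invalid='ignore'):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0
    return int(np.argmax(sigma_b))

# Bimodal toy image: a dark region around 50 and a bright region around 200
rng = np.random.default_rng(0)
pixels = np.concatenate([rng.normal(50, 10, 5000), rng.normal(200, 10, 5000)])
img = np.clip(pixels, 0, 255).astype(np.uint8).reshape(100, 100)
t = otsu_threshold(img)   # falls between the two modes
```

With two well-separated modes, the computed threshold lands between them, which is exactly the ground/sky separation we saw above.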
Now, the boundaries are smoother, without the jagged edges, as shown in the following screenshot.

Putting the center in focus

The final example shows how to mix NumPy operators with a tiny bit of filtering to get an interesting result. We start with the Lena image and split it into the color channels:

>>> im = mh.demos.load('lena')

This is an image of a young woman that has often been used for image processing demos. It is shown in the following screenshot. To split the red, green, and blue channels, we use the following code:

>>> r,g,b = im.transpose(2,0,1)

Now, we filter the three channels separately and build a composite image out of them with mh.as_rgb.
This function takes three two-dimensional arrays, performs contrast stretching to make each an 8-bit integer array, and then stacks them, returning a color RGB image:

>>> r12 = mh.gaussian_filter(r, 12.)
>>> g12 = mh.gaussian_filter(g, 12.)
>>> b12 = mh.gaussian_filter(b, 12.)
>>> im12 = mh.as_rgb(r12, g12, b12)

Now, we blend the two images from the center away to the edges.
First, we need to build a weights array W, which will contain at each pixel a normalized value, which is its distance to the center:

>>> h, w = r.shape # height and width
>>> Y, X = np.mgrid[:h,:w]

We used the np.mgrid object, which returns arrays of size (h, w), with values corresponding to the y and x coordinates, respectively. The next steps are as follows:

>>> Y = Y - h/2. # center at h/2
>>> Y = Y / Y.max() # normalize to -1 .. +1
>>> X = X - w/2.
>>> X = X / X.max()

We now use a Gaussian function to give the center region a high value:

>>> C = np.exp(-2.*(X**2 + Y**2))
>>> # Normalize again to 0..1
>>> C = C - C.min()
>>> C = C / C.ptp()
>>> C = C[:,:,None] # This adds a dummy third dimension to C

Notice how all of these manipulations are performed using NumPy arrays and not some mahotas-specific methodology.
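The behavior of np.mgrid can be checked on a tiny array; this small sketch (with made-up toy dimensions) shows that Y varies along rows and X along columns:

```python
import numpy as np

h, w = 3, 4                  # toy dimensions for illustration
Y, X = np.mgrid[:h, :w]

# Y holds the row (y) coordinate of each pixel, X the column (x) coordinate
print(Y.tolist())  # [[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2]]
print(X.tolist())  # [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3]]
```

Subtracting h/2 and w/2 then dividing by the maximum, as in the text, turns these coordinate grids into normalized distances from the image center.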
Finally, we can combine the two images to have the center be in sharp focus and the edges softer:

>>> ringed = mh.stretch(im*C + (1-C)*im12)

Basic image classification

We will start with a small dataset that was collected especially for this book. It has three classes: buildings, natural scenes (landscapes), and pictures of texts. There are 30 images in each category, and they were all taken using a cell phone camera with minimal composition.
The images are similar to those that would be uploaded to a modern website by users with no photography training. This dataset is available from this book's website or the GitHub code repository. Later in this chapter, we will look at a harder dataset with more images and more categories.

When classifying images, we start with a large rectangular array of numbers (pixel values). Nowadays, millions of pixels are common. We could try to feed all these numbers as features into the learning algorithm, but this is not a very good idea, because the relationship of each pixel (or even each small group of pixels) to the final result is very indirect.
Also, having millions of pixels but only a small number of example images results in a very hard statistical learning problem. This is an extreme form of the P greater than N type of problem we discussed in Chapter 7, Regression. Instead, a good approach is to compute features from the image and use those features for classification.

Having said that, I will point out that there are, in fact, a few methods that do work directly from the pixel values. They have feature computation submodules inside them. They may even attempt to learn good features automatically. These methods are the topic of current research.
They typically work best with very large datasets (millions of images).

We previously used an example of the scene class. The following are examples of the text and building classes.

Computing features from images

With mahotas, it is very easy to compute features from images. There is a submodule named mahotas.features, where feature computation functions are available. A commonly used set of texture features is the Haralick set. As with many methods in image processing, the name is due to its inventor.
These features are texture-based: they distinguish between images that are smooth and those that are patterned, and between different patterns. With mahotas, it is very easy to compute them as follows:

>>> haralick_features = mh.features.haralick(image)
>>> haralick_features_mean = np.mean(haralick_features, axis=0)
>>> haralick_features_all = np.ravel(haralick_features)

The mh.features.haralick function returns a 4x13 array. The first dimension refers to the four possible directions in which to compute the features (vertical, horizontal, diagonal, and anti-diagonal).
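Under the hood, Haralick features are statistics computed from gray-level co-occurrence matrices (GLCMs), one per direction, which count how often pairs of gray levels appear next to each other. The following is a minimal NumPy sketch of a horizontal co-occurrence matrix; it illustrates the underlying idea only, not the mahotas implementation, and the tiny input image is made up:

```python
import numpy as np

def glcm_horizontal(img, levels):
    """Count co-occurrences of gray levels in horizontally adjacent pixels."""
    glcm = np.zeros((levels, levels), dtype=int)
    left = img[:, :-1].ravel()
    right = img[:, 1:].ravel()
    np.add.at(glcm, (left, right), 1)  # glcm[a, b] += 1 for each pair (a, b)
    return glcm

img = np.array([[0, 0, 1],
                [1, 2, 2]])
g = glcm_horizontal(img, levels=3)
# Horizontal neighbor pairs: (0,0), (0,1), (1,2), (2,2)
```

From such matrices, Haralick's statistics (contrast, correlation, entropy, and so on) are computed for each direction, which is why the result has one row per direction.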
If we are not interested in the direction specifically, we can use the average over all the directions (computed in the earlier code as haralick_features_mean). Otherwise, we can use all the features separately (haralick_features_all). This decision should be informed by the properties of the dataset. In our case, we reason that the horizontal and vertical directions should be kept separate.
Therefore, we will use haralick_features_all.

There are a few other feature sets implemented in mahotas. Local binary patterns are another texture-based feature set, which is very robust against illumination changes. There are other types of features, including local features, which we will discuss later in this chapter.

With these features, we use a standard classification method such as logistic regression as follows:

>>> from glob import glob
>>> images = glob('SimpleImageDataset/*.jpg')
>>> features = []
>>> labels = []
>>> for im in images:
...     labels.append(im[:-len('00.jpg')])
...     im = mh.imread(im)
...     im = mh.colors.rgb2gray(im, dtype=np.uint8)
...     features.append(mh.features.haralick(im).ravel())
>>> features = np.array(features)
>>> labels = np.array(labels)

The three classes have very different textures.
Buildings have sharp edges and big blocks where the color is similar (the pixel values are rarely exactly the same, but the variation is slight). Text is made of many sharp dark-light transitions, with small black areas in a sea of white. Natural scenes have smoother variations with fractal-like transitions. Therefore, a classifier based on texture is expected to do well.

As a classifier, we are going to use logistic regression with preprocessing of the features as follows:

>>> from sklearn.pipeline import Pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.linear_model import LogisticRegression
>>> clf = Pipeline([('preproc', StandardScaler()),
...                 ('classifier', LogisticRegression())])

Since our dataset is small, we can use leave-one-out cross-validation as follows:

>>> from sklearn import cross_validation
>>> cv = cross_validation.LeaveOneOut(len(images))
>>> scores = cross_validation.cross_val_score(
...     clf, features, labels, cv=cv)
>>> print('Accuracy: {:.1%}'.format(scores.mean()))
Accuracy: 81.1%

Eighty-one percent is not bad for three classes (random guessing would correspond to 33 percent).
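Note that the sklearn.cross_validation module has since been removed from scikit-learn; in versions 0.20 and later, the equivalent evaluation lives in sklearn.model_selection. Here is a sketch of the updated API on a small synthetic dataset (the synthetic features and labels are stand-ins for the Haralick features computed above):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Synthetic stand-in for the real features/labels arrays: two
# well-separated classes of 13-dimensional feature vectors
rng = np.random.RandomState(0)
features = np.vstack([rng.normal(0, 1, (10, 13)),
                      rng.normal(3, 1, (10, 13))])
labels = np.array(['scene'] * 10 + ['text'] * 10)

clf = Pipeline([('preproc', StandardScaler()),
                ('classifier', LogisticRegression())])
# LeaveOneOut no longer takes the dataset size as an argument
scores = cross_val_score(clf, features, labels, cv=LeaveOneOut())
print('Accuracy: {:.1%}'.format(scores.mean()))
```

The structure of the evaluation is identical; only the import path and the LeaveOneOut constructor changed.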
We can do better, however, by writing our own features.

Writing your own features

A feature is nothing magical. It is simply a number that we computed from an image. There are several feature sets already defined in the literature. These often have the added advantage of having been designed and studied to be invariant to many unimportant factors. For example, local binary patterns are completely invariant to multiplying all pixel values by a number or adding a constant to all these values. This makes this feature set robust against illumination changes in images. However, it is also possible that your particular use case would benefit from a few specially designed features.

A simple type of feature that is not shipped with mahotas is a color histogram. Fortunately, this feature is easy to implement. A color histogram partitions the color space into a set of bins, and then counts how many pixels fall into each bin. The images are in RGB format; that is, each pixel has three values: R for red, G for green, and B for blue.
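As a sketch of the idea (not the book's exact implementation), each channel can be quantized into a few bins and the three bin indices combined into a single color bin per pixel; the counts over all bins form the feature vector:

```python
import numpy as np

def color_histogram(im, nbins=4):
    """Histogram an RGB uint8 image into nbins**3 color bins."""
    assert im.ndim == 3 and im.shape[2] == 3
    # Quantize each channel from 0..255 down to 0..nbins-1
    binned = (im.astype(int) * nbins) // 256
    # Combine the three channel bins into one bin index per pixel
    idx = (binned[:, :, 0] * nbins + binned[:, :, 1]) * nbins + binned[:, :, 2]
    return np.bincount(idx.ravel(), minlength=nbins ** 3)

# Tiny made-up image: one pure red pixel and one pure blue pixel
im = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)
hist = color_histogram(im)   # 64-bin vector with two nonzero entries
```

With nbins=4 this yields a 64-dimensional feature vector that can be concatenated with the texture features computed earlier.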