**1. Fourier Transform**

Fourier Transform is used to analyze the frequency characteristics of various filters.

A signal $x(t)=Asin(2\Pi ft)$ with f is the frequency of signal, and if its frequency domain is taken, we can see a spike at f.

If f is high (high frequency signal), the amplitude of signal varies so fast in short time.

If f is low (low frequency signal), the amplitude of signal varies so slowin short time.

*An image as a signal which is sampled in two directions X and Y. So the Fourier transform will be taken in both directions.*

*In image, edges and noises are high frequency contents in an image. If there is no much changes in amplitude, it is a low frequency component.*

**2D Discrete Fourier Transform (DFT)**is used to find the the frequency representation (frequency domain) of image.

**Fast Fourier Transform (FFT)**is used for calculation of DFT.

**1.1. Fourier Transform in openCV**

OpenCV provides the functions

**cv2.dft()**and

**cv2.idft()**for Fourier Transform. The result has 2 channels (real part and imaginary parts of the result). The input image should be converted to np.float32.

Using

**cv2.cartToPolar()**to calculate both magnitude and phase.

Apply and high pass filter for the image below:

**Figure: input image**

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 | import numpy as np import cv2 from matplotlib import pyplot as plt img = cv2.imread('opencv_logo.png',0) dft = cv2.dft(np.float32(img),flags = cv2.DFT_COMPLEX_OUTPUT) dft_shift = np.fft.fftshift(dft) #fourier transform magnitude_spectrum = 20*np.log(cv2.magnitude(dft_shift[:,:,0],dft_shift[:,:,1])) rows, cols = img.shape crow,ccol = rows/2 , cols/2 # high pass filter, create a mask, center square is 0, remaining is 1 mask = np.ones((rows,cols,2),np.uint8) mask[crow-30:crow+30, ccol-30:ccol+30] = 0 # apply mask and inverse DFT fshift = dft_shift*mask f_ishift = np.fft.ifftshift(fshift) img_back = cv2.idft(f_ishift) #magnitude calculation is based on channels img_back = cv2.magnitude(img_back[:,:,0],img_back[:,:,1]) plt.subplot(141),plt.imshow(img, cmap = 'gray') plt.title('Input Image'), plt.xticks([]), plt.yticks([]) plt.subplot(142),plt.imshow(magnitude_spectrum, cmap = 'gray') plt.title('Magnitude Spectrum'), plt.xticks([]), plt.yticks([]) plt.subplot(143),plt.imshow(img_back, cmap = 'gray') plt.title('Image after HPF'), plt.xticks([]), plt.yticks([]) plt.subplot(144),plt.imshow(img_back) plt.title('Result in JET'), plt.xticks([]), plt.yticks([]) plt.show() |

**Figure: output FFT, HPF**

**1.2. Performance Optimization of DFT**

Performance of DFT calculation is better for some array size: power of two, a product of 2’s, 3’s, and 5’s. If your array size is not satisfy the condition, you can modify the size of the array by padding zeros before calculating.

OpenCV provides a function, cv2.getOptimalDFTSize() to find the optimal size of the array

The new code:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 | import numpy as np import cv2 from matplotlib import pyplot as plt img = cv2.imread('opencv_logo.png',0) rows, cols = img.shape print rows,cols nrows = cv2.getOptimalDFTSize(rows) ncols = cv2.getOptimalDFTSize(cols) print nrows, ncols nimg = np.zeros((nrows,ncols)) nimg[:rows,:cols] = img dft = cv2.dft(np.float32(nimg),flags = cv2.DFT_COMPLEX_OUTPUT) dft_shift = np.fft.fftshift(dft) #fourier transform magnitude_spectrum = 20*np.log(cv2.magnitude(dft_shift[:,:,0],dft_shift[:,:,1])) crow,ccol = nrows/2 , ncols/2 # high pass filter, create a mask, center square is 0, remaining is 1 mask = np.ones((nrows,ncols,2),np.uint8) mask[crow-30:crow+30, ccol-30:ccol+30] = 0 # apply mask and inverse DFT fshift = dft_shift*mask f_ishift = np.fft.ifftshift(fshift) nimg_back = cv2.idft(f_ishift) img_back = nimg_back[:rows,:cols] #magnitude calculation is based on channels img_back = cv2.magnitude(img_back[:,:,0],img_back[:,:,1]) plt.subplot(141),plt.imshow(img, cmap = 'gray') plt.title('Input Image'), plt.xticks([]), plt.yticks([]) plt.subplot(142),plt.imshow(magnitude_spectrum, cmap = 'gray') plt.title('Magnitude Spectrum'), plt.xticks([]), plt.yticks([]) plt.subplot(143),plt.imshow(img_back, cmap = 'gray') plt.title('Image after HPF'), plt.xticks([]), plt.yticks([]) plt.subplot(144),plt.imshow(img_back) plt.title('Result in JET'), plt.xticks([]), plt.yticks([]) plt.show() |

**1.3. Filters and Fourier Transform**

Take the Fourier transform of Laplacian, Sobel, ... you can understand what kind of kernel it is.

**Figure: Fourier transform of some filters**

*+ Low frequencies are near the origin*

+ High frequencies are away from the origin

+ High frequencies are away from the origin

From the figure above you know what frequency region each kernel blocks, and what region each kernel passes.

So mean filter, Gaussian is low pass filter, remaining are high pass filters.

**2. Template Matching**

Template Matching is a method for searching and finding the location of a template image (contains objects) in an image.

OpenCV provides cv2.matchTemplate() for this feature. It slides the template image over the input image and compares the template image with a patch of input image.

There are some comparison methods:

cv2.TM_CCOEFF

cv2.TM_CCOEFF_NORMED

cv2.TM_CCORR

cv2.TM_CCORR_NORMED

cv2.TM_SQDIFF

cv2.TM_SQDIFF_NORMED

It returns a gray-scale image, where each pixel denotes how much does the neighborhood of that pixel match with template.

**Note:**

The problem with template matching is that it only works in very controlled environments. It works perfectly if the template image was taken from the input image, but it won't work if the resolution is different or even if the image is a little bit turned.

If input image is of size (WxH) and template image is of size (wxh), output image will have a size of (W-w+1, H-h+1). In order to mark the region matching the template, use cv2.minMaxLoc() function to find the maximum or minimum location of the best matches then calculate top-left corner of rectangle and take (w,h) as width and height of the rectangle.

Make a demo to find the template image below:

**Figure: template image**

in the image:

**Figure: input image**

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | import cv2 import numpy as np from matplotlib import pyplot as plt inimage = cv2.imread('toy-story.jpg') img = cv2.cvtColor(inimage,cv2.COLOR_BGR2GRAY) img2 = img.copy() template = cv2.imread('template.png',0) w, h = template.shape[::-1] img = img2.copy() # Apply template Matching res = cv2.matchTemplate(img,template,cv2.TM_CCOEFF_NORMED) min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res) #find rectangle top_left = max_loc bottom_right = (top_left[0] + w, top_left[1] + h) cv2.circle(inimage,max_loc, 10, (0,255,0), -1) cv2.rectangle(inimage,top_left, bottom_right, (0,0,255), 2) #plot plt.subplot(121),plt.imshow(res,cmap = 'gray') plt.title('Matching Result'), plt.xticks([]), plt.yticks([]) plt.subplot(122),plt.imshow(cv2.cvtColor(inimage, cv2.COLOR_BGR2RGB)) plt.title('Detected Point'), plt.xticks([]), plt.yticks([]) plt.suptitle('TM_CCOEFF_NORMED') plt.show() |

**Figure: red rectangle in output**

Make demo for image and template below:

**Figure: input image**

**Figure: template**

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | import cv2 import numpy as np from matplotlib import pyplot as plt inimage = cv2.imread('stars.png') img = cv2.cvtColor(inimage,cv2.COLOR_BGR2GRAY) img2 = img.copy() template = cv2.imread('star.png',0) w, h = template.shape[::-1] img = img2.copy() # Apply template Matching ress = cv2.matchTemplate(img,template,cv2.TM_CCOEFF_NORMED) threshold = 0.8 loc = np.where( ress >= threshold) for pt in zip(*loc[::-1]): cv2.circle(inimage,pt, 10, (0,255,0), -1) cv2.rectangle(inimage, pt, (pt[0] + w, pt[1] + h), (0,0,255), 2) #plot plt.subplot(111),plt.imshow(cv2.cvtColor(inimage, cv2.COLOR_BGR2RGB)) plt.title('Detected Point'), plt.xticks([]), plt.yticks([]) plt.suptitle('TM_CCOEFF_NORMED') plt.show() |

**Figure: output**

**3. Hough transform**

Hough Transform is a popular technique to detect any shape. It can detect the shape even if it is broken or distorted a little bit.

In order to use this method you have to represent that shape in mathematical form.

**3.1 Hough Line Transform**

Let 's consider to detect a line.

In Cartesian coordinates: $y=mx+c$

In Polar coordinates $\rho =xcos\theta +ysin\theta$, line can be represented in these two terms $(\rho,\theta)$

**Figure: a line in Polar coordinates**

Size of array depends on the accuracy. If the accuracy of angles is 1 degree then number of columns is 180. If the accuracy of distance is 1 pixel, because the maximum distance of $\rho$ is the diagonal length of the image then number of rows is diagonal length of the image.

Consider a 100x100 image with a horizontal line at the middle. Do steps:

- Take the first point of the line with (x,y). Substitute (x,y) to $\rho =xcos\theta +ysin\theta$, for each value of $\theta$=0,1,2,....,180, we have one $\rho$. For every $(\rho,\theta)$ pair, you increment value by one in our accumulator in $(\rho,\theta)$ cells. So the cell (50,90) = 1 along with some other cells.

- Take the second point on the line. Do the same as above. Increment the the values in the cells corresponding to $(\rho,\theta)$. This time, the cell (50,90) = 2.

- Continue this process for every point on the line. At each point, the cell (50,90) will be incremented, while other cells may or may not be voted up.

- At the end, the cell (50,90) will have maximum votes which says, there is a line in this image at distance 50 from origin and at angle 90 degrees.

**Figure: Hough Line Transform animation**

*with parameters:*

+ First Input image should be a binary image.

**Note:**apply threshold or use canny edge detection before finding applying Hough transform.

*+*Second and third parameters are $\rho$ and $\theta$ accuracies respectively.

+ Fourth argument is the threshold, which means minimum vote it should get for it to be considered as a line.

**Note:**number of votes depend upon number of points on the line. So it represents the minimum length of line that should be detected.

+ It returns an array of $(\rho,\theta)$ values.Each $(\rho,\theta)$ is a line.

Let 's make a demo to find lines for image below:

**Figure: input image**

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | import cv2 as cv import numpy as np import random img = cv.imread('images/lines.png') gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY) ret,thresh = cv.threshold(gray,150,255,cv.THRESH_BINARY_INV) lines = cv.HoughLines(thresh,1,5*np.pi/180,100) for line in zip(*lines[::-1]): b1 = random.randrange(1,255) g1 = random.randrange(1,255) r1 = random.randrange(1,255) rho,theta = line[0] a = np.cos(theta) b = np.sin(theta) x0 = a*rho y0 = b*rho x1 = int(x0 + 1000*(-b)) y1 = int(y0 + 1000*(a)) x2 = int(x0 - 1000*(-b)) y2 = int(y0 - 1000*(a)) cv.line(img,(x1,y1),(x2,y2),(b1,g1,r1),2) cv.imshow('img',img) cv.waitKey(0) cv.destroyAllWindows() |

**Figure: output using**

**Hough Line Transform**

**3.2 Probabilistic Hough Transform**

Probabilistic Hough Transform is an optimization of Hough Transform. It doesn't take all the points into consideration. It only takes a random subset of points and that is sufficient for line detection. We just have to decrease the threshold.

OpenCV provides cv2.HoughLinesP(), it is implemented based on Robust Detection of Lines Using the Progressive Probabilistic Hough Transform.

+ minLineLength - Minimum length of line. Line segments shorter than this are rejected.

+ maxLineGap - Maximum allowed gap between line segments to treat them as single line.

+ It directly returns the two endpoints of lines.

Reuse the image above.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | import cv2 import numpy as np img = cv2.imread('linen.png') gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY) edges = cv2.Canny(gray,50,150,apertureSize = 3) #using MORPH_CLOSE to extend line after using Canny kernel = np.ones((5,5),np.uint8) closing = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel) lines = cv2.HoughLinesP(closing,1,np.pi/180,120,minLineLength=100,maxLineGap=1);print(lines) for line in zip(*lines[::-1]): x1,y1,x2,y2 = line[0] cv2.line(img,(x1,y1),(x2,y2),(0,255,0),2) cv2.imshow('detected lines',img) cv2.waitKey(0) cv2.destroyAllWindows() |

**Figure: output using**

**Probabilistic Hough Transform**

**3.3 Hough Circle Transform**

A circle is represented by mathematical form;

$(x-x_{center}^{2})+(y-y_{center}^{2})=r^{2}$

where:

$(x_{center},y_{center})$ : center of circle

$r$ : radius of circle

We have 3 parameters $(x_{center},y_{center}), r$, so we need a 3D accumulator for hough transform. It is ineffective so Hough Gradient Method is used. This method uses the gradient information of edges.

OpenCV provides cv.HoughCircles() to detect circles in image. Just modify 2 parameters to achieve the accuracy.

*+*param1: First method-specific parameter. In case of HOUGH_GRADIENT , it is the higher threshold of the two passed to the Canny edge detector (the lower one is twice smaller).

+ param2 : Second method-specific parameter. In case of HOUGH_GRADIENT , it is the accumulator threshold for the circle centers at the detection stage. The smaller it is, the more false circles may be detected. Circles, corresponding to the larger accumulator values, will be returned first.

Let 's make a demo for image below:

**Figure: input image**

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | import numpy as np import cv2 as cv import cv2 img = cv.imread('circles.png') gray_img = cv.cvtColor(img,cv.COLOR_BGR2GRAY) circles = cv.HoughCircles(gray_img,cv2.cv.CV_HOUGH_GRADIENT,1,20, param1=300,param2=20,minRadius=0,maxRadius=0) circles = np.uint16(np.around(circles)) for i in circles[0,:]: # draw the outer circle cv.circle(img,(i[0],i[1]),i[2],(255,255,0),2) # draw the center of the circle cv.circle(img,(i[0],i[1]),2,(0,0,255),3) cv.imshow('detected circles',img) cv.waitKey(0) cv.destroyAllWindows() |

**4. Distance Transform**

This is an operator applied to binary images. The result is a gray-scale image that is the same size to the input image, except that the gray level intensities of points inside foreground regions are changed to show the distance to the closest boundary.

Imagine that you simultaneously start a fire at all points on the boundary of a foreground region in a binary image and letting the fire burn its way into the interior. If we then label each point in the interior with the amount of time that the fire took to reach that point, then we have effectively computed the distance transform of that region.

There are several kinds of distance transform, depending on the method to calculate the distance between pixels such as Euclidean Distance, City Block Distance, Chessboard Distance.

**Figure: distance transform using Chessboard Distance (source)**

**Figure: some kinds of distance transform**

**Figure: input image**

1 2 3 4 5 6 7 8 9 | import cv2 as cv import cv2 import numpy as np import random img = cv.imread('images/disttrans.png') gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY) ret,thresh = cv.threshold(gray,127,255,cv.THRESH_BINARY) dist_transform = cv.distanceTransform(thresh,cv2.cv.CV_DIST_L2,5) cv.imwrite('dist_transform.png',dist_transform) |

**Figure: the interior is brighter than outside**

**5. Image Segmentation with Watershed Algorithm**

Any grayscale image can be viewed as a topographic surface where high intensity denotes hills and low intensity denotes valleys.

**Figure: grayscale image**

**Figure: grayscale image can be viewed as a topographic surface (source)**

**Figure: water starts filling valleys and barriers are on the peak of hills (source)**

**Figure: barriers are in red -**

**segmentation result**

+ Label the region which is the foreground or object with one color (or intensity)

+ Label the region which is background or non-object with another color.

+ Label the region which is undetermined with 0.

Then our marker will be updated with the labels we gave, and the boundaries of objects will have a value of -1.

**6. Application of Distance Transform and Watershed Algorithm**

Segment the touching coins in image below:

**Figure: the touching coins**

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 | import numpy as np import cv2 as cv import cv2 from matplotlib import pyplot as plt img = cv.imread('images/coins.png') gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY) ret, thresh = cv.threshold(gray,0,255,cv.THRESH_BINARY_INV+cv.THRESH_OTSU) #create marker marker = np.ones((gray.shape),np.int32) #remove small dots and get sure background kernel = np.ones((3,3),np.uint8) sure_bg = cv.morphologyEx(thresh, cv.MORPH_OPEN, kernel, iterations = 2) #it is ok to use sure background above or can use this sure background #sure_bg = cv.dilate(opening,kernel,iterations=3) # Finding sure foreground area dist_transform = cv.distanceTransform(thresh,cv2.cv.CV_DIST_L2,5) ret, sure_fg = cv.threshold(dist_transform,0.7*dist_transform.max(),255,0) sure_fg = np.uint8(sure_fg) #markers for unknown areas=sure_bg-sure_fg unknown = cv.subtract(sure_bg,sure_fg) marker[unknown==255] = 0 #markers for sureforeground contours, hierarchy = cv.findContours(sure_fg, cv.RETR_CCOMP, cv.CHAIN_APPROX_SIMPLE) idx = len(contours)+1 for cnt in contours: cv.drawContours(marker, [cnt], 0, idx, -1 ) idx = idx -1 # apply watershed cv.watershed(img, marker) img[marker == -1] = [255,0,0] #display segmentation result plt.subplot(111),plt.imshow(img) plt.title('Result in JET'), plt.xticks([]), plt.yticks([]) plt.show() |

**Figure: segmented result using Watershed**

## No comments:

## Post a Comment