Journal of Biochemistry and Biophysics
ISSN: 2576-7623
Detection of Early Blight Using K-Means Clustering
Copyright: © 2022 Lijalem T. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Early blight is one of the major diseases of tomato, affecting leaf and fruit quality. Detection and estimation of disease severity are traditionally performed by visual observation, which requires significant time to inspect a large cultivated area. Image processing techniques have therefore proven to be an effective alternative to visual analysis. In this study, digital image processing methods and techniques were used to detect early blight of tomato, estimate the disease severity, and classify tomato leaves. A total of 198 infected plant samples were randomly taken from the Haramaya University research site "Raree" at four different times. Diseased tomato leaf images were captured, resized, and stored for experimentation. The stored images were processed using median filtering, which removes noise while preserving useful image features, followed by image enhancement. The RGB images were transformed to grayscale and to the CIELAB color space; k-means clustering was applied to estimate the disease severity of the tomato leaves, and Otsu's thresholding algorithm was applied to estimate the disease severity of both the detached and live leaves. MATLAB algorithms were developed to determine the total area and the infected lesion area of the leaf samples.
Keywords: Image processing, K-means clustering, Early Blight, Segmentation
There are a number of factors that limit tomato production and productivity. Diseases are among the major factors that affect the production and quantity of crop yield. The most common diseases during the growing season are early blight, late blight, and septoria leaf spot, which reduce the quality and yield of the tomato crop [1].
Early blight (EB) is one of the most common and destructive foliar diseases of tomato. It is caused by the fungus Alternaria solani and is responsible for major yield losses in most tomato-growing areas worldwide. On tomato leaves, early blight appears as concentric rings with yellow halos, and it can also cause symptoms such as collar rot (basal stem lesions at the seedling stage), stem lesions on the adult plant, and fruit rot over a wide range of temperatures (4-36 °C) [2].
Early blight occurs early in the growth cycle, spreads quickly, and can cause severe damage. For timely detection and effective control, understanding the occurrence and development of early blight of tomato (EBT) is important so that infections on tomato leaves can be identified easily and crop quality improved.
Traditionally, direct observation methods have been used to detect diseases on crop leaves. According to [3], the detection of the severity of infected leaves has been done by farmers using the naked eye, and it is largely dependent on the level of experience of agronomists and farmers. However, this method has been shown to be time-consuming, costly, inefficient for detection, and difficult to apply for monitoring large farms, and it is subject to multiple human errors. Additionally, in practice, farmers require continuous monitoring methods to efficiently identify tomato diseases [1].
Image processing techniques using MATLAB software for the detection and segmentation of early blight on tomato leaves are fast, less expensive, require less effort, and are more accurate. Thus, image processing techniques are applied in this work. Image processing has many important and effective applications in agriculture for the detection of foliar diseases, for example, identifying the type of disease, finding the shape of the affected area, detecting the edges of the diseased leaf, calculating the diseased ratio, separating the layers of target images, and determining the color of affected areas [4].
Image processing techniques have also been found increasingly useful in the fruit industry, especially for quality inspection and defect sorting.
In this paper, an accurate disease detection method for the diagnosis of early blight on tomato leaves is developed. The automatic detection framework for early blight tomato leaf disease consists of the following steps: image acquisition, preprocessing, and segmentation. First, the tomato leaf image was acquired using a digital camera and preprocessed to remove noise and enhance its contrast. The RGB color model of the image was converted into the CIELAB color model; this color transformation allows the visual differences present in the RGB image to be measured. The CIELAB system is device independent and is defined by the CIE to classify color according to human vision. The converted L*a*b* color values are returned as a numeric array of the same size as the input, where L*, a*, and b* stand for lightness, the red/green value, and the blue/yellow value, respectively. Second, k-means clustering was applied to the preprocessed image to segment the region of interest into two clusters, and Otsu's thresholding was used for segmentation.
The experiment was conducted at Haramaya University's "Raree" research station. Haramaya University (HU) is in the East Hararghe zone of Oromia Regional State, Ethiopia. The university is located at 9°26′N latitude and 42°3′E longitude. The altitude on the campus varies from 1980 to 2000 meters above sea level (m.a.s.l.). The mean annual precipitation is 780 mm, and the mean annual maximum and minimum temperatures are 23.4 °C and 8.25 °C, respectively [5].
A tomato variety named "Money Maker" was used as the test crop. This variety was released by the tomato improvement program of Haramaya University in 2010 for mid- to high-altitude areas of eastern Ethiopia. Thirty-three equal-sized pots were used to conduct the experiment. To maintain uniformity of extraneous factors, the tomato plants were grown in the "Raree" greenhouse in pots filled with soil in similar proportions.
The pathogen was inoculated on thirty-day-old tomato plants grown in the pots. On the eleventh day after inoculation, early blight disease began to appear. Thus, diagnosis and sample collection began on forty-one-day-old tomato plants, after inoculation of the pathogen and application of fungicides at different volumes. The images were captured using a digital camera and stored on a personal computer for later processing; MATLAB software was used for the implementation.
The first images were taken on the forty-first day, following the occurrence of early blight on the tomato leaves. This was repeated every eight days for two consecutive weeks. In this work, thirty-three infected tomato plants were randomly selected from the pool of pots in which the plants were grown. Each week, two images of each tomato plant were taken (one from the bottom and one from the middle), giving 66 images per week, to determine the progress of the disease and for classification. Over three weeks, 198 infected tomato plant samples were randomly taken from 11 groups.
The basic steps for the detection, severity estimation, and classification of tomato plant disease, from image acquisition to segmentation, are shown in Figure 1 below.
The tomato leaf samples were imaged by placing them on a white background (paper) and using a digital camera. The images were stored in JPEG (Joint Photographic Experts Group) format at a size of 256 × 256 pixels.
Following acquisition, noise removal, enhancement of poorly resolved images, removal of unwanted background, and improvement of image quality were carried out using median filtering and contrast-limited adaptive histogram equalization (CLAHE). Then, the captured RGB color image was transformed into the CIELAB (Lab) color space. CIELAB is a device-independent color space model derived from human perception, whereas RGB is a device-dependent color space model. The Lab color space is designed to approximate human vision, it aspires to perceptual uniformity, and its L-component closely matches the human perception of lightness. The transformation from the RGB to the Lab color model uses the equations given in [10].
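As a minimal illustration of this preprocessing stage, the MATLAB sketch below applies median filtering, CLAHE, and an RGB-to-Lab conversion. It is a sketch under stated assumptions rather than the exact implementation used in this study: it assumes the Image Processing Toolbox, reuses the sample file name 'leaf.jpg' mentioned later in the text, uses makecform/applycform for the Lab conversion (the approach available in MATLAB 2013a), and the filter size and clip limit are illustrative choices.

% Preprocessing sketch (assumes Image Processing Toolbox).
rgb = imread('leaf.jpg');                 % sample file name used in this paper
rgb = imresize(rgb, [256 256]);           % standard sample size

% Median filtering on each channel to suppress noise while keeping edges
for c = 1:3
    rgb(:,:,c) = medfilt2(rgb(:,:,c), [3 3]);
end

% Contrast-limited adaptive histogram equalization (CLAHE) on the intensity image
gray   = rgb2gray(rgb);
grayEq = adapthisteq(gray, 'ClipLimit', 0.02);   % clip limit in the range 0 to 1

% RGB -> CIELAB conversion (makecform/applycform is available in R2013a)
cform = makecform('srgb2lab');
lab   = applycform(im2double(rgb), cform);
L = lab(:,:,1);  a = lab(:,:,2);  b = lab(:,:,3);  % L*, a*, b* planes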
The detection of the damaged leaf area was done, while avoiding the defoliated regions, using the A-channel of the CIELAB color model by developing an appropriate MATLAB algorithm. The A-channel was selected because it corresponds closely to human perception of the red/green difference between lesions and healthy tissue. Another advantage of using the CIELAB color model is that the leaf veins are not mistaken for damaged tissue.
Image segmentation means partitioning an image into parts that share similar features or some similarity measure. Segmentation can be performed using various methods, such as Otsu's method and unsupervised K-means clustering [6, 7]. The A-channel is thresholded using Otsu's method [8] to segment the damaged and normal parts of the leaves. Thresholding is the transformation of an input image f into an output (segmented) binary image G as follows:

G(i, j) = \begin{cases} 1, & f(i, j) \ge T \\ 0, & f(i, j) < T \end{cases}

where T is the threshold value; G(i, j) = 1 for image elements of objects and G(i, j) = 0 for image elements of the background.
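A brief MATLAB sketch of this Otsu step on the A-channel is given below. It is illustrative only, assuming the Image Processing Toolbox and the same 'leaf.jpg' sample file; the a* plane is rescaled to [0, 1] before graythresh is applied.

% Otsu thresholding of the A-channel (sketch)
rgb   = imresize(imread('leaf.jpg'), [256 256]);
lab   = applycform(im2double(rgb), makecform('srgb2lab'));
aChan = mat2gray(lab(:,:,2));        % rescale the a* plane to [0, 1]
T     = graythresh(aChan);           % Otsu's threshold value
G     = im2bw(aChan, T);             % G(i,j)=1 for object pixels, 0 for background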
The K-means clustering algorithm is an unsupervised algorithm that segments images based on the L*a*b* color space. Here it was used to segment the region of interest from the background: it partitions the given image into K clusters based on K centroids, where K is the number of segments to be created. Objects are first assigned to one of the K groups; cluster membership is then determined by calculating the centroid of each group and assigning each object to the group with the closest centroid. This approach minimizes the overall within-cluster dispersion by iterative reallocation of cluster members [9]. The algorithm minimizes an objective function known as the squared error function, given in equation (3). In this study k = 2 was used, so two cluster centers were created and the distance of each color pixel from each cluster center was computed. Let X = {x_1, x_2, ..., x_n} be the set of data points and V = {v_1, v_2, ..., v_k} be the set of cluster centers. Then the objective function J(V) is evaluated as

J(V) = \sum_{j=1}^{k} \sum_{i=1}^{k_j} \lVert x_i - v_j \rVert^2 \qquad (3)

where ‖x_i − v_j‖ is the Euclidean distance between data point x_i and cluster center v_j, k_j is the number of data points in the j-th cluster, and k is the number of cluster centers. Each new cluster center is recalculated from the color intensity values of the data points (pixels) assigned to it, and the distance between the i-th data point (pixel) and the j-th centroid is measured with the Euclidean distance in equation (4):

d(x_i, v_j) = \lVert x_i - v_j \rVert = \sqrt{\sum_{m} (x_{im} - v_{jm})^2} \qquad (4)
The segmentation procedure can be summarized in the following steps (a MATLAB sketch of these steps is given after the list):
1. Read the input image of the tomato leaf.
2. Transform the image from the RGB to the LAB color space.
3. Classify the colors using k-means clustering in 'a*b*' space using the Euclidean distance.
4. Label each pixel in the image from the results of k-means.
5. Generate the images that segment the image by color.
6. Select the disease-containing segment.
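The sketch below follows the listed steps. It is illustrative only: kmeans requires the Statistics Toolbox, 'leaf.jpg' is the sample file name reused from the text, and the number of replicates is an arbitrary choice rather than a setting reported in this study.

% K-means segmentation of the leaf image in a*b* space (sketch)
rgb = imresize(imread('leaf.jpg'), [256 256]);
lab = applycform(im2double(rgb), makecform('srgb2lab'));

ab = reshape(double(lab(:,:,2:3)), [], 2);       % one row of [a* b*] per pixel
k  = 2;                                          % two clusters: lesion vs. rest
[idx, centers] = kmeans(ab, k, 'Distance', 'sqEuclidean', 'Replicates', 3);
pixelLabels = reshape(idx, size(lab,1), size(lab,2));   % label each pixel

% One image per cluster; the cluster holding the lesions is selected by
% inspecting the cluster centers (e.g. the one with the larger a* value).
for i = 1:k
    mask = repmat(pixelLabels == i, [1 1 3]);
    clusterImg = rgb;
    clusterImg(~mask) = 0;
    figure, imshow(clusterImg), title(sprintf('Cluster %d', i));
end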
In this study, early blight tomato leaf images were acquired, and MATLAB 2013a software was used to implement the image enhancement. The original RGB early blight tomato leaf images were resized to a standard format and converted into grayscale images using the MATLAB built-in function rgb2gray. The acquired tomato leaf images are in JPEG format, which is a true color format (24-bit color). The color images were then decomposed into their red, green, and blue components for disease identification based on the color histogram. As shown in Figure 2 below, the red image shows better brightness than the blue and green images due to its higher intensity. Figure 2 was captured as a screenshot from the MATLAB program.
The original ‘leaf.jpg’ converted into a gray scale image is shown in Figure 2.
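A short MATLAB sketch of this channel decomposition and grayscale conversion (as presented in Figure 2) is given below, using the built-in functions named in the text; the display layout is illustrative.

% Channel decomposition and grayscale conversion (sketch)
rgb  = imresize(imread('leaf.jpg'), [256 256]);
gray = rgb2gray(rgb);                              % grayscale conversion
R = rgb(:,:,1);  G = rgb(:,:,2);  B = rgb(:,:,3);  % color components
figure;
subplot(2,3,1), imshow(rgb),  title('Original');
subplot(2,3,2), imshow(gray), title('Grayscale');
subplot(2,3,3), imshow(R),    title('Red component');
subplot(2,3,4), imshow(G),    title('Green component');
subplot(2,3,5), imshow(B),    title('Blue component');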
The acquired images were enhanced using a contrast enhancement method with a contrast limit (range 0 to 1) to increase the local contrast of the tomato leaf image and make identification easier. Before contrast adjustment, the contrast between the background and the objects was very low; after adjustment, the full dynamic range of the image histogram was stretched to increase the contrast of the image. The enhanced images obtained using CLAHE are shown in Figures 3 and 4.
The enhanced image was transformed into the LAB color space, and only two channels (the A and B components) were retained, which reduces the processing time for segmentation. The Lab color space approximates human vision and aspires to perceptual uniformity; its L-component closely matches the human perception of lightness and was used to detect the lesion region of the tomato leaf. The transformed CIELAB color model was used to identify the lesion part of the tomato leaf, with the L-component and A-component clearly separated from the normal tissue, as shown in Figure 5. As shown in the figure, a two-phase color conversion was performed: one from RGB to grayscale and the other from RGB to LAB, to extract the area. The Lab color space thus allows the color to be quantified in a device-independent color space.
After the transformation into the Lab color space, k-means clustering was applied to separate groups within the tomato leaf image; the algorithm treats each color as having a location in space. The diseased area of the leaf is clustered separately by k-means, which enabled the areas of the diseased and normal regions to be obtained while avoiding the defoliated parts. The tomato leaf image was clustered into two clusters using k-means with k = 2. The total area was determined by converting the RGB image into grayscale. Moreover, the RGB image was transformed into the LAB color model, and Otsu's thresholding was applied to the A-channel to evaluate the area of the normal part of the leaf (Figure 7). The clustered tomato leaf image obtained using k-means clustering is shown in Figure 6 below.
Then, a thresholding technique based on Otsu's algorithm was applied to obtain a binary image, with black as the background and white as the foreground, and to compute the total area of the early blight tomato leaf. The segmented images produced by this thresholding technique were used to determine the total area of the tomato leaf image.
The diseased area is computed using the original image and the segmented image; this gives the degree of infection of the tomato. ML and LL stand for the detached and live tomato leaves of the first round of sampling, respectively. The other leaves' areas were extracted in the same way, and the resulting area features were used to estimate the disease severity. The areas of the sample leaves were extracted using MATLAB software, and the values are tabulated for the MLs and the LLs. Table 1 shows the results for all diseased samples.
Total tomato area = 27418 sq. pixel units
Disease portion area = 3103 sq. pixel units
Infection % = (disease portion area / total tomato area) × 100
= (3103 / 27418) × 100
= 11.3174%
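The sketch below shows how this severity percentage can be computed in MATLAB from two binary masks. The variable names 'leafMask' and 'lesionMask' are hypothetical, standing for the whole-leaf mask from the Otsu step and the diseased-cluster mask from k-means, respectively.

% Severity computation (sketch; 'leafMask' and 'lesionMask' are assumed to
% come from the Otsu and k-means steps described above)
totalArea    = nnz(leafMask);                    % total leaf area, sq. pixel units
lesionArea   = nnz(lesionMask & leafMask);       % infected lesion area
infectionPct = (lesionArea / totalArea) * 100;   % e.g. (3103/27418)*100 ~ 11.32 %
fprintf('Disease severity: %.4f %%\n', infectionPct);

For the first sample this reproduces the worked example above, where 3103 / 27418 gives an infection level of about 11.32%.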
In this work, a method has been developed to detect EBT images by means of automatic image segmentation techniques. The samples were collected from the Haramaya University research area "Raree". First, the early blight tomato leaf images were acquired and pre-processed to enhance their contrast. The red, green, and blue (RGB) images were transformed into L*a*b* space, and the K-means clustering algorithm was then used to segment the diseased part of the leaves using two clusters. After all iterations, the final image is displayed in the second cluster. The centroid weight is calculated to identify the infected portion accurately and to calculate the infection percentage. The intensity value of each centroid is plotted based on the red, green, and blue pixels of the infected portion, and a mathematical formula is used to calculate the infection percentage. With regard to future work, different segmentation algorithms and different techniques for finding the infected area could be used to obtain more accurate results; the present work is limited to K-means clustering segmentation and mathematical formulations.
Figure 1: Framework diagram for detecting and classifying early blight tomato disease
Figure 2: The captured image represented in its Red, Green, and Blue (RGB) components: a) original image, b) red image, which has the highest brightness, c) blue image, d) green image
Figure 3: The contrast of the enhanced image and the visibility of the image are increased: a) histogram of the enhanced image, b) intensity of the enhanced image, c) intensity of the enhanced image as a PDF, d) intensity of the enhanced image as a CDF
Figure 4: a) the input image, b) the enhanced stretched image, c) the grayscale image histogram, d) the histogram of the enhanced stretched image. The horizontal axis of a histogram represents the color intensity level (0-255), while the vertical axis shows the number of pixels at each level
Figure 5: The image color transformation result
Figure 6: Infected lesion segmentation using the k-means clustering algorithm (DL)
Figure 7: Extraction of normal tomato leaf area to estimate the disease severity of the leaf |
No. | Tomato leaf | Total area (sq. pixels) | Diseased area (sq. pixels) | Diseased/total ratio | Infection (%)
1 | ML | 27418 | 3103 | 0.113173827 | 11.31738274
2 | ML | 20508 | 3083 | 0.150331578 | 15.03315779
3 | LL | 18198 | 2189 | 0.120287944 | 12.02879437
4 | LL | 15654 | 4483 | 0.286380478 | 28.63804778
5 | ML | 18610 | 1333 | 0.071628157 | 7.16281569
6 | LL | 19679 | 2325 | 0.118146247 | 11.81462473
7 | ML | 18974 | 2295 | 0.120954991 | 12.0954991
8 | ML | 18602 | 1577 | 0.084775831 | 8.477583056
9 | LL | 11316 | 1855 | 0.163927183 | 16.39271828
10 | ML | 18726 | 2759 | 0.147335256 | 14.73352558
11 | LL | 20139 | 375 | 0.018620587 | 1.862058692
12 | ML | 25004 | 3868 | 0.154695249 | 15.46952488
13 | LL | 15897 | 4735 | 0.297854941 | 29.78549412
14 | ML | 16166 | 286 | 0.017691451 | 1.769145119
15 | LL | 14457 | 3937 | 0.272324825 | 27.23248253
16 | ML | 21027 | 1550 | 0.073714748 | 7.371474771
17 | LL | 20130 | 1195 | 0.059364133 | 5.936413313
18 | ML | 16415 | 5565 | 0.33901919 | 33.90191898
19 | LL | 10189 | 2544 | 0.249681029 | 24.96810286
20 | ML | 21844 | 6359 | 0.291109687 | 29.11096869
21 | LL | 18687 | 3918 | 0.209664473 | 20.96644726
22 | LL | 18687 | 3918 | 0.209664473 | 20.96644726
23 | LL | 20275 | 1164 | 0.057410604 | 5.741060419
24 | LL | 18439 | 2034 | 0.11030967 | 11.03096697
25 | ML | 17476 | 2175 | 0.124456397 | 12.44563973
26 | ML | 20686 | 773 | 0.037368268 | 3.736826839
27 | ML | 21314 | 1320 | 0.061931125 | 6.193112508
28 | ML | 18523 | 2553 | 0.137828645 | 13.78286455
29 | ML | 19277 | 1869 | 0.09695492 | 9.695492037
30 | ML | 14478 | 2762 | 0.190772206 | 19.07722061
31 | ML | 15875 | 1299 | 0.081826772 | 8.182677165
32 | ML | 16881 | 1657 | 0.098157692 | 9.815769208
Table 1: Results of the total and diseased areas of the tomato leaf samples
ML = Middle tomato leaves; LL = Lower tomato leaves.