CLUSTERING OF HIGH RESOLUTION UAV IMAGERY TO IDENTIFY ESSENTIAL PLANTS USING SOM NEURAL NETWORK

High-resolution remote sensing image data is needed to distinguish essential plants from other plants. This study uses image data taken with an Unmanned Aerial Vehicle (UAV) to identify essential plants, especially citronella and kaffir lime. To distinguish the structure of essential plants from other objects, texture features are extracted with the Daubechies wavelet method. The extracted features are then grouped by feature proximity with the Self Organizing Map (SOM) Neural Network, so that objects with similar features cluster together. The tests were conducted on two groups of data: the first group consisted of plants, buildings and vacant lots, while the second group consisted only of plants. The results for the first group show that the technique can recognize citronella among other objects, especially buildings and bare land, with a purity of 0.862745 and a Silhouette Coefficient of 0.5520671. In the second group, the purity and Silhouette Coefficient decreased to 0.737705 and 0.161028. However, the test on the second group still shows that the method can distinguish citronella from other plants.


INTRODUCTION
Essential plants have a high commercial value because they produce essential oils that are widely used in industries such as perfumes, cosmetics and medicines. Cultivated essential plants include citronella, fragrant root, patchouli, cloves, and eucalyptus (Marlon Tanasale, 2012). Of the 200 varieties of essential plants, Indonesia has potential for 40 types, 15 of which produce essential oils that have become export commodities.
Essential plants are spread across almost all regions of Indonesia, but their presence is difficult to identify because data collection has not been managed properly and the extent of existing plantations is difficult to validate. To maintain the availability of essential plants, it is necessary to monitor the distribution of essential plantation land. One of the most widely used methods for monitoring plants and mapping land is remote sensing.
Several studies have mapped agricultural land using multispectral satellite data: Landsat 8 OLI (Sun et al., 2014; Song et al., 2017; Arango et al., 2016), Landsat ETM+ (Yang et al., 2017), MODIS (Skakun et al.), and a fusion of Landsat and MODIS (Zhu et al., 2017). These studies show that medium-resolution satellite data can produce high accuracy.
However, Landsat or MODIS data, with a spatial resolution of 30 m or coarser, cannot detect crop types, especially in areas containing more than one type of plant. In addition, essential plants in Indonesia are grown by smallholder farmers on small plots. Medium-resolution satellite data therefore gives less accurate results (Dewi et al., 2016; Wu et al., 2017). To improve mapping accuracy, high-resolution satellite data such as IKONOS (Anchang et al., 2016; Pu and Bell, 2017) or IKONOS and WorldView-2 (Pu and Landry, 2012) can be used. However, these two high-resolution data sources are expensive, which makes them less practical for institutions or farmers with little capital.
An Unmanned Aerial Vehicle (UAV) is a remote sensing platform that provides very high image resolution (centimeters), can be deployed quickly, and costs less. This very high resolution enables the identification process to be carried out accurately (Vasuki et al., 2014; Sánchez, 2014; Gevaert, 2017). Research conducted by Lu and He (2017) shows that UAV images can be used to distinguish grassland species in a particular area.
Based on these advantages, our study uses UAV images to identify essential plants, especially citronella and kaffir lime. However, feature extraction is required to distinguish these plants from other plants or from other objects such as buildings and bare land. Based on visual observation, essential plants can be distinguished from buildings and bare land using color features, but color is less effective for distinguishing essential plants from other plants such as rice, corn and trees. On closer observation, these plants differ in canopy and density, so texture features can be used in the recognition process. Differences in texture become visible only when observed over an image area of a certain number of pixels, called a window, so a windowing technique can be used in feature extraction. A fairly accurate method for extracting window-based texture features is the wavelet transform. Abdolmaleki et al. (2017) extracted spectral features from hyperspectral images with wavelets and obtained good recommendations for the detection of copper deposits. Research conducted by Bakhshipour et al. (2017) also shows that wavelet feature extraction can improve the effectiveness of weed detection in sugar beet plants.
Furthermore, this research uses the Self Organizing Map (SOM) algorithm for the identification process. SOM is an algorithm that identifies objects through clustering and has been shown to give good results. It produced an accuracy of 98% in the identification of ornamental plants (Rachmanda, 2013), 90.07% in the identification of disease on Ethiopian coffee plants (Mengistu et al., 2016), and about 90% on plant disease (Patil and Kumar, 2011).

METHOD
The process of identifying essential plants uses image data taken with a UAV and cropped to the specified window area. Each image is then transformed using the 2D Discrete Wavelet Transform. The process continues with the extraction of the energy features L1 and L2 using Wavelet Texture Analysis. The features obtained from this process are then used as inputs to the SOM training process. Training is done to obtain the best SOM weights by adjusting the learning rate parameter and the learning rate variable so that the algorithm can perform the clustering optimally. The optimal weights from training are then used in the testing process to obtain the cluster label of each test datum. The overall identification process is shown in Figure 1.

Data Preprocessing
The input data are UAV images taken around the pilot plant of the Essential Institute in Kesamben, Blitar. The captured image has a resolution of 11 cm and is stored as a TIF file, a format that preserves the geographic coordinates of the image. An example of a captured UAV image is shown in Figure 2.

Figure 2 UAV image of study area
The image contains several objects: building (1), tree & corn (2), kaffir lime (3), rice field/grass (4), bare land (5), and citronella (6). To prepare the test data, several sample images of 30x30 pixels were cropped from the area of each of the six objects. The sample data total 156 images, with the details shown in Table 1.
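The windowing step can be illustrated with a minimal sketch. At the stated 11 cm resolution, a 30x30 pixel window covers roughly 3.3 m x 3.3 m on the ground. The function name `crop_window`, the list-of-rows image representation, and the synthetic image are assumptions made for illustration; they are not the application used in this study.

```python
def crop_window(image, top, left, size=30):
    """Return a size x size window from a 2D image stored as a list of rows."""
    if top < 0 or left < 0 or top + size > len(image) or left + size > len(image[0]):
        raise ValueError("window falls outside the image")
    return [row[left:left + size] for row in image[top:top + size]]

# Synthetic 100 x 120 single-band image whose pixel value encodes its position.
image = [[r * 1000 + c for c in range(120)] for r in range(100)]

# Hypothetical sample location; in practice it would be chosen inside one object area.
window = crop_window(image, top=40, left=60)
ground_size_m = 30 * 0.11  # window side length on the ground, in metres
```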

Discrete Wavelet Transform
Discrete Wavelet Transform (DWT) is a wavelet transformation technique that relies on a convolution process to compute the transform quickly (Jatmiko, 2011). The wavelet transform extends easily to two dimensions for image processing applications, where it is called the 2D discrete wavelet transform (2D DWT). The 2D DWT is the 1-dimensional wavelet transform applied along the x-axis (rows) and the y-axis (columns).
The implementation scheme of the 2D DWT can be seen in Figure 3, where p(n) and q(n) denote the low pass filter and the high pass filter. Filtering at level 2 followed by down sampling produces four sub-images: LL, low pass filtered in both the horizontal and vertical directions; HH, high pass filtered in both the horizontal and vertical directions; LH, low pass filtered in the horizontal direction and high pass filtered in the vertical direction; and HL, low pass filtered in the vertical direction and high pass filtered in the horizontal direction.
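The row-then-column scheme can be sketched as follows. This is a minimal illustration for a single decomposition level with the Haar (db1) filters, not the implementation used in this study. The function names, the periodic wrap at the edge, and the use of the filters in correlation form (without the index reversal of a strict convolution) are assumptions. The image is assumed to have an even number of rows and columns.

```python
import math

# Haar (db1) analysis filters; P_LOW plays the role of p(n) and Q_HIGH of q(n).
P_LOW = [1 / math.sqrt(2), 1 / math.sqrt(2)]
Q_HIGH = [1 / math.sqrt(2), -1 / math.sqrt(2)]

def dwt_1d(signal, low, high):
    """Apply both filters and keep every second output sample."""
    n = len(signal)
    approx, detail = [], []
    for start in range(0, n, 2):  # a step of 2 performs the down sampling
        taps = [signal[(start + k) % n] for k in range(len(low))]  # periodic wrap
        approx.append(sum(c * v for c, v in zip(low, taps)))
        detail.append(sum(c * v for c, v in zip(high, taps)))
    return approx, detail

def transpose(matrix):
    return [list(column) for column in zip(*matrix)]

def filter_columns(block, low, high):
    """Run dwt_1d down every column and return the low and high halves."""
    pairs = [dwt_1d(column, low, high) for column in transpose(block)]
    return transpose([p[0] for p in pairs]), transpose([p[1] for p in pairs])

def dwt_2d(image, low=P_LOW, high=Q_HIGH):
    """One level of 2D DWT: rows (horizontal) first, then columns (vertical)."""
    rows = [dwt_1d(row, low, high) for row in image]
    row_low = [r[0] for r in rows]
    row_high = [r[1] for r in rows]
    ll, lh = filter_columns(row_low, low, high)   # horizontal low pass
    hl, hh = filter_columns(row_high, low, high)  # horizontal high pass
    return ll, lh, hl, hh

# Vertical stripes vary only in the horizontal direction.
stripes = [[0.0, 2.0, 0.0, 2.0] for _ in range(4)]
ll, lh, hl, hh = dwt_2d(stripes)
```

With this naming, an image that changes only horizontally puts its detail into HL and leaves LH and HH at zero, which agrees with the sub-image definitions above.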

Daubechies Transform (dbN)
Daubechies wavelets are written as dbN, where N is the wavelet order; the filter has length 2N and N vanishing moments (Gupta, 2015).
db1 is equivalent to the Haar wavelet. Daubechies wavelet transforms are computed in the same way as the Haar wavelet transform, by calculating running averages and differences through scalar products with the signal. This wavelet type has a balanced frequency response but a non-linear phase response. Daubechies wavelets use overlapping windows, so the high frequency coefficient spectrum reflects all high frequency changes.
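The relation between the order N, the filter length 2N, and the number of vanishing moments can be checked numerically for small orders. The sketch below uses the well-known closed-form db2 low pass coefficients and builds the high pass filter by the usual alternating-sign reversal; the helper name `moment` is an assumption introduced for illustration.

```python
import math

SQRT2 = math.sqrt(2.0)
SQRT3 = math.sqrt(3.0)

# db1 (Haar) and db2 low pass filters; db2 has 2N = 4 coefficients.
DB1_LOW = [1 / SQRT2, 1 / SQRT2]
DB2_LOW = [
    (1 + SQRT3) / (4 * SQRT2),
    (3 + SQRT3) / (4 * SQRT2),
    (3 - SQRT3) / (4 * SQRT2),
    (1 - SQRT3) / (4 * SQRT2),
]

def high_pass_from(low):
    """Quadrature mirror filter: reverse the low pass filter and alternate signs."""
    last = len(low) - 1
    return [(-1) ** k * low[last - k] for k in range(len(low))]

def moment(filt, order):
    """Discrete moment sum(k**order * filt[k]); zero means a vanishing moment."""
    return sum((k ** order) * c for k, c in enumerate(filt))

DB1_HIGH = high_pass_from(DB1_LOW)
DB2_HIGH = high_pass_from(DB2_LOW)

# db1 has one vanishing moment (order 0); db2 has two (orders 0 and 1).
db1_moments = [moment(DB1_HIGH, p) for p in range(2)]
db2_moments = [moment(DB2_HIGH, p) for p in range(3)]
```

For db2, the moments of order 0 and 1 vanish while the moment of order 2 does not, which is what N = 2 vanishing moments means.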
There are three methods to deal with edge problems in the overlapping windows of the Daubechies wavelet (Ian, 2001), namely:
1. Treating the data set as if it were periodic. The start of the data sequence is repeated after the end of the sequence, and the end of the data is used as a prefix.
2. Treating the data set as if it were mirrored at the ends. The data is reflected at each end, as if a mirror were held up to each end of the data sequence.
3. Gram-Schmidt orthogonalization, which calculates special scaling and wavelet functions that are applied at the beginning and end of the data set.
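The first two edge treatments can be sketched as simple padding functions. The function names and the choice to repeat the edge sample in the mirrored version are assumptions for illustration; the Gram-Schmidt approach is not sketched here.

```python
def extend_periodic(signal, pad):
    """Wrap around: the end of the data becomes the prefix, the start the suffix."""
    n = len(signal)
    return [signal[i % n] for i in range(-pad, n + pad)]

def extend_reflect(signal, pad):
    """Mirror the data at each end; the edge sample itself is repeated."""
    values = list(signal)
    if pad == 0:
        return values
    if pad > len(values):
        raise ValueError("pad must not exceed the signal length")
    return values[:pad][::-1] + values + values[-pad:][::-1]

data = [1, 2, 3, 4, 5]
periodic = extend_periodic(data, 2)  # [4, 5, 1, 2, 3, 4, 5, 1, 2]
mirrored = extend_reflect(data, 2)   # [2, 1, 1, 2, 3, 4, 5, 5, 4]
```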

Wavelet Texture Analysis
Wavelet Texture Analysis extracts textural features from the detail wavelet coefficients (sub-bands), or sub-images, at each scale. These features are mostly extracted from the high-frequency sub-band (HH). The approximation sub-band coefficients mainly represent the illumination of the image and generally do not carry texture information. The texture feature is obtained from the normalized first energy (L1) or second energy (L2).
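The exact normalization of the energy features is not given in the text. A minimal sketch, assuming the common definitions in which L1 is the mean absolute coefficient and L2 is the mean squared coefficient of a sub-band, is shown below. The function names and the example coefficients are hypothetical.

```python
def energy_l1(subband):
    """Assumed L1 energy: mean of the absolute coefficient values."""
    values = [abs(v) for row in subband for v in row]
    return sum(values) / len(values)

def energy_l2(subband):
    """Assumed L2 energy: mean of the squared coefficient values."""
    values = [v * v for row in subband for v in row]
    return sum(values) / len(values)

# Hypothetical 2 x 2 detail sub-band (for example HH) of one window.
hh = [[1.0, -2.0], [0.0, 3.0]]
l1 = energy_l1(hh)  # (1 + 2 + 0 + 3) / 4 = 1.5
l2 = energy_l2(hh)  # (1 + 4 + 0 + 9) / 4 = 3.5
```

Dividing by the number of coefficients makes the feature comparable across sub-bands of different sizes, which is one plausible reading of "normalized" here.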

Self Organizing Map (SOM)
Self-Organizing Map (SOM) is one of the algorithms of Artificial Neural Networks (ANN). This algorithm makes it possible to map complex information onto a two-dimensional space through an unsupervised learning process, so that classification can be both automatic and effective (Teles et al., 2015). The training process of the SOM algorithm is described in Singh and Dixit (2013). The testing process only calculates the distance between the weights and the data, without updating the weights. The weights used are the best weights obtained from the training process.
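The individual training steps are not reproduced in the text, so the following is only a sketch of a generic SOM with units arranged on a line, not the procedure of Singh and Dixit (2013) or the application used in this study. The deterministic weight initialization, the multiplicative learning rate decay, the `radius` parameter and all function names are assumptions.

```python
import math

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_unit(weights, sample):
    """Index of the unit whose weight vector is closest to the sample."""
    distances = [distance(w, sample) for w in weights]
    return distances.index(min(distances))

def train_som(data, n_units, learning_rate=0.5, decay=0.9, epochs=20, radius=0):
    """Training: move the winning unit (and neighbours within radius) toward each sample."""
    step = max(1, len(data) // n_units)
    # Assumed initialization: evenly spaced samples instead of random weights.
    weights = [list(data[(i * step) % len(data)]) for i in range(n_units)]
    rate = learning_rate
    for _ in range(epochs):
        for sample in data:
            winner = nearest_unit(weights, sample)
            first = max(0, winner - radius)
            last = min(n_units - 1, winner + radius)
            for j in range(first, last + 1):
                weights[j] = [w + rate * (x - w) for w, x in zip(weights[j], sample)]
        rate *= decay  # assumed schedule for shrinking the learning rate
    return weights

def assign_clusters(weights, data):
    """Testing: only distances are computed; the weights are not updated."""
    return [nearest_unit(weights, sample) for sample in data]

# Two hypothetical, well separated groups of 2D feature vectors.
samples = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
weights = train_som(samples, n_units=2)
labels = assign_clusters(weights, samples)
```

Separating `train_som` from `assign_clusters` mirrors the distinction drawn above between training, which updates weights, and testing, which only measures distances.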

Silhouette Coefficient
The silhouette coefficient is used to assess the quality of clusters based on internal criteria. It is evaluated by calculating the similarity of each cluster member to its own cluster (Kogan, 2006), as in equation 1.
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}} \qquad (1)$$
where a(i) is the average distance between i and the other data in the same cluster, and b(i) is the minimum of the average distances between i and the data in each other cluster. The silhouette coefficient lies in the range -1 ≤ s(i) ≤ 1. The larger the silhouette coefficient, the better the clustering.
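A minimal sketch of this measure follows, assuming the usual definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), Euclidean distance, and an overall score taken as the mean of s(i) over all data. Assigning s(i) = 0 to the only member of a cluster is a common convention and also an assumption here.

```python
import math

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def silhouette_coefficient(data, labels):
    """Mean of s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all data."""
    if len(set(labels)) < 2:
        raise ValueError("at least two clusters are required")
    scores = []
    for i, point in enumerate(data):
        per_cluster = {}  # cluster label -> distances from point to its members
        for j, other in enumerate(data):
            if j != i:
                per_cluster.setdefault(labels[j], []).append(distance(point, other))
        own = per_cluster.get(labels[i])
        if not own:
            scores.append(0.0)  # assumed convention for a single-member cluster
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for lab, d in per_cluster.items() if lab != labels[i])
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_coefficient(points, [0, 0, 1, 1])  # compact, well separated clusters
bad = silhouette_coefficient(points, [0, 1, 0, 1])   # clusters mixed together
```

In this toy example the natural grouping scores close to 1, while the mixed grouping scores below 0.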

Purity
Purity is used to evaluate the clustering based on external criteria. Computing purity requires the actual class of each record. It is obtained by dividing the number of pixels labeled in accordance with the real situation by the total number of pixels (Deepa & Revathy, 2012). Purity can be calculated using equation 2.
$$\mathrm{purity}(\Omega, C) = \frac{1}{N} \sum_{k} \max_{j} |\omega_k \cap c_j| \qquad (2)$$
where Ω = {ω_1, ω_2, ..., ω_K} is the set of clusters, C = {c_1, c_2, ..., c_J} is the set of classes, and N is the number of data. Purity ranges from 0 to 1. A clustering result is considered poor if the purity is close to 0 and good if it is close to 1 (Deepa & Revathy, 2012).
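Purity can be sketched in a few lines, assuming the standard definition in which each cluster is credited with the count of its most frequent actual class and the total is divided by N. The function name and the example labels are hypothetical.

```python
from collections import Counter

def purity(cluster_labels, class_labels):
    """Sum of the majority-class counts of all clusters, divided by N."""
    if len(cluster_labels) != len(class_labels) or not cluster_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    members = {}  # cluster label -> Counter of actual classes in that cluster
    for cluster, actual in zip(cluster_labels, class_labels):
        members.setdefault(cluster, Counter())[actual] += 1
    majority_total = sum(max(counts.values()) for counts in members.values())
    return majority_total / len(cluster_labels)

# Hypothetical clustering of seven samples into two clusters.
clusters = [0, 0, 0, 1, 1, 1, 1]
classes = ["citronella", "citronella", "building", "rice", "rice", "rice", "citronella"]
score = purity(clusters, classes)  # (2 + 3) / 7
```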

RESULT AND DISCUSSION
This study developed an application in the Java programming language to test the proposed method. Tests were conducted on two groups of data:
1. The complete group, which consists of building, tree & corn, kaffir lime, rice field/grass, bare land, and citronella.
2. The plants group, which consists of tree & corn, kaffir lime, rice field/grass, and citronella.
This study evaluates the test results by calculating an internal criterion (Silhouette Coefficient) and an external criterion (Purity) to determine the performance of SOM.

Result of Complete Dataset Testing
The tests were performed using the best Daubechies wavelet parameters obtained from the training process, i.e. a Daubechies level of 1, a Daubechies coefficient of 5, and RGB + Energy L1 features. The test results for several learning rate values of the SOM algorithm are shown in Table 1. Values in the range 0.1 to 1 were tested to find the best learning rate. Table 1 shows that the purity and silhouette coefficient fluctuate. The highest purity and silhouette coefficient occur at a learning rate of 0.5, so 0.5 is the best learning rate for the SOM algorithm. The clustering result of the complete dataset is shown in Table 2. Citronella is found in clusters 2 and 5, but most of the data is in cluster 2. In cluster 2, citronella shares the cluster with tree/corn, kaffir lime and building, but citronella is the dominant class. Although rice field/grass and citronella have almost the same physical shape, the clustering results show that the textures of these two plants can be distinguished, because they are not found in the same cluster. This difference arises mainly because the image was taken when the rice was still in the early stages of growth and the citronella was about 3-4 months old.
Table 2 also shows that all kaffir lime data fall in the same cluster. However, kaffir lime does not appear as the dominant class of that cluster because the amount of kaffir lime data used in the test is very small. Further testing of kaffir lime with more data is needed to obtain better results.

Result of Plants Dataset Testing
As with the complete dataset, the plants dataset was tested using its best Daubechies wavelet parameters, i.e. a Daubechies level of 3, a Daubechies coefficient of 4, and RGB + Energy L1 features. The test results for several learning rate values of the SOM algorithm are shown in Table 3. In this test, learning rates of 0.9 and 1 are the best values, with a purity of 0.737705 and a silhouette coefficient of 0.013609. These results show a decrease in purity and silhouette values compared with the complete dataset. The clustering result of the plants dataset in Table 4 shows that citronella data is spread across clusters 1, 2 and 5, while all kaffir lime data falls in cluster 1. Although citronella is detected as the dominant class, its percentage is small, especially in clusters 1 and 2. In cluster 5, citronella can be distinguished quite well from tree/corn.