Application of clustering approaches to yield data

Yield data from more than 200 on-farm diagnostic trial fields were clustered using k-means clustering. Only sites with maize were included, because the clustering cannot account for crop type. The cluster analysis was conducted on the differences between each treatment grain yield and the average control grain yield. For this analysis, the NPK and control yields were averaged across the replicates within a field.
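The response variable described above (treatment yield minus the field's average control yield, with replicated treatments averaged within a field) can be sketched as follows. This is an illustrative Python sketch, not the R script used for the analysis; the field names, treatment labels, yields, and the function name yield_responses are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical plot-level records: (field, treatment, replicate, grain yield in t/ha).
# All names and numbers are invented for illustration.
plots = [
    ("field_A", "Control", 1, 1.0),
    ("field_A", "Control", 2, 1.5),
    ("field_A", "NPK", 1, 3.0),
    ("field_A", "NPK", 2, 3.5),
    ("field_A", "NPK+manure", 1, 4.25),
    ("field_B", "Control", 1, 2.0),
    ("field_B", "NPK", 1, 2.5),
    ("field_B", "NPK+manure", 1, 3.0),
]

def yield_responses(records, control="Control"):
    """Return {field: {treatment: mean treatment yield - mean control yield}}."""
    # Collect yields per (field, treatment) so replicates can be averaged.
    grouped = defaultdict(list)
    for field, treatment, _replicate, grain_yield in records:
        grouped[(field, treatment)].append(grain_yield)
    means = {key: mean(values) for key, values in grouped.items()}

    responses = defaultdict(dict)
    for (field, treatment), treatment_mean in means.items():
        if treatment == control:
            continue
        if (field, control) not in means:
            continue  # a field without a control plot has no defined response
        responses[field][treatment] = treatment_mean - means[(field, control)]
    return dict(responses)

responses = yield_responses(plots)
for field, by_treatment in sorted(responses.items()):
    print(field, by_treatment)
```

Each field then contributes one vector of responses (one value per treatment), and these vectors are the input to the clustering.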

Cluster analysis has no definitive number of clusters. To select an appropriate number, we fitted the k-means analysis with 2 to 15 clusters, calculated the amount of variation explained by each number of clusters, and plotted the results. Lime was omitted from the k-means clustering because it had not been applied in two of the sentinel sites (Mbinga and Pampaida); including lime would drastically reduce the number of fields in the analysis. Figure 15 shows the variation explained by different numbers of clusters for the diagnostic trials data. The extra variation explained by each additional cluster tends to decrease as the number of clusters increases.


Figure 15. Percentage of variance explained at different numbers of clusters for maize grain yield in the tested AfSIS sites using addition treatments


Based on Figure 15, we selected 4 clusters, since the additional variation explained by each further cluster is minimal.
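The selection procedure described above (fit k-means over a range of cluster counts, compute the variation explained by each, and look for the point where extra clusters add little) can be sketched as follows. This is a minimal Python illustration on synthetic data, not the R script used for the analysis. The two-dimensional points stand in for hypothetical per-field responses, the helper names are invented, and "variation explained" is assumed here to mean the between-cluster sum of squares as a share of the total sum of squares.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def kmeans(points, k, n_starts=20, max_iter=100, seed=0):
    """Lloyd's algorithm with random restarts.

    Returns (labels, centers, within_ss) for the restart with the
    smallest within-cluster sum of squares.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        centers = rng.sample(points, k)  # k observations as starting centers
        labels = None
        for _ in range(max_iter):
            new_labels = [
                min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
            ]
            if new_labels == labels:  # assignments stable: converged
                break
            labels = new_labels
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:  # an emptied cluster keeps its previous center
                    centers[j] = centroid(members)
        within_ss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
        if best is None or within_ss < best[2]:
            best = (labels, list(centers), within_ss)
    return best

def variance_explained(points, k_values, **kmeans_args):
    """Percent of the total sum of squares explained by the clusters, per k."""
    overall = centroid(points)
    total_ss = sum(sq_dist(p, overall) for p in points)
    return {
        k: 100.0 * (1.0 - kmeans(points, k, **kmeans_args)[2] / total_ss)
        for k in k_values
    }

def marginal_gains(explained):
    """Extra percentage explained by each k relative to the next smaller k."""
    ks = sorted(explained)
    return {k: explained[k] - explained[prev] for prev, k in zip(ks, ks[1:])}

# Synthetic stand-in data: 4 groups of 15 hypothetical fields each.
rng = random.Random(42)
group_means = [(0, 0), (10, 0), (0, 10), (10, 10)]
points = [
    (mx + rng.gauss(0, 0.5), my + rng.gauss(0, 0.5))
    for mx, my in group_means
    for _ in range(15)
]

explained = variance_explained(points, range(2, 9))
gains = marginal_gains(explained)
for k in sorted(explained):
    print(k, round(explained[k], 1), round(gains.get(k, float("nan")), 1))
```

The printed table is the numeric counterpart of a plot like Figure 15: the number of clusters is chosen where the gain column flattens out. The demo uses 2 to 8 clusters only because the synthetic data set is small; the same loop applies to the 2 to 15 range used in the analysis.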




Plots of the generated clusters against the variables included in the analysis were used to interpret the different clusters (Figure 16). The resulting clusters can be interpreted as follows:


  • Cluster 1: fields that are highly responsive to N and show some response to multinutrients and amendments.

  • Cluster 2: fields with only a low response to fertilizer application. None of the three macronutrients appears to be the key limiting nutrient; N, P and K seem to be required in combination. These may represent generally depleted soils that also have structural problems.

  • Cluster 3: fields that respond strongly to N and also to P, and where manure improves yields substantially. Here, the combination of N and P is key, and manure is also beneficial.

  • Cluster 4: fields that are non-responsive to any form of management.
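Interpretations like those above rest on comparing, cluster by cluster, the typical response to each treatment. A minimal Python sketch of such a cluster profile is given below. It is not the R script used for the analysis, and the field names, treatment labels, response values, and cluster assignments are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-field responses (treatment yield minus control yield, t/ha).
responses = {
    "field_A": {"N": 2.0, "P": 0.25, "manure": 0.5},
    "field_B": {"N": 2.5, "P": 0.25, "manure": 0.5},
    "field_C": {"N": 0.0, "P": 0.0, "manure": 0.25},
    "field_D": {"N": 0.25, "P": 0.0, "manure": 0.25},
}
# Hypothetical cluster label per field, e.g. as returned by a k-means fit.
labels = {"field_A": 1, "field_B": 1, "field_C": 4, "field_D": 4}

def cluster_profiles(responses, labels):
    """Return {cluster: {treatment: mean response of the fields in that cluster}}."""
    by_cluster = defaultdict(lambda: defaultdict(list))
    for field, by_treatment in responses.items():
        for treatment, value in by_treatment.items():
            by_cluster[labels[field]][treatment].append(value)
    return {
        cluster: {t: mean(values) for t, values in treatments.items()}
        for cluster, treatments in by_cluster.items()
    }

profiles = cluster_profiles(responses, labels)
for cluster in sorted(profiles):
    print(cluster, profiles[cluster])
```

In this invented example, cluster 1 shows a large mean response to N and cluster 4 shows almost none, which is the kind of contrast the interpretations above describe.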


Figure 17 demonstrates that fields can be grouped according to their response patterns using clustering algorithms. This has practical application wherever fields with distinctive response patterns need to be identified.


Figure 16. Plots of the resulting 4 clusters from the analysis of the diagnostic trial yield data. The lime treatment is omitted because lime was not applied in two sentinel sites


The R code used for this analysis, including descriptions of each step, is in “Yield Cluster Analysis_10_12.R”.



Figure 17. Yield observed from fields classified under different clusters following K-Means clustering