Application of clustering approaches to yield data
Yield data from more than 200 on-farm diagnostic trial fields were clustered using k-means clustering. Only sites with maize were included, because the clustering cannot account for crop type. The cluster analysis was conducted on the differences between each treatment grain yield and the average control grain yield. For this analysis, the NPK and control yields were averaged across the replicates within a field.
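The calculation of the response variables described above can be sketched as follows. This is a minimal illustration and not the project script: the data frame `trial`, its column names, the treatment labels and the yield values are all hypothetical.

```r
# Hypothetical plot-level data; column names and values are illustrative only
trial <- data.frame(
  field     = rep(c("F1", "F2"), each = 6),
  treatment = rep(c("Control", "Control", "NPK", "NPK", "PK", "NK"), times = 2),
  yield     = c(1.0, 1.2, 3.0, 3.4, 1.5, 2.6,
                0.8, 1.0, 1.1, 1.3, 0.9, 1.0)
)

# Mean yield per field and treatment; this averages the replicated
# Control and NPK plots within each field
means <- tapply(trial$yield, list(trial$field, trial$treatment), mean)

# Response = treatment mean minus the field's own control mean
treatments <- setdiff(colnames(means), "Control")
resp <- means[, treatments, drop = FALSE] - means[, "Control"]

print(round(resp, 2))
```

Each row of `resp` is one field and each column is one treatment response, which is the layout that `kmeans` expects.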
In cluster analysis there is no definitive number of clusters to use. To select an appropriate number, we fitted the cluster analysis with 2 to 15 clusters, calculated the amount of variation explained by each number of clusters, and plotted the result. Lime was omitted from the k-means clustering because it had not been applied in two of the sentinel sites (Mbinga and Pampaida); including lime would drastically reduce the number of fields in the analysis. Figure 15 shows the level of variation explained by different numbers of clusters for the diagnostic trials data. The plot shows that the extra variation explained by each additional cluster tends to decrease as the number of clusters increases.
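The selection procedure described above can be sketched as follows. The sketch assumes that "variation explained" is measured as the between-cluster sum of squares divided by the total sum of squares, which `kmeans` returns as `betweenss` and `totss`; the source does not state the exact measure. The matrix `resp` is synthetic stand-in data, and `nstart` and `iter.max` are choices made for this illustration.

```r
set.seed(42)  # k-means starts from random centres, so fix the seed

# Synthetic stand-in for the fields-by-treatments response matrix
resp <- rbind(
  matrix(rnorm(150, mean = 0), ncol = 3),
  matrix(rnorm(150, mean = 3), ncol = 3),
  matrix(rnorm(150, mean = 6), ncol = 3)
)

k.values <- 2:15
pct.explained <- sapply(k.values, function(k) {
  fit <- kmeans(resp, centers = k, nstart = 25, iter.max = 50)
  100 * fit$betweenss / fit$totss
})

# Extra variation explained by each additional cluster
gain <- diff(pct.explained)

print(data.frame(k = k.values, pct = round(pct.explained, 1),
                 gain = round(c(NA, gain), 1)))
```

Plotting `pct.explained` against `k.values` gives a curve of the kind described here; the number of clusters is chosen where `gain` becomes small.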
Figure 15. Percentage of variance explained for different numbers of clusters for maize grain yield in tested AfSIS sites using addition treatments
Based on the above plot, we selected 4 clusters, since the additional variation explained by each further cluster is minimal.
omis.kmeans <- kmeans(unadj.omis[, 9:14], centers = 4)
Plots of the generated clusters against the variables included in the analysis were used to interpret the different clusters (Figure 16). The resulting clusters can be interpreted as follows:

Cluster 1: fields that are highly responsive to N and have some response to multinutrients and amendments

Cluster 2: fields that have only a low response to fertilizer application. None of the three macronutrients appears to be the key limiting nutrient; rather, N, P and K are required in combination. These fields may represent generally depleted soils that also have structural problems.

Cluster 3: fields that respond strongly to N and also to P, and where manure greatly improves yields. Here, the combination of N and P is key. Manure is also beneficial.

Cluster 4: fields that do not respond to any form of management.
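Interpretations of this kind rest on comparing the mean response of each cluster across the variables. A minimal sketch of that comparison follows; it uses synthetic data, and the column names `trtA`, `trtB` and `trtC` are hypothetical placeholders, not the treatments of the trial.

```r
set.seed(1)  # fix the random starting centres for repeatability

# Synthetic response matrix: 80 fields, 3 hypothetical treatment responses
resp <- rbind(
  matrix(rnorm(60, mean = 0), ncol = 3),
  matrix(rnorm(60, mean = 2), ncol = 3),
  matrix(rnorm(60, mean = 4), ncol = 3),
  matrix(rnorm(60, mean = 6), ncol = 3)
)
colnames(resp) <- c("trtA", "trtB", "trtC")

fit <- kmeans(resp, centers = 4, nstart = 25)

# Mean response per cluster and variable (same values as fit$centers)
cluster.means <- aggregate(as.data.frame(resp),
                           by = list(cluster = fit$cluster), FUN = mean)
print(cluster.means)

# Number of fields assigned to each cluster
print(table(fit$cluster))
```

A cluster whose means are close to zero for every variable would correspond to nonresponsive fields, while a cluster with one large mean points to a single limiting nutrient.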
Figure 17 demonstrates that fields can be grouped according to their response patterns using clustering algorithms. This is of practical use wherever fields with distinct response patterns need to be identified.
Figure 16. Plots of the resulting 4 clusters from the analysis of the diagnostic trial yield data. The lime treatment is omitted because lime was not applied in 2 sentinel sites
The R code used for this analysis, including descriptions of each step, is in “Yield Cluster Analysis_10_12.R”.
Figure 17. Yields observed in fields classified into different clusters by k-means clustering