The monitoring and quantification of soil carbon provide a better understanding of soil and atmosphere dynamics. Visible-near-infrared-short-wave infrared (VIS-NIR-SWIR) reflectance spectroscopy can quantitatively estimate soil carbon contentmore rapidly and cost-effectively compared to traditional laboratory analysis. However, effective estimation of soil carbon using reflectance spectroscopy to a great extent depends on the selection of a suitable preprocessing sequence and data-mining algorithm. Many efforts have been dedicated to the comparison of conventional chemometric techniques and their optimization for soil properties prediction. Instead, the current study focuses on the potential of the new data-mining engine PARACUDA-II®, recently developed at Tel-Aviv University (TAU), by comparing its performance in predicting soil oxidizable carbon (Cox) against common data-mining algorithms including partial least squares regression (PLSR), random forests (RF), boosted regression trees (BRT), support vector machine regression (SVMR), and memory based learning (MBL). To this end, 103 soil samples from the Pokrok dumpsite in the Czech Republic were scanned with an ASD FieldSpec III Pro FR spectroradiometer in the laboratory under a strict protocol. Spectra preprocessing for conventional data-mining techniques was conducted using Savitzky-Golay smoothing and the first derivative method. PARACUDA-II®, on the other hand, operates based on the all possibilities approach (APA) concept, a conditional Latin hypercube sampling (cLHs) algorithm and parallel programming, to evaluate all of the potential combinations of eight different spectral preprocessing techniques against the original reflectance and chemical data prior to the model development. The comparison of results was made in terms of the coefficient of determination (R2) and root-mean-square error of prediction (RMSEp). Results showed that the PARACUDA-II® engine performed better than the other selected regular schemes with R2 value of 0.80 and RMSEp of 0.12; the PLSR was less predictive compared to other techniques with R2Ep = 0.63 and RMSEp = 0.29. This can be attributed to its capability to assess all the available options in an automatic way, which enables the hidden models to rise up and yield the best available model.
ASJC Scopus subject areas