@ARTICLE{IMM2013-06776,
    author       = "M. Vega and S. Sharifzadeh and D. Wulfsohn and T. Skov and L. Clemmensen and T. Toldam-Andersen",
    title        = "A sampling approach for predicting the eating quality of apples using visible–near infrared spectroscopy",
    year         = "2013",
    keywords     = "Malus domestica, {SSC}, Representative sample, Training set formation, Variability",
    pages        = "3710–3719",
    journal      = "Journal of the Science of Food and Agriculture",
    volume       = "93",
    number       = "15",
    url          = "http://www2.compute.dtu.dk/pubdb/pubs/6776-full.html",
    abstract     = "{BACKGROUND} 
Visible–near infrared spectroscopy remains a method of increasing interest as a fast alternative for the evaluation of fruit quality. The success of the method is assumed to be achieved by using large sets of samples to produce robust calibration models. In this study we used representative samples of an early and a late season apple cultivar to evaluate model robustness (in terms of prediction ability and error) on the soluble solids content (SSC) and acidity prediction, in the wavelength range 400–1100 nm.

{RESULTS} 
A total of 196 middle–early season and 219 late season apples (Malus domestica Borkh.) cvs ‘Aroma’ and ‘Holsteiner Cox’ samples were used to construct spectral models for {SSC} and acidity. Partial least squares (PLS), ridge regression (RR) and elastic net (EN) models were used to build prediction models. Furthermore, we compared three sub-sample arrangements for forming training and test sets (‘smooth fractionator’, by date of measurement after harvest and random). Using the ‘smooth fractionator’ sampling method, fewer spectral bands (26) and elastic net resulted in improved performance for {SSC} models of ‘Aroma’ apples, with a coefficient of variation {CVSSC} = 13\%. The model showed consistently low errors and bias (PLS/EN: R2cal = 0.60/0.60; {SEC} = 0.88/0.88°Brix; Biascal = 0.00/0.00; R2val = 0.33/0.44; {SEP} = 1.14/1.03; Biasval = 0.04/0.03). However, prediction of acidity and of {SSC} ({CV} = 5\%) for the late cultivar ‘Holsteiner Cox’ produced inferior results compared with ‘Aroma’.

{CONCLUSION} 
It was possible to construct local {SSC} and acidity calibration models for early season apple cultivars with CVs of {SSC} and acidity around 10\%. The overall model performance of these data sets also depends on the proper selection of training and test sets. The ‘smooth fractionator’ protocol provided an objective method for obtaining training and test sets that capture the existing variability of the fruit samples for construction of visible–{NIR} prediction models. The implication is that by using such ‘efficient’ sampling methods for obtaining an initial sample of fruit that represents the variability of the population and for sub-sampling to form training and test sets it should be possible to use relatively small sample sizes to develop spectral predictions of fruit quality. Using feature selection and elastic net appears to improve the {SSC} model performance in terms of R2, {RMSECV} and {RMSEP} for ‘Aroma’ apples. © 2013 Society of Chemical Industry",
    isbn_issn    = "0022-5142"
}