What is the difference between a training, validation and test set?

posted Aug 31, 2012, 9:47 AM by Unknown user   [ updated Aug 31, 2012, 9:50 AM ]
Training set: the data used to build (train) the supervised learning model. The trained models we have studied so far are classifiers.
Validation set: used to evaluate the model's performance (confusion matrix, error rate, etc.) during the tuning stage, so that different settings can be compared on data the model was not trained on. For decision trees, tuning could mean pre-pruning or post-pruning. Pre-pruning means experimenting with settings such as min split and max tree depth, which stop the tree from growing too far. Post-pruning means growing the tree first and then pruning it back for a particular complexity parameter.
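Test set: used once the model has been tuned, to produce the final performance evaluation (confusion matrix, error rate). Because the test set plays no part in training or tuning, it gives a fairer estimate of how the model will perform on new data.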
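
The three roles can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data is synthetic, the 60/20/20 split is an arbitrary choice, and a one-split "stump" stands in for a full decision tree. The names three_way_split, fit_stump, error_rate and the min_leaf setting are hypothetical; min_leaf plays a role loosely similar to a pre-pruning setting such as min split.

```python
import random

def three_way_split(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle rows and cut them into disjoint training, validation and test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train_frac)
    n_val = int(len(rows) * val_frac)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

def fit_stump(train, min_leaf):
    """Fit a one-split classifier on a single feature: predict 1 when x > threshold.

    min_leaf (an assumed, pre-pruning style setting) requires at least
    min_leaf training rows on each side of the split.
    """
    xs = sorted(x for x, _ in train)
    best_threshold, best_errors = None, None
    for i in range(min_leaf, len(xs) - min_leaf + 1):
        threshold = (xs[i - 1] + xs[i]) / 2
        errors = sum((x > threshold) != bool(y) for x, y in train)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def error_rate(threshold, rows):
    """Fraction of rows the stump misclassifies."""
    return sum((x > threshold) != bool(y) for x, y in rows) / len(rows)

# Synthetic data (an assumption for the demo): label is 1 when x > 0.5,
# with roughly 10% of labels flipped as noise.
rng = random.Random(1)
data = []
for _ in range(200):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.1:
        y = 1 - y
    data.append((x, y))

train, val, test = three_way_split(data)

# Tuning stage: fit on the training set only, compare settings on the validation set.
candidates = (1, 5, 20)
val_errors = {m: error_rate(fit_stump(train, m), val) for m in candidates}
best_min_leaf = min(val_errors, key=val_errors.get)

# Final evaluation: the test set is used once, after tuning is finished.
final_model = fit_stump(train, best_min_leaf)
test_error = error_rate(final_model, test)
print("validation errors:", val_errors)
print("chosen min_leaf:", best_min_leaf, "test error:", test_error)
```

The point of the sketch is the order of operations: the test rows are never looked at while choosing min_leaf, so the final error rate is not flattered by the tuning.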