Cross-validation (statistics)

Cross-validation, sometimes called rotation estimation or out-of-sample testing, is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set. Cross-validation is a resampling method that uses different portions of the data to test and train a model on different iterations. It is mainly used in settings where the goal is prediction and one wants to estimate how accurately a predictive model will perform in practice. In a prediction problem, a model is usually given a dataset of known data on which training is run (the training dataset) and a dataset of unknown data (or first-seen data) against which the model is tested (called the validation dataset or testing set). The goal of cross-validation is to test the model's ability to predict new data that was not used in estimating it, in order to flag problems like overfitting or selection bias.

Date: June 2022
Has abstract: Cross-validation, sometimes called rotation estimation or out-of-sample testing, is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set. Cross-validation is a resampling method that uses different portions of the data to test and train a model on different iterations. It is mainly used in settings where the goal is prediction and one wants to estimate how accurately a predictive model will perform in practice. In a prediction problem, a model is usually given a dataset of known data on which training is run (the training dataset) and a dataset of unknown data (or first-seen data) against which the model is tested (called the validation dataset or testing set). The goal of cross-validation is to test the model's ability to predict new data that was not used in estimating it, in order to flag problems like overfitting or selection bias and to give insight into how the model will generalize to an independent dataset (i.e., an unknown dataset, for instance from a real problem). One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the validation set or testing set). To reduce variability, most methods perform multiple rounds of cross-validation using different partitions and combine (e.g. average) the validation results over the rounds to give an estimate of the model's predictive performance. In summary, cross-validation combines (averages) measures of fitness in prediction to derive a more accurate estimate of model prediction performance.
Hypernym: Technique
Is primary topic of: Cross-validation (statistics)
Label: Cross-validation (statistics)
Link from a Wikipage to another Wikipage: Accuracy; Bayesian regression; Bias (statistics); Binary classification; Binomial coefficient; Boosting (machine learning); Bootstrap aggregating; Bootstrapping (statistics); Cancer; Category:Machine learning; Category:Model selection; Category:Regression variable selection; Closed-form expression; Complement (set theory); Confidence interval; Confirmation bias; Data; Dichotomous; Drug; Euclidean vector; Expected value; Features (pattern recognition); Feature selection; File:Confusion matrix.png; File:K-fold cross validation EN.svg; File:KfoldCV.gif; File:LOOCV.gif; Gene expression; Generalization error; Goodness of fit; Hyperparameter (machine learning); Hyperplane; Independence (probability theory); Jackknife resampling; Kernel regression; K nearest neighbors; Lasso (statistics); Leakage (machine learning); Least squares; Linear regression; Logistic regression; Loss function; Mean squared error; Median absolute deviation; Medical diagnosis; Model selection; Model validation; Monte Carlo method; Optical character recognition; Optimization (mathematics); Out-of-bag error; Overfitting; Parameters; Partition of a set; Positive predictive value; Predictive modelling; PRESS statistic; Proteins; RANSAC; Real number; Regularization (mathematics); Resampling (statistics); Ridge regression; ROC curve; Root mean squared error; Selection bias; Sherman–Morrison formula; Shrinkage estimator; Stability (learning theory); Stationary bootstrap; Statistical model; Statistical population; Statistical sample; Statistics; Stock market prediction; Summary statistics; Support Vector Machine; Time-series; Training, validation, and test sets; Validation set; Validity (statistics); Variance
Reason: Trippa et al. do not contain any proof or discussion of linear parametric models generating a downward bias by a factor of (n − p − 1)/(n + p + 1) in the expected MSE.
SameAs: 4jr8u; Balidazio gurutzatu; Çapraz doğrulama (istatistik); Convalida incrociata; Cross-validation (statistics); Kiểm chứng chéo; Korsvalidering; Kreuzvalidierungsverfahren; Křížová validace; m.025t5x; Q541014; Ristvalideerimine; Sprawdzian krzyżowy; Validação cruzada; Validació encreuada; Validación cruzada; Validasi silang; Validasi-silang; Validation croisée; Перекрёстная проверка; Перехресне затверджування; تصديق متقاطع; روش اعتبارسنجی متقابل; 交叉驗證; 交差検証; 교차타당도
Subject: Category:Machine learning; Category:Model selection; Category:Regression variable selection
WasDerivedFrom: Cross-validation (statistics)?oldid=1123515585&ns=0
WikiPageLength: 41504
Wikipage page ID: 416612
Wikipage revision ID: 1123515585
WikiPageUsesTemplate: Template:Citation needed; Template:Commons category; Template:Div col; Template:Div col end; Template:Irrelevant citation; Template:More citations needed; Template:Reflist; Template:Short description; Template:Statistics
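
The procedure summarized in the abstract (partition the sample into complementary subsets, fit on one, validate on the other, and average the results over several rounds) can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the k-fold partitioning scheme, the straight-line least-squares model, the use of mean squared error as the fitness measure, and all function names (`k_fold_indices`, `fit_line`, `mse`, `cross_validate`) are assumptions chosen for the example and do not come from the entry itself.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Striding by k gives folds whose sizes differ by at most one.
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (illustrative model only)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def mse(model, xs, ys):
    """Mean squared prediction error of the fitted line on (xs, ys)."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cross_validate(xs, ys, k=5, seed=0):
    """Return the average validation MSE and the per-round scores."""
    folds = k_fold_indices(len(xs), k, seed)
    scores = []
    for held_out in folds:
        held = set(held_out)
        # Training set is the complement of the validation fold.
        train = [i for i in range(len(xs)) if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(
            mse(model, [xs[i] for i in held_out], [ys[i] for i in held_out])
        )
    # Combine the rounds by averaging, as the abstract describes.
    return sum(scores) / k, scores


if __name__ == "__main__":
    # Synthetic data, made up purely for this demonstration.
    rng = random.Random(1)
    xs = [float(i) for i in range(30)]
    ys = [2.0 + 3.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    average, per_round = cross_validate(xs, ys, k=5)
    print("per-round MSE:", per_round)
    print("cross-validated MSE:", average)
```

Each observation is used for validation exactly once and for training in the remaining rounds, so the averaged score reflects prediction on data the model was not fitted to.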
