Cross-validation (statistics)

Comment
Cross-validation, sometimes called rotation estimation or out-of-sample testing, is any of several similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set. Cross-validation is a resampling method that uses different portions of the data to train and test a model on different iterations. It is mainly used in settings where the goal is prediction and one wants to estimate how accurately a predictive model will perform in practice. In a prediction problem, a model is usually given a dataset of known data on which training is run (the training dataset) and a dataset of unknown (or first-seen) data against which the model is tested (the validation dataset or testing set). The goal of cross-validation is to t
Date
June 2022
Depiction
Confusion matrix.png
K-fold cross validation EN.svg
KfoldCV.gif
LOOCV.gif
Has abstract
Cross-validation, sometimes called rotation estimation or out-of-sample testing, is any of several similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set. Cross-validation is a resampling method that uses different portions of the data to train and test a model on different iterations. It is mainly used in settings where the goal is prediction and one wants to estimate how accurately a predictive model will perform in practice. In a prediction problem, a model is usually given a dataset of known data on which training is run (the training dataset) and a dataset of unknown (or first-seen) data against which the model is tested (the validation dataset or testing set). The goal of cross-validation is to test the model's ability to predict new data that was not used in estimating it, in order to flag problems such as overfitting or selection bias and to give insight into how the model will generalize to an independent dataset (i.e., an unknown dataset, for instance from a real problem). One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (the training set), and validating the analysis on the other subset (the validation set or testing set). To reduce variability, most methods perform multiple rounds of cross-validation using different partitions and combine (e.g. average) the validation results over the rounds to give an estimate of the model's predictive performance. In summary, cross-validation combines (averages) measures of fitness in prediction to derive a more accurate estimate of model prediction performance.
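The rounds described in the abstract (partition the sample, fit on one subset, validate on the complement, then average) can be illustrated with a minimal k-fold sketch. Everything in it is an assumption made for illustration and does not come from the source: the helper names (`k_fold_indices`, `fit_line`, `mse`, `cross_validate`), the choice of a straight-line least-squares model, the mean squared error as the fitness measure, and the synthetic data.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Striding gives folds whose sizes differ by at most one.
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def mse(model, xs, ys):
    """Mean squared prediction error of the fitted line on (xs, ys)."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cross_validate(xs, ys, k=5, seed=0):
    """Return the averaged validation MSE and the per-fold MSEs."""
    folds = k_fold_indices(len(xs), k, seed)
    scores = []
    for held_out in folds:
        held = set(held_out)
        # The training set is the complement of the held-out fold.
        train = [i for i in range(len(xs)) if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(
            mse(model, [xs[i] for i in held_out], [ys[i] for i in held_out])
        )
    return sum(scores) / k, scores


# Synthetic data (hypothetical): a noisy straight line.
rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 0.5) for x in xs]

cv_mse, fold_mses = cross_validate(xs, ys, k=5)
print("per-fold validation MSE:", [round(s, 3) for s in fold_mses])
print("cross-validated MSE:", round(cv_mse, 3))
```

Each observation is held out exactly once, so every validation score is computed on data the fitted model has not seen. The spread of the per-fold scores gives a rough sense of the variability that averaging over rounds is meant to reduce.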
Hypernym
Technique
Is primary topic of
Cross-validation (statistics)
Label
Cross-validation (statistics)
Link from a Wikipage to another Wikipage
Accuracy
Bayesian regression
Bias (statistics)
Binary classification
Binomial coefficient
Boosting (machine learning)
Bootstrap aggregating
Bootstrapping (statistics)
Cancer
Category:Machine learning
Category:Model selection
Category:Regression variable selection
Closed-form expression
Complement (set theory)
Confidence interval
Confirmation bias
Data
Dichotomous
Drug
Euclidean vector
Expected value
Features (pattern recognition)
Feature selection
File:Confusion matrix.png
File:K-fold cross validation EN.svg
File:KfoldCV.gif
File:LOOCV.gif
Gene expression
Generalization error
Goodness of fit
Hyperparameter (machine learning)
Hyperplane
Independence (probability theory)
Jackknife resampling
Kernel regression
K nearest neighbors
Lasso (statistics)
Leakage (machine learning)
Least squares
Linear regression
Logistic regression
Loss function
Mean squared error
Median absolute deviation
Medical diagnosis
Model selection
Model validation
Monte Carlo method
Optical character recognition
Optimization (mathematics)
Out-of-bag error
Overfitting
Parameters
Partition of a set
Positive predictive value
Predictive modelling
PRESS statistic
Proteins
RANSAC
Real number
Regularization (mathematics)
Resampling (statistics)
Ridge regression
ROC curve
Root mean squared error
Selection bias
Sherman–Morrison formula
Shrinkage estimator
Stability (learning theory)
Stationary bootstrap
Statistical model
Statistical population
Statistical sample
Statistics
Stock market prediction
Summary statistics
Support Vector Machine
Time-series
Training, validation, and test sets
Validation set
Validity (statistics)
Variance
Reason
Trippa et al. do not contain any proof or discussion of linear parametric models generating a downward bias by a factor of / in the expected MSE.
SameAs
4jr8u
Balidazio gurutzatu
Çapraz doğrulama (istatistik)
Convalida incrociata
Cross-validation (statistics)
Kiểm chứng chéo
Korsvalidering
Kreuzvalidierungsverfahren
Křížová validace
m.025t5x
Q541014
Ristvalideerimine
Sprawdzian krzyżowy
Validação cruzada
Validació encreuada
Validación cruzada
Validasi silang
Validasi-silang
Validation croisée
Перекрёстная проверка
Перехресне затверджування
تصديق متقاطع
روش اعتبارسنجی متقابل
交叉驗證
交差検証
교차타당도
Subject
Category:Machine learning
Category:Model selection
Category:Regression variable selection
Thumbnail
Confusion matrix.png?width=300
WasDerivedFrom
Cross-validation (statistics)?oldid=1123515585&ns=0
WikiPageLength
41504
Wikipage page ID
416612
Wikipage revision ID
1123515585
WikiPageUsesTemplate
Template:Citation needed
Template:Commons category
Template:Div col
Template:Div col end
Template:Irrelevant citation
Template:More citations needed
Template:Reflist
Template:Short description
Template:Statistics