45347 |
Creator |
9ac1d268bb57f50a76301a873fb56d23 |
45347 |
Creator |
ext-c1dc802d361ac3ee086d4e4304ee7069 |
45347 |
Date |
2013-10-09 |
45347 |
Is Part Of |
repository |
45347 |
abstract |
Background: In our previous research, we built defect prediction models using confirmation
bias metrics. Due to confirmation bias, developers tend to write unit tests that make
their programs run rather than tests that break their code. This, in turn, leads to an
increase in defect density. The prediction model built using confirmation bias metrics
performed as well as the models built with static code or churn metrics.
Aims: Collection of confirmation bias metrics may result in partially "missing data"
due to developers' tight schedules, evaluation apprehension, and lack of motivation,
as well as staff turnover. In this paper, we employ the Expectation-Maximization (EM)
algorithm to impute missing confirmation bias data.
Method: We used four datasets from two large-scale companies. For each dataset, we
generated all possible missing data configurations and then employed Roweis' EM algorithm
to impute missing data. We built defect prediction models using the imputed data.
We compared the performance of our proposed models with that of models that used complete
data.
Results: In all datasets, when the percentage of missing data is less than or equal to 50%
on average, our proposed model that used imputed data yielded performance comparable
to that of the models that used complete
data.
Conclusions: We may encounter the "missing data" problem when building defect prediction
models. Our results in this study showed that, instead of discarding missing or noisy
data (in our case, confirmation bias metrics), we can use effective techniques such
as EM-based imputation to overcome this problem. |
45347 |
authorList |
authors |
45347 |
presentedAt |
ext-b9cec4b52e0e1f04fa6b29d54624c260 |
45347 |
status |
peerReviewed |
45347 |
uri |
http://data.open.ac.uk/oro/document/404594 |
45347 |
uri |
http://data.open.ac.uk/oro/document/404599 |
45347 |
uri |
http://data.open.ac.uk/oro/document/404604 |
45347 |
uri |
http://data.open.ac.uk/oro/document/404605 |
45347 |
uri |
http://data.open.ac.uk/oro/document/404606 |
45347 |
uri |
http://data.open.ac.uk/oro/document/404607 |
45347 |
uri |
http://data.open.ac.uk/oro/document/407832 |
45347 |
type |
AcademicArticle |
45347 |
type |
Article |
45347 |
label |
Calikli, Gul and Bener, Ayse (2013). An Algorithmic Approach to Missing Data Problem
in Modeling Human Aspects in Software Development. In: PROMISE '13: 9th International
Conference on Predictive Models in Software Engineering, ACM, New York, USA, article
no. 10. |
45347 |
Publisher |
ext-2af1883e4bbfa0356fcedb366171cb38 |
45347 |
Title |
An Algorithmic Approach to Missing Data Problem in Modeling Human Aspects in Software
Development |
45347 |
in dataset |
oro |