How Efficient Is Estimation with Missing Data? |
Seliz G. Karadogan, Letizia Marchegiani, Lars Kai Hansen, Jan Larsen
|
Abstract | In this paper, we present a new evaluation approach for missing
data techniques (MDTs), in which their efficiency is investigated
using the listwise deletion method as a reference. We experiment
on classification problems and calculate misclassification rates (MR)
for different missing data percentages (MDP). We compare three
MDTs: pairwise deletion (PW), mean imputation (MI) and a maximum
likelihood method that we call complete expectation maximization
(CEM). We use a synthetic dataset, the Iris dataset and the Pima Indians
Diabetes dataset. We train a Gaussian mixture model (GMM)
with missing at random (MAR) data. We test the trained GMM in
two cases: the test dataset either contains missing values or is complete. The results
show that CEM is the most efficient method in both cases while MI
is the worst of the three. PW and CEM prove to be more stable than MI,
especially at higher MDP values. |
Type | Conference paper [With referee] |
Conference | International Conference on Acoustics, Speech and Signal Processing |
Year | 2011, Month: May, pp. 2260-2263 |
Publisher | IEEE Press |
ISBN / ISSN | DOI 10.1109/ICASSP.2011.5946932 |
Electronic version(s) | [pdf] |
Publication link | http://ieeexplore.ieee.org/search/srchabstract.jsp?tp=&arnumber=5946932&openedRefinements%3D*%26filter%3DAND%28NOT%284283010803%29%29%26searchField%3DSearch+All%26queryText%3DJ.+Larsen+2011 |
BibTeX data | [bibtex] |
IMM Group(s) | Intelligent Signal Processing |