Income is a key variable in many choice models. It is also one of the most salient examples of a variable affected by data problems. These problems include measurement error when income is captured in categories, correlation between stated income and unobserved variables, systematic over- or under-statement of income, and missing values for respondents who refuse to answer or do not know their (household) income. A common approach, especially for missing income, is imputation based on the relationship, among those who report income, between their stated income and their socio-demographic characteristics. A number of authors have also recently put forward a latent variable treatment of the issue. This has theoretical advantages over imputation, not least because it draws not only on the stated income of reporters but also on the choice behaviour of all respondents. We contrast this approach empirically with imputation, as well as with simpler approaches, in two case studies, one using stated preference data and one using revealed preference data. Our findings suggest that, at least with the data at hand, the latent variable approach produces results similar to those of imputation, possibly an indication that non-reporters of income have income distributions similar to those of reporters. In other data sets, however, the efficiency advantage of the latent variable approach over imputation could help reveal problems with the complete and accurate reporting of income.
Hess, S., Sanko, N., Dumont, J. & Daly, A.J. (2014), Contrasting imputation with a latent variable approach to dealing with missing income in choice models. Journal of Choice Modelling, 12, pp. 47-57.