Empirical Software Engineering


Developing large and complex software systems is a major challenge, and the use of data mining to support software development and project management processes is an important area of research.

A first key task in Empirical Software Engineering is estimating the effort needed to develop new software projects. Comprehensible and accurate software cost estimation is of crucial importance to software project management, since it affects all major management decisions such as start scheduling, resource allocation and risk management. Since the early 1960s, extensive studies have been undertaken to determine the most suitable method for this estimation task. One of the earliest approaches was expert judgment, soon followed by parametric alternatives such as the COCOMO model. More recently, software effort estimation has been investigated from a data mining perspective, using state-of-the-art techniques such as Neural Networks, Support Vector Machines and Classification And Regression Trees (CART). Different studies have indicated that software development companies often lack proper software effort estimation (see e.g. the bi-yearly CHAOS reports) and thereby fail to provide an accurate cost and time estimate for the project to be developed. The potentially high costs incurred by these misestimations are turning software development into a high-risk sector. Hence, it is no surprise that more and more software development standards, such as the Capability Maturity Model (CMM) and ISO/IEC 15504 (better known as SPICE), are imposed on software development companies.
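As an illustration of what a parametric model such as COCOMO looks like, the sketch below implements the Basic COCOMO effort equation, effort = a * KLOC^b, with the coefficients published by Boehm (1981) for the three project classes. The function and dictionary names are hypothetical and chosen only for this example; it is a minimal sketch and not the model used in the research described here.

```python
# Basic COCOMO coefficients (a, b) per project class, as published by Boehm (1981).
# The names below are illustrative only.
BASIC_COCOMO = {
    "organic": (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}


def basic_cocomo_effort(kloc: float, mode: str = "organic") -> float:
    """Return the estimated effort in person-months.

    kloc: project size in thousands of lines of code (must be positive).
    mode: one of the keys of BASIC_COCOMO.
    """
    if kloc <= 0:
        raise ValueError("kloc must be positive")
    a, b = BASIC_COCOMO[mode]
    return a * kloc ** b


if __name__ == "__main__":
    # Effort estimate for a 10 KLOC project in each project class.
    for mode in BASIC_COCOMO:
        print(mode, round(basic_cocomo_effort(10.0, mode), 1))
```

The single size input makes the model easy to understand, which is part of its appeal, but it also explains why data mining techniques that use many project attributes have been studied as alternatives.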

Another interesting research track within the field of empirical software engineering is software fault prediction. Software fault prediction strives to improve software quality and testing efficiency by constructing predictive classification models from code attributes, enabling the timely identification of fault-prone modules. Typically, advanced yet understandable data mining techniques can be used to derive suitable defect prediction models, which aid in both the development and the testing phases.
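To make the idea of a comprehensible classification model built from code attributes concrete, the sketch below learns a single-threshold rule ("a module is fault-prone if its metric exceeds t") from labelled modules. This is a deliberately simple decision stump, not one of the techniques studied in the publications listed below; the function name, the choice of metric and the toy values are all assumptions made for illustration.

```python
def fit_stump(values, labels):
    """Find the threshold t that minimises the training errors of the rule
    'value > t  =>  fault-prone'.

    values: one numeric code attribute per module (e.g. a complexity metric).
    labels: True if the module was found to be faulty, False otherwise.
    Returns (threshold, number_of_training_errors).
    Only observed values are tried as thresholds; ties keep the smallest one.
    """
    best_t, best_err = None, len(values) + 1
    for t in sorted(set(values)):
        err = sum((v > t) != y for v, y in zip(values, labels))
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err


def predict(threshold, value):
    """Apply the learned rule to a new module."""
    return value > threshold


if __name__ == "__main__":
    # Toy, made-up data purely for illustration: one metric value per module.
    complexity = [2, 3, 5, 8, 12, 15]
    faulty = [False, False, False, True, True, True]
    t, err = fit_stump(complexity, faulty)
    print(f"rule: fault-prone if complexity > {t} ({err} training errors)")
```

A rule of this form can be read and checked by a developer or tester directly, which is the kind of comprehensibility that motivates the use of understandable models in this setting.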

The research team of Professor Baesens currently studies the following topics in this context:

  • Investigating the applicability of data mining techniques to develop powerful and interpretable software effort and software defect prediction models
  • Investigating the comprehensibility of various state-of-the-art data mining techniques in the context of both software effort prediction and fault detection
  • Assessing the applicability of network-based learning to the field of empirical software engineering
  • Including domain knowledge in software effort prediction models

Notable Publications

Journal Publications

  • Lessmann S., Baesens B., Mues C., Pietsch S., Benchmarking classification models for software defect prediction: A proposed framework and novel findings, IEEE Transactions on Software Engineering, vol. 34, number 4, pp. 485-496, 2008
  • Vandecruys O., Martens D., Baesens B., Mues C., De Backer M., Haesen R., Software Repositories for Comprehensible Software Fault Prediction Models, Journal of Systems and Software, vol. 81, pp. 823-839, 2008
  • Dejaeger K., Verbeke W., Martens D., Baesens B., Data mining techniques for software effort estimation: a comparative study, IEEE Transactions on Software Engineering, Forthcoming 2011

Dutch Journal Publications

  • Dejaeger K., Verbeke W., Martens D., Baesens B., De kosten van software-ontwikkeling voorspellen, Informatie, vol. 32, number 9, 2010

Conference Publications

  • Baesens B., Setiono R., Mues C., Neural Network Rule Extraction and Decision Tables for Software Fault Prediction, Proceedings of the Fourteenth International Conference on Neural Information Processing (ICONIP 2007), Special session on “Innovation in Machine Learn, Kitakyushu, Japan, 2007
  • Setiono R., Dejaeger K., Verbeke W., Martens D., Baesens B., Software Effort Prediction using Regression Rule Extraction from Neural Networks, Proceedings of the 22nd International Conference on Tools with Artificial Intelligence (ICTAI 2010), Arras, France, October 2010
  • Baojun M., Dejaeger K., Vanthienen J., Baesens B., Software defect prediction based on association rule classification, International Conference on Electronic-Business Intelligence, Kunming, China, pp. 396-402, December 2010