Rough Noise-Filtered Easy Ensemble for Software Fault Prediction
Software fault prediction is an important research topic in software quality assurance. Data-driven approaches provide robust mechanisms for software fault prediction. However, a model's prediction performance depends heavily on the quality of the data set.
Many software data sets suffer from class imbalance. Undersampling is a popular data pre-processing method for dealing with this problem, and easy ensemble is a robust approach that achieves a high classification rate and addresses the bias toward majority-class samples. However, class imbalance is not the only issue that harms classifier performance.
Noisy data and irrelevant or redundant features may also reduce the predictive accuracy of the classifier. In this paper, we propose a two-stage data pre-processing approach that incorporates feature selection and a rough set-based K-nearest-neighbour (KNN) rule noise filter before executing easy ensemble, named rough-KNN noise-filtered easy ensemble (RKEE). In the first stage, we eliminate irrelevant and redundant features with a feature ranking algorithm. In the second stage, we handle the class imbalance problem by using the rough-KNN noise filter to eliminate noisy samples from both the minority and the majority class, which also addresses uncertainty and class overlap in both classes.
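The second stage can be illustrated with a minimal sketch. It assumes a plain KNN majority-vote filter in place of the rough set-based variant, whose details are not given here. The function names `knn_noise_filter` and `balanced_subsets`, the value `k=3`, and the toy data are hypothetical choices for illustration only. The sketch drops samples whose label disagrees with most of their nearest neighbours, then draws balanced subsets in the style of easy ensemble; each subset would train one member classifier.

```python
import math
import random
from collections import Counter


def knn_noise_filter(X, y, k=3):
    """Keep samples whose label matches the majority label of their k nearest neighbours.

    Assumption: a plain KNN vote, not the rough set-based filter of the paper.
    """
    keep = []
    for i, xi in enumerate(X):
        # Euclidean distance from sample i to every other sample.
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        neighbour_labels = [y[j] for _, j in dists[:k]]
        majority_label = Counter(neighbour_labels).most_common(1)[0][0]
        if majority_label == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def balanced_subsets(X, y, n_subsets=4, minority_label=1, seed=0):
    """Pair all minority samples with equally sized random draws from the majority class."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    subsets = []
    for _ in range(n_subsets):
        drawn = rng.sample(majority, min(len(minority), len(majority)))
        idx = minority + drawn
        subsets.append(([X[i] for i in idx], [y[i] for i in idx]))
    return subsets


# Toy data: six non-faulty modules (label 0), three faulty modules (label 1),
# and one faulty-labelled sample sitting inside the non-faulty cluster (noise).
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.2, 0.0), (0.0, 0.2),
     (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
     (0.05, 0.05)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

X_clean, y_clean = knn_noise_filter(X, y, k=3)
subsets = balanced_subsets(X_clean, y_clean, n_subsets=4)
```

On this toy data the filter removes only the mislabelled point, and every subset contains the three remaining minority samples plus three randomly drawn majority samples.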
We evaluate the approach experimentally on real-world software projects, such as the NASA and Eclipse data sets, to demonstrate its effectiveness. Furthermore, this paper comprehensively investigates the factors that influence our approach, such as the impact of rough set theory on the noise filter and the relationship between model performance and imbalance ratio. Comprehensive experiments indicate that the proposed approach shows outstanding and significant performance in terms of area under the curve (AUC).
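For readers unfamiliar with the evaluation metric, AUC can be read as the probability that a randomly chosen faulty module receives a higher score than a randomly chosen non-faulty one, with ties counted as one half. The following is a minimal sketch of that pairwise definition. The function name `auc` and the example scores are hypothetical and are not results from the paper.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # Count a win when the positive outranks the negative, half a win on ties.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative scores only: three faulty (1) and three non-faulty (0) modules.
example_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
example_labels = [1, 1, 1, 0, 0, 0]
example_auc = auc(example_scores, example_labels)  # 8 of 9 pairs ranked correctly
```

Because this definition depends only on how samples are ranked, not on a decision threshold, it is a common choice for imbalanced data sets such as the ones described above.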