What would be your strategy to handle a situation indicating an imbalanced dataset?


  • 1 Answer(s)
    There are a few ways to deal with imbalanced datasets. Undersampling removes some of the data from your majority class so that all classes end up evenly distributed. However, if your dataset is small, this may not be ideal, because you are throwing away data you cannot spare. Oversampling goes the other way: you increase the size of your minority (under-represented) class by duplicating its data. This method is not ideal either, since the duplicated data may affect your models (for example, by overfitting to the repeated points).
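    To make the two resampling ideas concrete, here is a minimal sketch of random undersampling and random oversampling using only the Python standard library. The function names (split_by_class, undersample, oversample) and the toy data are made up for illustration, and balancing every class to exactly the same size is an assumption, not the only reasonable target.

```python
import random
from collections import Counter


def split_by_class(samples, labels):
    """Group the rows by their class label."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return by_class


def undersample(samples, labels, seed=0):
    """Randomly drop rows so every class is as small as the smallest class."""
    rng = random.Random(seed)
    by_class = split_by_class(samples, labels)
    target = min(len(rows) for rows in by_class.values())
    out = []
    for label, rows in by_class.items():
        out.extend((x, label) for x in rng.sample(rows, target))
    rng.shuffle(out)
    return [x for x, _ in out], [y for _, y in out]


def oversample(samples, labels, seed=0):
    """Duplicate random rows so every class is as large as the largest class."""
    rng = random.Random(seed)
    by_class = split_by_class(samples, labels)
    target = max(len(rows) for rows in by_class.values())
    out = []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        out.extend((x, label) for x in rows + extra)
    rng.shuffle(out)
    return [x for x, _ in out], [y for _, y in out]


# Toy data (hypothetical): 8 majority rows (label 0), 2 minority rows (label 1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_under, y_under = undersample(X, y)
X_over, y_over = oversample(X, y)

under_counts = Counter(y_under)  # 2 rows per class: 6 majority rows discarded
over_counts = Counter(y_over)    # 8 rows per class: minority rows repeated
print(under_counts, over_counts)
```

    Whichever you try, resample only the training split. If duplicated minority rows end up in both the training and the test data, the evaluation will look better than it really is.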
    Additionally, if you have a small dataset and do not want to remove data, you could apply misclassification costs to your SVM models. Basically, you assign a higher misclassification cost to your minority class, which should essentially balance out your classes. A model is naturally better at classifying the class it has more data for, and it is trained to keep its total cost as low as possible. So if errors on the minority class carry a higher cost, the model will work harder to avoid misclassifying that class.
    I am pretty sure you can apply this to SVM models. The main bulk of my research applies these methods to decision trees, but even so it may give you further ideas! For example, you can compare models trained with and without misclassification costs and see whether the SVM can cope with the imbalanced data.
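    As a rough illustration of why misclassification costs change what a model prefers, the sketch below computes per-class costs and compares two hypothetical sets of predictions with and without them. The weighting rule used here (n_samples / (n_classes * class_count)) is one common heuristic and an assumption on my part, not something specific to SVMs; the helper names are made up. In scikit-learn, a dictionary like this can be passed as the class_weight parameter of SVC or DecisionTreeClassifier.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Cost per class, inversely proportional to how often the class occurs."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


def error_cost(y_true, y_pred, weights=None):
    """Sum the cost of every mistake; each mistake costs 1 if no weights given."""
    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        if truth != pred:
            total += weights[truth] if weights else 1.0
    return total


# Toy labels (hypothetical): 8 majority rows (0), 2 minority rows (1).
y_true = [0] * 8 + [1] * 2
weights = balanced_class_weights(y_true)  # class 0 costs 0.625, class 1 costs 2.5

# Predictor A ignores the minority class entirely (2 minority mistakes).
pred_a = [0] * 10
# Predictor B finds both minority rows but gets 2 majority rows wrong.
pred_b = [1, 1] + [0] * 6 + [1, 1]

plain_a, plain_b = error_cost(y_true, pred_a), error_cost(y_true, pred_b)
cost_a = error_cost(y_true, pred_a, weights)
cost_b = error_cost(y_true, pred_b, weights)
print(plain_a, plain_b, cost_a, cost_b)
```

    Without costs the two predictors look equally good (two mistakes each). With costs, the one that ignores the minority class is four times as expensive, which is the pressure that pushes a cost-sensitive model toward getting the minority class right.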
    Please check out my publications on my page for more information as these go into more detail regarding the imbalanced data problem.
    Answered on March 5, 2019.

