An unprecedented amount of data is now available, and knowledge discovery has drawn growing attention in recent years as a result. Many real-world datasets, however, are imbalanced: some classes have far fewer examples than others. Learning from such data poses major challenges and is widely recognized as a problem that needs significant attention.

The imbalanced data problem concerns how learning algorithms perform when some classes are underrepresented and the class distribution is severely skewed. Models trained on imbalanced datasets tend to favor the majority class and largely ignore the minority class. The approaches introduced to date fall into two groups: data-level solutions and algorithm-level solutions.
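The bias toward the majority class can be seen in a minimal sketch. It assumes a hypothetical toy dataset with 95 majority-class labels and 5 minority-class labels, and a trivial baseline that always predicts the most common label. The helper names (`majority_baseline`, `accuracy`, `recall`) are illustrative and not part of any course material.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Build a classifier that always predicts the most common training label."""
    most_common_label, _count = Counter(train_labels).most_common(1)[0]
    return lambda _features: most_common_label


def accuracy(y_true, y_pred):
    """Fraction of all examples that were predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of examples of class `positive` that were predicted correctly."""
    preds_for_positive = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positive:
        return 0.0
    return sum(p == positive for p in preds_for_positive) / len(preds_for_positive)


# Hypothetical toy labels: 95 majority-class (0) and 5 minority-class (1) examples.
labels = [0] * 95 + [1] * 5

predict = majority_baseline(labels)
predictions = [predict(None) for _ in labels]  # features are ignored by this baseline

overall_accuracy = accuracy(labels, predictions)
minority_recall = recall(labels, predictions, positive=1)

print(f"accuracy:        {overall_accuracy:.2f}")
print(f"minority recall: {minority_recall:.2f}")
```

On this toy data the baseline is right on 95 of 100 examples while never identifying a single minority example, which is why overall accuracy alone is a misleading measure on imbalanced data.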

The specific goals of this course are:

  • Help students understand the underlying causes of this problem

  • Discuss the different characteristics of an imbalanced dataset

  • Highlight the severity of the problem and the importance of this branch of data science

  • Give a general overview of the two main state-of-the-art approaches that have been developed to handle this problem

  • Go over two methods in detail to illustrate some of the techniques used and, hopefully, motivate students to learn more. ($10)