A Selective Under-Sampling (SUS) Method For Imbalanced Regression

Authors

  • Jovana Aleksic

DOI:

https://doi.org/10.63944/p0rbmg07

Abstract

Many mainstream machine learning approaches, such as neural networks, are not well suited to working with imbalanced data. Yet this problem is frequently present in real-world data sets: collection methods are imperfect and often fail to capture enough data in a specific range of the target variable. Furthermore, in certain tasks the data is inherently imbalanced, with many more normal events than edge cases. This problem is well studied in the classification context, but only a few methods have been proposed for regression tasks. In addition, the proposed methods often perform poorly on high-dimensional data, and imbalanced high-dimensional regression has scarcely been explored. In this paper we present a selective under-sampling (SUS) algorithm for imbalanced regression, along with its iterative version, SUSiter. We assessed the method on 15 regression data sets from different imbalanced domains, 5 synthetic high-dimensional imbalanced data sets, and 2 more complex imbalanced age-estimation image data sets. Our results suggest that SUS and SUSiter typically outperform other state-of-the-art techniques, such as SMOGN or random under-sampling, when used with neural networks as learners.
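The abstract contrasts selective under-sampling with plain random under-sampling but does not describe the algorithm itself on this page. As a minimal illustrative sketch of the general idea behind under-sampling for imbalanced regression (not the paper's SUS method), one can bin the continuous target, keep every example from sparse bins, and subsample only the dense bins; all function and parameter names below are assumptions for illustration:

```python
import numpy as np

def undersample_regression(X, y, n_bins=10, max_per_bin=None, seed=0):
    """Generic density-based under-sampling for a continuous target.

    Bins y, keeps all samples from sparse (rare) bins, and randomly
    subsamples dense bins down to max_per_bin (default: the median
    non-empty bin count). Illustrative only; not the SUS algorithm.
    """
    rng = np.random.default_rng(seed)
    # Assign each target value to one of n_bins equal-width bins.
    edges = np.histogram_bin_edges(y, bins=n_bins)
    bins = np.digitize(y, edges[1:-1])          # bin ids in 0..n_bins-1
    counts = np.bincount(bins, minlength=n_bins)
    if max_per_bin is None:
        max_per_bin = int(np.median(counts[counts > 0]))
    keep = []
    for b in range(n_bins):
        idx = np.flatnonzero(bins == b)
        if len(idx) > max_per_bin:              # dense bin: subsample
            idx = rng.choice(idx, size=max_per_bin, replace=False)
        keep.extend(idx.tolist())               # sparse bin: keep all
    keep = np.sort(np.asarray(keep, dtype=int))
    return X[keep], y[keep]
```

Random under-sampling, by contrast, would discard examples uniformly, which can remove scarce extreme-target samples; a binned scheme like the one above preserves them by construction.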


Published

2025-08-01

Section

Articles