
Imbalanced Graph Node Classification

Teimuri Jervakani, Mohammad Taha | 2024

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 57571 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Rabiee, Hamid Reza; Rohban, Mohammad Hossein
  7. Abstract:
  8. One of the major challenges in artificial intelligence is the presence of imbalanced data, which occurs when the number of samples in some classes is significantly lower than in others. This imbalance can bias machine learning models: because models tend to learn more accurately from the classes with more samples, they may perform poorly when classifying samples from the minority classes. The issue is particularly important when minority classes play a critical role in sensitive applications such as healthcare or security; in such cases the minority classes must receive close and fair attention to avoid unjust outcomes. In recent years, imbalanced data on graphs has become an active area of research. Graphs are complex structures that represent relationships between nodes and are used in many applications, including social networks, chemistry, and biology, and node classification on imbalanced graphs is one of the key challenges in this field. Most studies in this area have focused on generating additional samples for the minority classes or on engineering cost functions that reduce the effect of class imbalance; while these methods improve the classification of minority nodes, they still have limitations. In this thesis, two new algorithms, KLCE and UTPR, are introduced to explore novel approaches to the problem of imbalanced data in graphs; they are based on regularization and pseudo-labeling, respectively (illustrative sketches of both ideas follow this record). The KLCE algorithm is designed to give more attention to the minority classes: it combines KL divergence with cross-entropy so that the cost function can be optimized more precisely, encouraging the model to learn the minority classes better. In addition, a coefficient λ_KLCE is introduced that allows the optimization point of the loss to be adjusted in a targeted manner, further improving the learning of minority samples. The results obtained with this regularizer show that, on many of the tested datasets, the F1 score increases by 1 to 2 percent, indicating improved classification of minority samples. The UTPR algorithm, also introduced in this study, is based on pseudo-labeling and self-training and uses uncertainty to filter out noisy or unsuitable samples; this leads to a significant improvement in model performance. For instance, on the CiteSeer dataset, UTPR yields an 18% improvement in F1 score over previous methods. Finally, to evaluate these methods rigorously, eight well-known datasets were used, allowing a thorough assessment of the proposed methods and demonstrating their effectiveness under various conditions.
  9. Keywords:
  10. Imbalanced Data ; Oversampling ; Regularizer ; Pseudo-Labeling ; Regularization ; Graph Classification
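
The abstract describes KLCE only at a high level (cross-entropy combined with a KL-divergence term weighted by a coefficient λ_KLCE), so the snippet below is a minimal sketch of one plausible reading rather than the thesis's actual loss: it assumes the KL term pulls the batch-averaged prediction toward a uniform class distribution, and the function name klce_loss, the uniform target, and the default weight are illustrative assumptions.

import torch
import torch.nn.functional as F

def klce_loss(logits, labels, lambda_klce=0.5):
    # Standard cross-entropy over the labeled nodes.
    ce = F.cross_entropy(logits, labels)

    # Assumed regularizer: KL divergence between a uniform target and the
    # batch-averaged predicted class distribution, discouraging predictions
    # from collapsing onto the majority classes.
    probs = F.softmax(logits, dim=1)
    mean_probs = probs.mean(dim=0)
    uniform = torch.full_like(mean_probs, 1.0 / logits.size(1))
    kl = F.kl_div(mean_probs.clamp_min(1e-12).log(), uniform, reduction="sum")

    # lambda_klce trades the plain cross-entropy off against the KL term.
    return ce + lambda_klce * kl

For UTPR, the abstract mentions pseudo-labeling and self-training with an uncertainty-based filter for noisy samples; the sketch below uses predictive entropy with a fixed threshold as the uncertainty criterion, which, together with the name select_pseudo_labels and the threshold value, is an assumption made for illustration, not the procedure reported in the thesis.

import torch
import torch.nn.functional as F

@torch.no_grad()
def select_pseudo_labels(logits, unlabeled_idx, max_entropy=0.5):
    # Predicted class distribution for the unlabeled nodes only.
    probs = F.softmax(logits[unlabeled_idx], dim=1)

    # Predictive entropy as the per-node uncertainty score.
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)

    # Keep only confident (low-entropy) nodes; they receive pseudo-labels
    # and would be added to the labeled set for the next training round.
    keep = entropy < max_entropy
    return unlabeled_idx[keep], probs[keep].argmax(dim=1)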
