Record Details

ANALYZING THE IMPACT OF RESAMPLING METHOD FOR IMBALANCED DATA TEXT IN INDONESIAN SCIENTIFIC ARTICLES CATEGORIZATION

BACA: JURNAL DOKUMENTASI DAN INFORMASI

 
 
Field Value
 
Title ANALYZING THE IMPACT OF RESAMPLING METHOD FOR IMBALANCED DATA TEXT IN INDONESIAN SCIENTIFIC ARTICLES CATEGORIZATION
 
Creator Indrawati, Ariani
Subagyo, Hendro
Sihombing, Andre
Wagiyah, Wagiyah
Afandi, Sjaeful
 
Subject Imbalanced data; Resampling techniques; Machine learning; Classification; Journal; ISJD
 
Description Extremely skewed data in artificial intelligence, machine learning, and data mining often produce misleading results, because most machine learning algorithms are designed to work best with balanced data. In practice, however, imbalanced data are common. The most popular technique for handling imbalanced data is to resample the dataset, changing the number of instances in the majority and minority classes to obtain a balanced dataset. Many resampling techniques (oversampling, undersampling, or a combination of both) have been proposed, and new ones continue to appear. Resampling may increase or decrease classifier performance. Comparative research on resampling methods for structured data has been widely carried out, but studies comparing resampling methods on unstructured data are rare. This raises many questions, one of which is whether these methods can be applied to unstructured data such as text, which has high dimensionality and very diverse characteristics. To understand how different resampling techniques affect classifier learning on imbalanced text data, we perform an experimental analysis using various resampling methods with several classification algorithms to classify articles in the Indonesian Scientific Journal Database (ISJD). The experiment shows that resampling techniques generally improve classifier performance on imbalanced text data, but the improvement is not significant because text data are very diverse and high-dimensional.
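The oversampling and undersampling idea named in the description can be illustrated with a minimal sketch. This is not the authors' code, and the record does not say which specific resampling methods or classifiers were used. The function name `random_resample`, the toy corpus, and the class labels below are all hypothetical, and plain random duplication and removal stand in for whatever techniques the article actually compares.

```python
import random
from collections import Counter, defaultdict


def random_resample(texts, labels, strategy="over", seed=0):
    """Balance classes by random over- or undersampling (illustrative only)."""
    if strategy not in ("over", "under"):
        raise ValueError("strategy must be 'over' or 'under'")
    rng = random.Random(seed)

    # Group the documents by class label.
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)

    # Oversampling grows every class to the largest class size;
    # undersampling shrinks every class to the smallest class size.
    sizes = [len(items) for items in by_class.values()]
    target = max(sizes) if strategy == "over" else min(sizes)

    out_texts, out_labels = [], []
    for label, items in by_class.items():
        if len(items) >= target:
            # Keep a random subset without replacement
            # (the whole class, shuffled, when it is already at the target).
            chosen = rng.sample(items, target)
        else:
            # Keep every item, then duplicate random ones up to the target.
            chosen = items + rng.choices(items, k=target - len(items))
        out_texts.extend(chosen)
        out_labels.extend([label] * target)
    return out_texts, out_labels


# Hypothetical toy corpus: 6 "health" titles and 2 "physics" titles.
texts = [f"health article {i}" for i in range(6)] + [f"physics article {i}" for i in range(2)]
labels = ["health"] * 6 + ["physics"] * 2

over_texts, over_labels = random_resample(texts, labels, strategy="over")
under_texts, under_labels = random_resample(texts, labels, strategy="under")
print(Counter(over_labels))   # class counts after oversampling
print(Counter(under_labels))  # class counts after undersampling
```

In a real experiment, resampling of this kind would normally be applied to the training split only, after the text has been converted to feature vectors, so that duplicated documents do not leak into the evaluation data.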
 
Publisher Pusat Data dan Dokumentasi Ilmiah – Lembaga Ilmu Pengetahuan Indonesia
 
Contributor
 
Date 2020-12-11
 
Type info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion



 
Format application/pdf
 
Identifier http://jurnalbaca.pdii.lipi.go.id/baca/article/view/702
10.14203/j.baca.v41i2.702
 
Source BACA: Jurnal Dokumentasi dan Informasi; Vol 41, No 2 (2020): DESEMBER; 133-141
2301-8593
0125-9008
 
Language eng
 
Relation http://jurnalbaca.pdii.lipi.go.id/baca/article/view/702/pdf
http://jurnalbaca.pdii.lipi.go.id/baca/article/downloadSuppFile/702/254
http://jurnalbaca.pdii.lipi.go.id/baca/article/downloadSuppFile/702/533
http://jurnalbaca.pdii.lipi.go.id/baca/article/downloadSuppFile/702/534
 
Rights Copyright (c) 2020 BACA: JURNAL DOKUMENTASI DAN INFORMASI
http://creativecommons.org/licenses/by-nc-nd/4.0
 

