Hoax Identification of Indonesian Tweeters using Ensemble Classifier
Subject Areas : Machine learning
Gus Nanang Syaifuddiin 1, Rizal Arifin 2, Desriyanti Desriyanti 3, Ghulam Asrofi Buntoro 4, Zulkham Umar Rosyidin 5, Ridwan Yudha Pratama 6, Ali Selamat 7
1 - Department of Information Technology, Politeknik Negeri Madiun, Jl. Serayu No. 84 Madiun 63133, Indonesia
2 - Faculty of Engineering, Universitas Muhammadiyah Ponorogo, Jl. Budi Utomo No. 10 Ponorogo 63471, Indonesia
3 - Faculty of Engineering, Universitas Muhammadiyah Ponorogo, Jl. Budi Utomo No. 10 Ponorogo 63471, Indonesia
4 - Faculty of Engineering, Universitas Muhammadiyah Ponorogo, Jl. Budi Utomo No. 10 Ponorogo 63471, Indonesia
5 - Faculty of Engineering, Universitas Muhammadiyah Ponorogo, Jl. Budi Utomo No. 10 Ponorogo 63471, Indonesia
6 - Faculty of Engineering, Universitas Muhammadiyah Ponorogo, Jl. Budi Utomo No. 10 Ponorogo 63471, Indonesia
7 - Malaysia-Japan International Institute of Technology, Universiti Teknologi Malaysia, Jalan Sultan Yahya Petra, Kuala Lumpur 54100, Malaysia
Keywords: Hoax, Identification, Bahasa Indonesia, N-Gram, TF-IDF, Passive Aggressive Classifier
Abstract :
False information, commonly known as hoaxes, is widespread on social media. Social media is no longer used only to make friends or socialize online; some users also use it to spread hate speech and false information. Hoaxes are dangerous to social life, especially in countries with large populations and ethnically diverse cultures, such as Indonesia. Although many studies have addressed the detection of false information, accuracy and efficiency still need to be improved. To help prevent the spread of hoaxes, we built a model that identifies false information in Indonesian using an ensemble classifier that combines n-gram features, term frequency-inverse document frequency (TF-IDF) weighting, and a passive-aggressive classifier. The model was evaluated on 5000 samples collected from Twitter accounts. Testing used four schemes that split the dataset into training and test data at ratios of 90:10, 80:20, 70:30, and 60:40. The test results show that our model detects hoaxes with an accuracy of 91.8%. The proposed method also improves accuracy and precision over several previous studies. These results indicate that the method can be developed further and used to detect hoaxes in Indonesian on various social media platforms.
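The pipeline described in the abstract (n-gram features, TF-IDF weighting, a passive-aggressive classifier, and four train/test ratios) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The whitespace tokenizer, the unigram-plus-bigram range, the smoothed IDF formula, the PA-I update variant, the hyperparameters `C` and `epochs`, all function names, and the toy tweets are assumptions made only for illustration.

```python
import math
import random
from collections import Counter


def word_ngrams(text, n_max=2):
    """Word n-grams of length 1..n_max (assumed: lowercase, whitespace tokens)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def fit_idf(docs, n_max=2):
    """Smoothed inverse document frequency learned from training documents."""
    df = Counter()
    for doc in docs:
        df.update(set(word_ngrams(doc, n_max)))
    n_docs = len(docs)
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(text, idf, n_max=2):
    """Sparse, L2-normalized TF-IDF vector; n-grams unseen in training are dropped."""
    tf = Counter(g for g in word_ngrams(text, n_max) if g in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def score(w, x):
    """Dot product between a sparse weight dict and a sparse feature dict."""
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_passive_aggressive(vectors, labels, C=1.0, epochs=5, seed=0):
    """PA-I variant (assumed): labels are +1/-1; update only when hinge loss > 0."""
    rng = random.Random(seed)
    w = {}
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = vectors[i], labels[i]
            loss = max(0.0, 1.0 - y * score(w, x))
            sq_norm = sum(v * v for v in x.values())
            if loss > 0.0 and sq_norm > 0.0:
                tau = min(C, loss / sq_norm)  # step size capped by C
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + tau * y * v
    return w


def predict(w, x):
    """Return +1 (hoax) or -1 (not hoax); ties go to +1."""
    return 1 if score(w, x) >= 0 else -1


def evaluate_split(texts, labels, test_fraction, seed=0):
    """Shuffle, hold out test_fraction of the data, and return test accuracy."""
    idx = list(range(len(texts)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(len(idx) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    idf = fit_idf([texts[i] for i in train_idx])
    w = train_passive_aggressive(
        [tfidf_vector(texts[i], idf) for i in train_idx],
        [labels[i] for i in train_idx])
    hits = sum(predict(w, tfidf_vector(texts[i], idf)) == labels[i]
               for i in test_idx)
    return hits / n_test


# Hypothetical toy tweets, NOT from the paper's dataset: +1 = hoax, -1 = not hoax.
toy_texts = [
    "kabar palsu sebarkan segera", "awas kabar palsu beredar",
    "sebarkan segera sebelum dihapus", "kabar palsu viral sebarkan",
    "berita resmi dari pemerintah", "pengumuman resmi jadwal kereta",
    "berita resmi cuaca hari ini", "jadwal resmi dari pemerintah",
]
toy_labels = [1, 1, 1, 1, -1, -1, -1, -1]

# The four train:test ratios named in the abstract.
for train_pct, test_pct in [(90, 10), (80, 20), (70, 30), (60, 40)]:
    acc = evaluate_split(toy_texts, toy_labels, test_pct / 100)
    print(f"{train_pct}:{test_pct} split, toy accuracy = {acc:.2f}")
```

The IDF table is fitted on the training portion only, so no information from the test tweets leaks into the features. Accuracies printed for the toy data say nothing about the 91.8% reported in the abstract, which was obtained on the authors' 5000-sample dataset.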