COMPARISON OF MACHINE LEARNING TECHNIQUES IN SPAM E-MAIL CLASSIFICATION

Samed Jukic, Jasmin Azemovic, Dino Keco, Jasmin Kevric

Abstract


E-mail remains a popular and efficient communication tool. Because it is easily misused, however, managing e-mail is an important problem for organizations and individuals. Spam is one example of such misuse: unsolicited bulk e-mail that recipients did not request. This paper compares machine learning techniques for classifying spam e-mail. Random Forest (RF), the C4.5 decision tree, and an Artificial Neural Network (ANN) were tested to determine which method gives the best results. Our results show that RF performed best of the three on the dataset from HP Labs, indicating that ensemble methods may have an edge in spam detection.

Keywords


Random Forest (RF), C4.5 Decision Tree, Artificial Neural Network (ANN), Spam Detection, Ensemble Methods


DOI: http://dx.doi.org/10.21533/scjournal.v4i1.88


Copyright (c) 2015 Samed Jukic, Jasmin Azemovic, Dino Keco, Jasmin Kevric

ISSN 2233-1859

Digital Object Identifier DOI: 10.21533/scjournal

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License