Abstract—Email is one of the most powerful and fastest means of
communication, and its popularity has led to an unwelcome rise
in email spam. Spam consists of unwanted and unsolicited messages,
and the growing volume of spam received by email users has
become a serious security threat. Automatic filtering of spam
emails is therefore a promising and research-worthy area, in
which extensive work has been reported on the design of
machine learning based classifiers. Feature selection can be
conveniently applied here to develop efficient machine
learning based classifiers, because feature
selection techniques provide a mechanism to identify suitable
and relevant features (attributes) for any knowledge discovery
task. Choosing a suitable feature selection technique
remains a key research question. The present
paper compares and discusses the effectiveness of two feature
selection methods, Chi-square and Info-gain, on three machine
learning techniques, namely a Bayes algorithm, a tree-based
algorithm, and a support vector machine, with the aim of designing
a classifier for spam email filtering. The experiments are
performed using 10-fold cross-validation, and performance
measures such as accuracy, precision, and recall are used to compare
the results.
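
As an illustration of the two feature selection scores named above, the following minimal sketch computes Chi-square and information gain for a single term from a 2x2 term/class contingency table and ranks terms by either score. It uses the standard textbook definitions and is not taken from the paper's implementation; the function names, the toy corpus, and the labels are hypothetical and exist only for this example.

```python
import math

def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: spam messages containing the term
    n10: non-spam messages containing the term
    n01: spam messages without the term
    n00: non-spam messages without the term
    """
    n = n11 + n10 + n01 + n00
    numerator = n * (n11 * n00 - n10 * n01) ** 2
    denominator = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    # A zero denominator means the term or the class is constant: no signal.
    return numerator / denominator if denominator else 0.0

def entropy(counts):
    """Shannon entropy (in bits) of a list of non-negative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

def info_gain(n11, n10, n01, n00):
    """Information gain of the class given presence/absence of the term."""
    n = n11 + n10 + n01 + n00
    h_class = entropy([n11 + n01, n10 + n00])
    present, absent = n11 + n10, n01 + n00
    h_conditional = 0.0
    if present:
        h_conditional += present / n * entropy([n11, n10])
    if absent:
        h_conditional += absent / n * entropy([n01, n00])
    return h_class - h_conditional

def term_counts(docs, labels, term):
    """Build the 2x2 counts for one term; label 1 = spam, 0 = non-spam."""
    n11 = sum(1 for d, y in zip(docs, labels) if term in d and y == 1)
    n10 = sum(1 for d, y in zip(docs, labels) if term in d and y == 0)
    n01 = sum(1 for d, y in zip(docs, labels) if term not in d and y == 1)
    n00 = sum(1 for d, y in zip(docs, labels) if term not in d and y == 0)
    return n11, n10, n01, n00

def rank_terms(docs, labels, score):
    """Return (term, score) pairs sorted from most to least informative."""
    vocabulary = set().union(*docs)
    scored = [(t, score(*term_counts(docs, labels, t))) for t in vocabulary]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical toy corpus: each message is a set of terms.
docs = [
    {"free", "win"}, {"free", "offer"}, {"free", "prize"},          # spam
    {"meeting", "agenda"}, {"report", "offer"}, {"meeting", "report"},
]
labels = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    for name, score in (("Chi-square", chi_square), ("Info-gain", info_gain)):
        print(name, rank_terms(docs, labels, score)[:3])
```

In a pipeline of the kind described in the abstract, the top-ranked terms under each score would be kept as features before training each classifier and evaluating it with 10-fold cross-validation; that step is omitted here.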
Index Terms—Classification algorithms, email spam filtering,
feature selection.
Aakanksha Sharaff and Naresh Kumar Nagwani are with the Department
of Computer Science & Engg., National Institute of Technology, Raipur,
492010, India (e-mail: asharaff.cs@nitrr.ac.in, nknagwani.cs@nitrr.ac.in).
Kunal Swami is with Samsung Research India, India.
Cite: Aakanksha Sharaff, Naresh Kumar Nagwani, and Kunal Swami, "Impact of Feature Selection Technique on Email Classification," International Journal of Knowledge Engineering, vol. 1, no. 1, pp. 59-63, 2015.