Spam Mail Filtering

Electronic mail is used daily by millions of people to communicate around the globe and is a mission-critical application for many businesses. Mailing systems have suffered from degraded quality of service due to rampant spam, phishing, and fraudulent emails. This is partly because the classification speed of email filtering systems falls far behind the requirements of email service providers. In this report, we discuss several algorithms implemented to classify spam emails. We thoroughly investigate the performance of these filters on a publicly available corpus, contributing towards standard benchmarks. We also compare the filters with each other after introducing suitable cost-sensitive evaluation measures. All methods achieve very accurate spam filtering, clearly outperforming the keyword-based filter of a widely used e-mail reader.

  • Feature extraction from emails: generating a vocabulary through lemmatization and stemming, converting each email to a feature vector, and classifying it as spam or non-spam. Analysis and comparison of the performance of various classification algorithms.
  • GitHub source code link.
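
The pipeline summarized in the bullets above (normalize tokens, build a vocabulary, convert emails to feature vectors, classify) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the report does not name its classifiers here, so multinomial Naive Bayes is used only as one plausible example; the suffix-stripping `stem` function is a crude stand-in for a real stemmer or lemmatizer; and the four training emails and all function names are hypothetical.

```python
import math
import re

# Assumption: a crude suffix stripper standing in for a real stemmer/lemmatizer.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(token):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text):
    """Lowercase the text, keep alphabetic runs, and stem each token."""
    return [stem(t) for t in re.findall(r"[a-z]+", text.lower())]


def build_vocabulary(emails):
    """Map every distinct stemmed token to a column index."""
    words = sorted({tok for text in emails for tok in tokenize(text)})
    return {word: index for index, word in enumerate(words)}


def to_vector(text, vocabulary):
    """Bag-of-words count vector; tokens outside the vocabulary are ignored."""
    vector = [0] * len(vocabulary)
    for tok in tokenize(text):
        if tok in vocabulary:
            vector[vocabulary[tok]] += 1
    return vector


def train_naive_bayes(vectors, labels):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""
    n_features = len(vectors[0])
    model = {}
    for label in set(labels):
        rows = [v for v, y in zip(vectors, labels) if y == label]
        counts = [sum(column) for column in zip(*rows)]
        total = sum(counts)
        model[label] = {
            "log_prior": math.log(len(rows) / len(labels)),
            "log_likelihood": [
                math.log((c + 1) / (total + n_features)) for c in counts
            ],
        }
    return model


def predict(model, vector):
    """Return the label with the highest log-posterior score."""
    def score(label):
        params = model[label]
        return params["log_prior"] + sum(
            x * w for x, w in zip(vector, params["log_likelihood"])
        )
    return max(model, key=score)


# Hypothetical toy data, for illustration only.
train_emails = [
    "win free money now",
    "free prize claim now",
    "meeting agenda for monday",
    "project report attached",
]
train_labels = ["spam", "spam", "ham", "ham"]

vocabulary = build_vocabulary(train_emails)
model = train_naive_bayes(
    [to_vector(text, vocabulary) for text in train_emails], train_labels
)
print(predict(model, to_vector("claim your free money", vocabulary)))
```

Comparing several classifiers, as the report describes, would amount to swapping `train_naive_bayes` and `predict` for another algorithm while keeping the same feature vectors, so that all filters are evaluated on identical inputs.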