Boosting the detection of malicious documents using designated active learning methods

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Most organizations create, send, and receive huge numbers of documents daily. Attackers increasingly take advantage of innocent users who tend to casually open email messages that are assumed to be benign but carry malicious documents. Recent targeted attacks aimed at organizations utilize the new Microsoft Word documents (*.docx). Anti-virus software fails to detect new unknown malicious files, including malicious docx files. In this study, we present the SFEM feature extraction methodology and designated Active Learning (AL) methods, aimed at accurate detection of new unknown malicious docx files, which also efficiently enhance the detection model's capabilities over time. Our AL methods identify and acquire only a small set of new docx files that are most likely malicious, as well as informative benign files; these files are used for enhancing the knowledge stores of both the detection model and the anti-virus software. Results show that our active learning methods used only 14% of the labeled docx files within the organization, which led to a reduction of 95.5% in labeling efforts compared to passive learning and SVM-Margin (an existing active learning method). Our AL methods also showed a significant improvement of 91% in unknown docx malware acquisition compared to passive learning and SVM-Margin, thus providing an improved updating solution for the detection model, as well as for the anti-virus software widely used within organizations.

Original language: American English
Title of host publication: Proceedings - 2015 IEEE 14th International Conference on Machine Learning and Applications, ICMLA 2015
Pages: 760-765
Number of pages: 6
ISBN (Electronic): 9781509002870
DOIs
State: Published - 2 Mar 2016
Event: IEEE 14th International Conference on Machine Learning and Applications, ICMLA 2015 - Miami, United States
Duration: 9 Dec 2015 → 11 Dec 2015

Conference

Conference: IEEE 14th International Conference on Machine Learning and Applications, ICMLA 2015
Country/Territory: United States
City: Miami
Period: 9/12/15 → 11/12/15

Keywords

  • Active learning
  • Documents
  • Docx
  • Machine learning
  • Malicious
  • Malware
  • Microsoft office files
  • Structural

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Science Applications
