TY - GEN
T1 - Accelerating regular expression matching over compressed HTTP
AU - Becchi, Michela
AU - Bremler-Barr, Anat
AU - Hay, David
AU - Kochba, Omer
AU - Koral, Yaron
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/8/21
Y1 - 2015/8/21
AB - This paper focuses on regular expression matching over compressed traffic. The need for such matching arises from two independent trends. First, the volume and share of compressed HTTP traffic are constantly increasing. Second, owing to their superior expressiveness, current Deep Packet Inspection engines rely on regular expressions more and more frequently. We present an algorithmic framework to accelerate such matching by taking advantage of information gathered when the traffic was initially compressed. HTTP compression is typically performed with GZIP, which encodes repeated strings as back-references. Our algorithm computes, for every byte, the minimum number of previous bytes that can be part of a future regular expression match. When inspecting a back-reference, only these bytes need to be taken into account, which makes it possible to skip repeated strings almost entirely without missing a match. We show that our generic framework works with either NFA-based or DFA-based implementations and achieves performance boosts of more than 70%. Moreover, it can be readily adapted to most existing regular expression matching algorithms, which are usually based on an NFA, a DFA, or a combination of the two. Finally, we discuss other applications in which computing the number of relevant bytes proves useful, even when the traffic is not compressed.
UR - http://www.scopus.com/inward/record.url?scp=84954231035&partnerID=8YFLogxK
U2 - 10.1109/INFOCOM.2015.7218421
DO - 10.1109/INFOCOM.2015.7218421
M3 - Conference contribution
T3 - Proceedings - IEEE INFOCOM
SP - 540
EP - 548
BT - 2015 IEEE Conference on Computer Communications, IEEE INFOCOM 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 34th IEEE Annual Conference on Computer Communications, IEEE INFOCOM 2015
Y2 - 26 April 2015 through 1 May 2015
ER -