I'm looking for advice on a better (faster) way to approach this. My problem is that as you increase the length of the "hosts" list the program takes exponentially longer to complete, and if "hosts" is long enough it takes so long for the program to complete that it seems to just lock up.
My current approach is to use the regex patterns from the CSV file to search every multi-line string stored in element i[7] of each "hosts" entry. There are hundreds of possible matches, and I need to find all matches associated with each IP address and record the unique string from the CSV file that identifies each matched pattern. Finally, I need to put that information into "fullMatchList" to use later.
NOTE: Even though each list item in "searchPatterns" has up to 4 patterns, I only need it to identify the first pattern found and then it can move on to the next list item to continue finding matches for that IP.
for i in hosts:
    if i[4] == "13579" or i[4] == "24680":
        for j in searchPatterns:
            for k in range(4):
                if j[k] == "SKIP":
                    continue
                else:
                    match = re.search(r'%s' % j[k], i[7], flags=re.DOTALL)
                    if match is not None:
                        if tempIP == "":
                            tempIP = i[0]
                            matchListPerIP.append(j[4])
                        elif tempIP == i[0]:
                            matchListPerIP.append(j[4])
                        elif tempIP != i[0]:
                            fullMatchList.append([tempIP, matchListPerIP])
                            tempIP = i[0]
                            matchListPerIP = []
                            matchListPerIP.append(j[4])
                        break
fullMatchList.append([tempIP, matchListPerIP])
Here's an example regex search pattern from the CSV file:
(?!(.*?)\\br2\\b)cpe:/o:microsoft:windows_server_2008:
That pattern is intended to identify Windows Server 2008, and includes a negative lookahead to avoid matching the R2 edition.
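To illustrate how that pattern behaves, here is a small check. It assumes the doubled backslashes shown above are an escaping artifact and that the regex engine actually sees `\b`; the sample strings and the names `PATTERN`, `samples` and `results` are made up for the example.

```python
import re

# Assumption: the pattern as the regex engine sees it uses single backslashes.
PATTERN = re.compile(
    r"(?!(.*?)\br2\b)cpe:/o:microsoft:windows_server_2008:",
    flags=re.DOTALL,
)

# Hypothetical sample texts, not taken from real scan output.
samples = {
    "plain": "cpe:/o:microsoft:windows_server_2008::sp2",
    "r2": "cpe:/o:microsoft:windows_server_2008:r2",
    "r2_later": (
        "cpe:/o:microsoft:windows_server_2008::sp2\n"
        "note: host was rebuilt from r2 media"
    ),
}

# True where the pattern matches somewhere in the text.
results = {name: PATTERN.search(text) is not None for name, text in samples.items()}
print(results)
```

Because the lookahead is not bounded, with re.DOTALL it looks through everything after the candidate position, so the third sample is rejected as well even though "r2" only appears on a later line. That unbounded scan may also add to the run time on long multi-line strings.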
I'm new to Python so any advice is appreciated! Thank you!
The network intrusion detection system (NIDS) community has done a lot of work on testing the same strings (network packets) against a long list of regexes (firewall rules).
I haven't read the literature, but Coit et al.'s "Towards faster string matching for intrusion detection or exceeding the speed of Snort" appears to be a good starting point.
Quoting from the Introduction:
The basic string matching task that must be
performed by a NIDS is to match a number of patterns drawn from the NIDS rules to
each packet or reconstructed TCP stream that the NIDS is analyzing. In Snort, the
total number of rules available has become quite large, and continues to grow
rapidly. As of 10/10/2000 there were 854 rules included in the “10102kany.rules”
ruleset file [5]. 68 of these rules did not require content matching while 786
relied on content matching to identify harmful packets. Thus, even though not
every pattern string is applied to every stream, there are a large number of
patterns being applied to some streams. For example, in traffic inbound to a web
server, Snort v 1.6.3 with the snort.org ruleset, “10102kany.rules”, checks up to
315 pattern strings against each packet. At the moment, it checks each pattern in
turn using the Boyer-Moore algorithm. Since the patterns often have something in
common, it seemed likely that there is considerable scope for efficiency
improvements here, and so it has proved.
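This is not the set-wise matching algorithm from the paper, only a minimal sketch of some cheaper first steps in plain Python: compile every pattern once up front, merge each row's (up to four) patterns into a single alternation, and collect labels per IP in a dict. It assumes each "searchPatterns" row looks like [p0, p1, p2, p3, label] with "SKIP" in unused slots, and that the patterns contain no inline flags or numbered backreferences that would break when joined with `|`. The function names `compile_rows` and `match_hosts` and the sample data are hypothetical.

```python
import re
from collections import defaultdict


def compile_rows(search_patterns):
    """Compile each row's non-SKIP patterns into one alternation regex."""
    compiled = []
    for row in search_patterns:
        alternatives = [p for p in row[:4] if p != "SKIP"]
        if not alternatives:
            continue
        # "Any of the four patterns matches" is the same yes/no answer as
        # trying them in turn and stopping at the first hit.
        combined = "|".join("(?:%s)" % p for p in alternatives)
        compiled.append((re.compile(combined, flags=re.DOTALL), row[4]))
    return compiled


def match_hosts(hosts, compiled, wanted_ids=("13579", "24680")):
    """Return {ip: [label, ...]} for hosts whose element 4 is in wanted_ids."""
    matches = defaultdict(list)
    for host in hosts:
        if host[4] not in wanted_ids:
            continue
        text = host[7]
        for regex, label in compiled:
            if regex.search(text):
                matches[host[0]].append(label)
    return dict(matches)


# Made-up data in the shape described in the question.
search_patterns = [
    [r"(?!.*?\br2\b)cpe:/o:microsoft:windows_server_2008:",
     "SKIP", "SKIP", "SKIP", "WIN2008"],
    [r"OpenSSH", r"openssh", "SKIP", "SKIP", "SSH"],
]
hosts = [
    ["10.0.0.1", "", "", "", "13579", "", "",
     "cpe:/o:microsoft:windows_server_2008::sp2\nport 80"],
    ["10.0.0.2", "", "", "", "99999", "", "", "cpe:/o:linux"],
    ["10.0.0.1", "", "", "", "24680", "", "", "OpenSSH 7.4"],
]

matches = match_hosts(hosts, compile_rows(search_patterns))
fullMatchList = [[ip, labels] for ip, labels in matches.items()]
print(fullMatchList)
```

Grouping with a dict also removes the need for the tempIP bookkeeping and does not depend on the entries for one IP being adjacent in "hosts". Whether this is fast enough will depend on the patterns themselves, so it is worth timing on real data.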