This is my first post to Stack Overflow, so I'm a super noob. I'm working on a script that reads a file (emails related to malware analysis), then uses regex to identify IP addresses, MD5 hashes, and domain names.
Here's my script so far:
import re # import the regex library
fobj = open('email_with_IOCs.txt', 'r') # open the file to search for IOCs
text = fobj.read() # read the IOC file
ip_address = re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', text) # find all the IPs (raw string so \d and \. are not treated as string escapes)
md5hash = re.findall(r'[a-fA-F0-9]{32}', text) # find all the MD5
domain = re.findall(r'[a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,}', text) # find all the domains
iocs = open('iocs.txt', 'w') # open a file to write to
iocs.write(str(ip_address) + str(md5hash) + str(domain) + '\n' ) # write all the IOCs to a file
fobj.close() # close the input file
iocs.close() # close the output file
Here are the issues I'm trying to resolve:
1. I want the output to have one IP address, MD5, or domain per line in the output file.
2. Some of the indicators of compromise are obfuscated for safety with brackets. Ex-1. [http:]//www.mcafeea[.]cf/tools.zip, Ex-2. 118.99.37[.]190. I need to remove the brackets so I don't miss IPs.
3. My domain name regex is matching file names as well as domains. Ex-1. stuff.dll, Ex-2. setup.exe. I'd like to read in all the TLDs (Top Level Domains) as a list and use the TLD list to separate domains from file names.
Questions 1 and 3: There is a Python package to parse IOCs (Indicators of Compromise) from text here: https://github.com/fhightower/ioc-finder. The regex for hostnames in this package includes a list of valid TLDs.
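If you'd rather stay with plain `re` for now, here is a minimal sketch of the same idea: check each candidate's last label against a TLD allowlist, then write one indicator per line. This is not how ioc-finder is implemented. `VALID_TLDS`, `find_domains`, and `format_iocs` are names made up for this example, and the TLD set is a tiny placeholder (in practice you would load the full list, e.g. from IANA, which is an assumption on my part).

```python
import re

# Hypothetical, deliberately tiny allowlist for illustration only.
# A real script would load the full TLD list from a file instead.
VALID_TLDS = {"com", "net", "org", "cf", "ru"}

# One or more dot-terminated labels followed by an alphabetic final label.
HOST_RE = re.compile(r"\b(?:[a-zA-Z0-9-]{1,63}\.)+([a-zA-Z]{2,})\b")

def find_domains(text, valid_tlds=VALID_TLDS):
    """Return hostnames whose last label is in valid_tlds, in order, without duplicates."""
    found = []
    for match in HOST_RE.finditer(text):
        host = match.group(0).lower()
        tld = match.group(1).lower()
        # File names such as stuff.dll are dropped here because "dll" is not a TLD.
        if tld in valid_tlds and host not in found:
            found.append(host)
    return found

def format_iocs(iocs):
    """Put each indicator on its own line, ready to be written to the output file."""
    return "".join(ioc + "\n" for ioc in iocs)

sample = "Dropped stuff.dll and setup.exe, then called www.mcafeea.cf and evil.example.com."
domains = find_domains(sample)
print(format_iocs(domains), end="")
```

In the original script, `iocs.write(format_iocs(ip_address + md5hash + domain))` would then give one indicator per line instead of three stringified lists.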
Question 2: Removing the obfuscation from indicators of compromise is called "fanging" or "refanging", and there is a package that does this in a robust and systematic way: https://github.com/ioc-fang/ioc_fanger.
Full disclosure: I'm the one working on this package; feel free to raise an issue if you have any ideas.
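For the two bracket styles shown in question 2, a rough stdlib-only stand-in could look like the sketch below. It is not the ioc_fanger implementation and covers far fewer cases. `refang` and `IP_RE` are hypothetical names for this example, and the set of bracketed tokens it unwraps is an assumption based only on the samples in the question.

```python
import re

def refang(text):
    """Undo simple bracket defanging, e.g. '[.]' -> '.' and '[http:]' -> 'http:'."""
    # Only unwrap brackets around a dot, a colon, '://', or an http/https scheme,
    # so ordinary bracketed text such as '[see below]' is left alone.
    return re.sub(r"\[(\.|:|://|https?:?)\]", r"\1", text)

# Same IP pattern as in the question, written with a non-capturing group.
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

sample = "See [http:]//www.mcafeea[.]cf/tools.zip and 118.99.37[.]190"
clean = refang(sample)
print(clean)                 # See http://www.mcafeea.cf/tools.zip and 118.99.37.190
print(IP_RE.findall(clean))  # ['118.99.37.190']
```

Calling `refang(text)` right after `fobj.read()` means the IP and domain regexes see the cleaned text, so the bracketed indicators are no longer missed.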