Regex to find specific word

I have a large file that contains multiple entries like the one below:

{"author":["frack113"],"description":"Detects a Sysmon configuration change, which could be the result of a legitimate reconfiguration or someone trying manipulate the configuration","ruleId":"8ac03a65-6c84-4116-acad-dc1558ff7a77","falsePositives":["Legitimate administrative action"],"from":"now-360s","immutable":false,"outputIndex":".siem-signals-default","meta":{"from":"1m"},"maxSignals":100,"riskScore":35,"riskScoreMapping":[],"severity":"medium","severityMapping":[],"threat":[{"tactic":{"id":"TA0005","reference":"https://attack.mitre.org/tactics/TA0005","name":"Defense Evasion"},"framework":"MITRE ATT&CK®","technique":[]}],"to":"now","references":["https://docs.microsoft.com/en-us/sysinternals/downloads/sysmon"],"version":1,"exceptionsList":[],"index":["winlogbeat-*"],"query":"(winlog.channel:\"Microsoft\\-Windows\\-Sysmon\\/Operational\" AND winlog.event_id:\"16\")","language":"lucene","filters":[],"type":"query"},"schedule":{"interval":"5m"}}

I am working on a Python program to extract the string that follows the key "query". For example, in

"query":"(winlog.channel:\"Microsoft\\-Windows\\-Sysmon\\/Operational\" AND winlog.event_id:\"16\")"

I want to extract (winlog.channel:\"Microsoft\\-Windows\\-Sysmon\\/Operational\" AND winlog.event_id:\"16\"). There are many of these to extract, and I then want to compare them against the "query" values in another file to find any similarities.

I tried the following regexes, but neither is able to match "query" at all:

(?<=^\"query\":\W)(\w.*)$ 

and

(?<='{\"query\"}':\s)'?([^'}},]+)

I would appreciate any pointers, as I have been stuck on this for hours!

Your question also has the python tag, so I am assuming a solution involving a Python script is fine.

Suppose you have a file data.txt with entries like the given example:

{"author":["frack113"],"description":"Detects a Sysmon configuration change, which could be the result of a legitimate reconfiguration or someone trying manipulate the configuration","ruleId":"8ac03a65-6c84-4116-acad-dc1558ff7a77","falsePositives":["Legitimate administrative action"],"from":"now-360s","immutable":false,"outputIndex":".siem-signals-default","meta":{"from":"1m"},"maxSignals":100,"riskScore":35,"riskScoreMapping":[],"severity":"medium","severityMapping":[],"threat":[{"tactic":{"id":"TA0005","reference":"https://attack.mitre.org/tactics/TA0005","name":"Defense Evasion"},"framework":"MITRE ATT&CK®","technique":[]}],"to":"now","references":["https://docs.microsoft.com/en-us/sysinternals/downloads/sysmon"],"version":1,"exceptionsList":[],"index":["winlogbeat-*"],"query":"(winlog.channel:\"Microsoft\\-Windows\\-Sysmon\\/Operational\" AND winlog.event_id:\"16\")","language":"lucene","filters":[],"type":"query"},"schedule":{"interval":"5m"}}

Then you can run the following script to print the required strings.

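def main():
    with open('data.txt') as f:
        for line in f:
            # Split on the key name; skip lines that do not contain it.
            parts = line.split("query")
            if len(parts) < 2:
                continue
            # Keep the text up to the first ")" and drop the leading '":'.
            result = parts[1].split(")")[0][2:]
            print(result)

main()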

For the example string you provided, this script prints the following (the leading quote is kept and the closing parenthesis is dropped by the split):

"(winlog.channel:\"Microsoft\\-Windows\\-Sysmon\\/Operational\" AND winlog.event_id:\"16\"

Hope it helps!
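If you want the exact value, without the leading quote and with the closing parenthesis intact, one option is to let the json module decode just the string that follows the key. This is only a sketch: query_from_line is a hypothetical helper name, the sample line is a shortened made-up entry, and it assumes the key is written compactly as "query": with no spaces, as in your example. It relies on json.JSONDecoder.raw_decode, which parses one JSON value starting at a given index and ignores the rest of the line.

```python
import json

_decoder = json.JSONDecoder()
KEY = '"query":'  # assumes compact JSON with no whitespace around the colon


def query_from_line(line):
    """Return the decoded "query" value in line, or None if the key is absent."""
    pos = line.find(KEY)
    if pos == -1:
        return None
    # raw_decode parses a single JSON value starting at the given index and
    # ignores whatever follows it, so the rest of the line need not be valid JSON.
    value, _end = _decoder.raw_decode(line, pos + len(KEY))
    return value


# Shortened, made-up entry in the same shape as the example above.
sample = r'{"index":["winlogbeat-*"],"query":"(winlog.event_id:\"16\")","language":"lucene","type":"query"},"schedule":{"interval":"5m"}}'
print(query_from_line(sample))
```

Because the value is decoded as JSON, the escapes are resolved too, so \" becomes a plain quote and \\ a single backslash. If you need the raw escaped text for the comparison, this is not the right tool.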

There are two ways to do it:

  1. Read the file in as JSON, then iterate through the dictionary.
  2. Read the file in as a string and apply a regex.

1. Read in as JSON:

import json

file = 'exportedSignal.ndjson'
with open(file, 'r', encoding = 'cp850') as f:
    jsonData = json.load(f)

queries = []
hits = jsonData['hits']['hits']
for hit in hits:
    if 'query' in hit['_source']['alert']['params'].keys():
        query = hit['_source']['alert']['params']['query']
        queries.append(query)
print(queries)
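Note that json.load parses the whole file as one JSON document. If your file is really newline-delimited JSON (one object per line), that call would fail, and you would need to decode line by line instead. A rough sketch for that case follows; find_values and queries_from_ndjson are hypothetical helper names, and the sample lines are made up for illustration, not taken from a real export. It searches every nesting level for the key, so it does not depend on the exact layout of each entry.

```python
import json


def find_values(node, key):
    """Yield every value stored under key anywhere inside a decoded JSON structure."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


def queries_from_ndjson(lines):
    """Collect "query" values from an iterable of JSON lines, skipping blank lines."""
    queries = []
    for line in lines:
        line = line.strip()
        if line:
            queries.extend(find_values(json.loads(line), "query"))
    return queries


# Made-up lines for illustration only.
lines = [
    '{"alert":{"params":{"query":"winlog.event_id:\\"16\\"","type":"query"}}}',
    '',
    '{"alert":{"params":{"type":"threshold"}}}',
]
print(queries_from_ndjson(lines))
```

Since an open file is an iterable of lines, you could pass the file object itself to queries_from_ndjson inside the with block.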

2. Use regex:

import re

file = 'exportedSignal.ndjson'
with open(file, 'r', encoding = 'cp850') as f:
    data = f.read()

# Match the whole string literal, including escaped quotes (\") inside it.
queries = re.findall(r'"query":"((?:[^"\\]|\\.)*)"', data)
print(queries)

Output:

Both produce a list of 2006 values from the "query" key.
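For the comparison step mentioned in the question, here is one possible sketch. It assumes that "similar" means an exact match after trimming surrounding whitespace, which may be stricter than what you need; common_queries is a hypothetical helper name and the two lists are made-up stand-ins for the queries extracted from each file.

```python
def common_queries(queries_a, queries_b):
    """Return the queries that appear in both lists, ignoring surrounding whitespace."""
    normalized_b = {q.strip() for q in queries_b}
    seen = set()
    common = []
    # Keep the order of the first list and report each shared query once.
    for q in queries_a:
        key = q.strip()
        if key in normalized_b and key not in seen:
            seen.add(key)
            common.append(key)
    return common


# Made-up stand-ins for the two extracted lists.
rules = ['winlog.event_id:"16"', 'winlog.event_id:"1" ']
other = ['winlog.event_id:"1"', 'process.name:"cmd.exe"']
print(common_queries(rules, other))
```

Using a set for the second list keeps each lookup cheap, which matters little for a couple of thousand queries but costs nothing either.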
