
Extracting data after filtering substring using Python

I have a JSON file of DNS traffic in this format:

{
    "index": {
        "_type": "answer_query", 
        "_id": 0, 
        "_index": "index_name"
    }
}

{
    "answer_section": " ", 
    "query_type": "A", 
    "authority_section": "com. 172 IN SOA a.xxxx-xxxx.net. nstld.xxxx-xxxxcom. 1526440480 1800 900 604800 86400", 
    "record_code": "NXDOMAIN", 
    "ip_src": "xx.xx.xx.xx", 
    "response_ip": "xx.xx.xx.xx", 
    "date_time": "2018-05-16T00:57:20Z", 
    "checksum": "CORRECT", 
    "query_name": "xx.xxxx.com.", 
    "port_src": 50223, 
    "question_section": "xx.xxxx.com. IN A", 
    "answer_count_section": 0
}

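I need to extract the number that follows the first space in authority_section (172 in the example) when it is less than 300, ignore the data that does not meet this requirement, and then write the output to another JSON file.

How can I achieve this? Thanks.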

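Assuming stack1.txt is the file you posted, this writes a new file, stack2.txt. If the "value after the space" is >= 300, the "authority_section" line is left out and, because the record fails the requirement, stack2.txt is deleted once the loop finishes. This solution does not need to parse the JSON, but it relies heavily on the data being stored in a consistent format.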

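import os

# Read the original file and close it promptly.
with open('stack1.txt') as old_file:
    lines = old_file.readlines()

delete_file = False
with open('stack2.txt', 'w') as new_file:
    for line in lines:
        # Second field of the authority_section value (172 in the example).
        if line.strip().startswith('"authority_section"') and int(line.split(':')[1].split()[1]) >= 300:
            delete_file = True
        else:
            new_file.write(line)

# Discard the output when the value was not below 300.
if delete_file:
    os.remove('stack2.txt')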
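
Because the line-based approach depends on the exact layout of the file, it may be safer to parse the JSON instead. The following is only a minimal sketch, assuming the file is a run of concatenated JSON objects as shown in the question. The names iter_json_objects, authority_ttl and filter_records are hypothetical helpers, not part of any library.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here.
        if text[pos].isspace():
            pos += 1
            continue
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def authority_ttl(record):
    """Return the second whitespace-separated field of authority_section
    as an int, or None if it is missing or not a number."""
    fields = str(record.get("authority_section", "")).split()
    if len(fields) < 2:
        return None
    try:
        return int(fields[1])
    except ValueError:
        return None


def filter_records(text, limit=300):
    """Keep only the records whose authority_section number is below limit."""
    kept = []
    for obj in iter_json_objects(text):
        ttl = authority_ttl(obj) if isinstance(obj, dict) else None
        if ttl is not None and ttl < limit:
            kept.append(obj)
    return kept
```

To use it, read stack1.txt into a string, pass it to filter_records, and write the returned list to the output file with json.dump. Note that this sketch also drops the "index" objects, since they have no authority_section; keep them separately if the output needs them.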

You can try something like this:

#!/usr/bin/python3
import json
import re

data = (
    """
    {
         "answer_section": " ",
         "query_type": "A",
         "authority_section": "com. 172 IN SOA a.xxxx-xxxx.net. nstld.xxxx-xxxxcom. 1526440480 1800 900 604800 86400",
         "record_code": "NXDOMAIN",
         "ip_src": "xx.xx.xx.xx",
         "response_ip": "xx.xx.xx.xx",
         "date_time": "2018-05-16T00:57:20Z",
         "checksum": "CORRECT",
         "query_name": "xx.xxxx.com.",
         "port_src": 50223,
         "question_section": "xx.xxxx.com. IN A",
         "answer_count_section": 0
    }
    """
)


json_data = json.loads(data)
print('BEFORE: ', json_data)

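# Raw string so the backslashes reach the regex engine unchanged;
# matches a number from 1 to 299 surrounded by whitespace.
r = re.compile(r'^\s([1-2]\d\d|[1-9]\d|[1-9])\s$')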


found = False
key_to_delete = None

for key, value in json_data.items():
    if value == 0:
        pass
    else:
        tmp = str(value)
        for i in range(0, len(tmp)):
            if r.match(tmp[i:i+3]):
                found = True
                key_to_delete = key
                print('FOUND 1: ', value)
            elif r.match(tmp[i:i+4]):
                found = True
                key_to_delete = key
                print('FOUND 2: ', value)
            elif r.match(tmp[i:i+5]):
                found = True
                key_to_delete = key
                print('FOUND 3: ', value)

if found:
    json_data.pop(key_to_delete)

print('RESULT: ', json_data)

I used a regular expression in my answer. Read more about regular expressions.
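
As a possible alternative to encoding the 1 to 299 range in the pattern itself, the number can be captured and compared as an integer. This is a sketch only, assuming the value always has the layout "<name> <number> IN ..." seen in the question. TTL_PATTERN and number_below are hypothetical names.

```python
import re

# Assumed layout: "<name> <number> ..." (e.g. "com. 172 IN SOA ...").
TTL_PATTERN = re.compile(r"^\S+\s+(\d+)\s")


def number_below(value, limit=300):
    """Return True if the number after the first space is below limit."""
    match = TTL_PATTERN.match(value)
    return match is not None and int(match.group(1)) < limit
```

Comparing the captured integer keeps the threshold in one place, so changing 300 to another limit does not require rewriting the pattern.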
