
Find a list of unique IP addresses from a JSON log file in Python

How do I write a Python program to find the list of unique IP addresses in a JSON log file?

I am a newbie at Python and I have the following data in JSON format. I want to find the unique values of the "remoteIp" key.

{
    "jsonPayload": {
      "enforcedSecurityPolicy": {
        "configuredAction": "DENY",
        "preconfiguredExprIds": [
          "owasp-crs-v030001-id942220-sqli"
        ],
        "priority": 2000,
        "outcome": "DENY"
      }
    },
    "httpRequest": {
      "requestMethod": "POST",
      "requestUrl": "https://wwwwwww.google.com///n",
      "requestSize": "3004",
      "status": 403,
      "responseSize": "274",
      "userAgent": "okhttp/3.12.2",
      "remoteIp": "182.2.169.59",
      "serverIp": "10.114.44.4"
    }
}

The solution I have written so far fetches all the remoteIp values, but they are not unique.

import json
#unique_ip = {}
with open("automation.json") as file:
    data = json.load(file)
    for d1 in data:
        del d1['resource'], d1['timestamp'], d1['severity'], d1['logName'], d1['trace'], d1['spanId'], \
            d1['receiveTimestamp'], d1['jsonPayload']['statusDetails'], d1['jsonPayload']['@type'], d1['insertId'], \
            d1['jsonPayload']['enforcedSecurityPolicy']['name'], d1['httpRequest']['latency']

        # using d1['insertId'] above for uniquely identifying a record
        #print(d1['httpRequest']['remoteIp'])  #d1['jsonPayload']['enforcedSecurityPolicy'])

with open('automation_new.json', 'w') as file:
    json.dump(data, file, indent=2)

for d2 in data:
    s1 = d2['httpRequest']['requestUrl']
    s2 = d2['httpRequest']['requestMethod']
    s3 = d2['httpRequest']['remoteIp']
    s4 = str(d2['httpRequest']['status'])
    s5 = d2['httpRequest']['userAgent']
    #mylist = list((s1.split(), s2.split(), s3.split(), s5.split(), s4.split()))
    #mylist = list((s1, s2, s3, s4, s5))
    #def unique(s3):
    #    x = np.array(s3)
    #    print(np.unique(x))
    print(s3)
file.close()

Use a set():

a = ['a', 'b', 'a']
b = set(a)
print(b)  # {'a', 'b'}

Please print the type of s3:

unique_ip = set()
for d2 in data:
    #...
    s3 = d2['httpRequest']['remoteIp']
    #...
    unique_ip.add(s3)

print("length of unique ip set is " + str(len(unique_ip)))
print(unique_ip)

You could use a set to maintain unique values:

unique_remote_ips = set()
for d2 in data:
    s1 = (d2['httpRequest']['requestUrl'])
    s2 = (d2['httpRequest']['requestMethod'])
    unique_remote_ips.add(d2['httpRequest']['remoteIp'])
    s4 = (str(d2['httpRequest']['status']))
    s5 = (d2['httpRequest']['userAgent'])
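If you only need the unique IPs and not the other fields, a set comprehension does the same thing in one line. A minimal sketch, using made-up records shaped like the sample in the question:

```python
# Hypothetical records mimicking the shape of "data" loaded from the JSON file.
data = [
    {"httpRequest": {"remoteIp": "182.2.169.59"}},
    {"httpRequest": {"remoteIp": "182.2.169.59"}},
    {"httpRequest": {"remoteIp": "203.0.113.9"}},
]

# A set comprehension collects each remoteIp exactly once.
unique_remote_ips = {d["httpRequest"]["remoteIp"] for d in data}
print(sorted(unique_remote_ips))  # ['182.2.169.59', '203.0.113.9']
```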

I didn't really understand precisely what you want to do, since remoteIp is a single field and not a list; how can it have duplicate IP addresses?

Anyway, I would suggest you store the info retrieved from the JSON file inside a defaultdict, which will keep its keys unique without you having to worry about it.

Once you're done, you can iterate over your defaultdict and extract the unique keys you want :)

I encourage you to look at the defaultdict documentation and take a look at this question here.
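As a sketch of that idea, grouping requests by IP so the unique IPs fall out as the dict's keys (the records below are made up to match the question's sample):

```python
from collections import defaultdict

# Hypothetical records shaped like the log entries in the question.
records = [
    {"httpRequest": {"remoteIp": "182.2.169.59", "status": 403}},
    {"httpRequest": {"remoteIp": "182.2.169.59", "status": 403}},
    {"httpRequest": {"remoteIp": "10.114.44.4", "status": 200}},
]

# Group requests by remote IP; dict keys are unique by construction.
requests_by_ip = defaultdict(list)
for rec in records:
    requests_by_ip[rec["httpRequest"]["remoteIp"]].append(rec["httpRequest"])

print(sorted(requests_by_ip))               # the unique IPs
print(len(requests_by_ip["182.2.169.59"]))  # requests seen from that IP
```

This also keeps the full request objects around per IP, which is handy if you later want per-IP counts or status breakdowns.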

P.S. When you open a file inside a with block, you don't have to close it afterwards; the file is automatically closed once you exit the with block.
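For example (writing to a throwaway temp file just for the demo):

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "with_demo.txt")

with open(path, "w") as f:
    f.write("hello")

# Exiting the with block already closed the file; no f.close() needed.
print(f.closed)  # True
os.remove(path)
```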
