
JSON to Pandas DF

I have a dataset from Azure Firewall (firewall logs) that I store as JSON in Blob storage. The JSON looks like this:

{ "category": "AzureFirewallNetworkRule", "time": "2021-01-31T00:00:00.1551130Z", "resourceId": "/SUBSCRIPTIONS/RESOURCEGROUPS/SEA-DEV/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/SEA-DEV", "operationName": "AzureFirewallNetworkRuleLog", "properties": {"msg":"TCP request from 172.16.1.218:54652 to 172.17.1.219:8080. Action: Allow"}}
{ "category": "AzureFirewallNetworkRule", "time": "2021-01-31T00:00:00.1268490Z", "resourceId": "/SUBSCRIPTIONS/RESOURCEGROUPS/SEA-DEV/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/SEA-DEV", "operationName": "AzureFirewallNetworkRuleLog", "properties": {"msg":"UDP request from 172.16.1.218:53067 to 8.8.8.8:53. Action: Allow"}}

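There are a few million lines per day to go through, grouping source IPs against the ports that were allowed or denied, so I want to see whether analyzing this data in JN (Jupyter Notebook) is feasible.

Problem:

I tried the code below, but ran into a problem when trying to flatten "properties" to get at the "msg" I want.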

import json
import pandas as pd

# load data using Python JSON module
with open('FWLog/FWLog2.json','r') as f:
    data = json.loads(f.read())
# Flatten data
df_nested_list = pd.json_normalize(data, record_path =['properties'])

Error:

---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
<ipython-input-61-3500c0d62d55> in <module>
      7 # load data using Python JSON module
      8 with open('FWLog/FWLog2.json','r') as f:
----> 9     data = json.loads(f.read())
     10 # Flatten data
     11 df_nested_list = pd.json_normalize(data, record_path =['properties'])

~\anaconda3\lib\json\__init__.py in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    355             parse_int is None and parse_float is None and
    356             parse_constant is None and object_pairs_hook is None and not kw):
--> 357         return _default_decoder.decode(s)
    358     if cls is None:
    359         cls = JSONDecoder

~\anaconda3\lib\json\decoder.py in decode(self, s, _w)
    338         end = _w(s, end).end()
    339         if end != len(s):
--> 340             raise JSONDecodeError("Extra data", s, end)
    341         return obj
    342 

JSONDecodeError: Extra data: line 2 column 1 (char 386)

You can use lines=True in pd.read_json:

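import pandas as pd
# lines=True parses each line of the file as a separate JSON object
df = pd.read_json("your_file.txt", lines=True)
# Expand the "properties" dicts into their own columns, then join the remaining columns
df_final = pd.concat([pd.DataFrame(df.pop("properties").to_list()), df], axis=1)
print(df_final)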

Prints:

                                                 msg                  category                          time                                         resourceId                operationName
0  TCP request from 172.16.1.218:54652 to 172.17....  AzureFirewallNetworkRule  2021-01-31T00:00:00.1551130Z  /SUBSCRIPTIONS/RESOURCEGROUPS/SEA-DEV/PROVIDER...  AzureFirewallNetworkRuleLog
1  UDP request from 172.16.1.218:53067 to 8.8.8.8...  AzureFirewallNetworkRule  2021-01-31T00:00:00.1268490Z  /SUBSCRIPTIONS/RESOURCEGROUPS/SEA-DEV/PROVIDER...  AzureFirewallNetworkRuleLog
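
Since the question reaches for pd.json_normalize, here is a minimal sketch of the same flattening done that way. It assumes the file holds one JSON object per line. The sample records are shortened copies of the log lines above, and `raw` is a hypothetical in-memory stand-in for the real file. record_path is left out because it is meant for keys that hold a list of records, whereas "properties" here is a single dict.

```python
import io
import json

import pandas as pd

# Hypothetical stand-in for the log file: two shortened records, one JSON object per line.
raw = io.StringIO(
    '{"category": "AzureFirewallNetworkRule", "properties": '
    '{"msg": "TCP request from 172.16.1.218:54652 to 172.17.1.219:8080. Action: Allow"}}\n'
    '{"category": "AzureFirewallNetworkRule", "properties": '
    '{"msg": "UDP request from 172.16.1.218:53067 to 8.8.8.8:53. Action: Allow"}}\n'
)

# Parse each non-empty line as its own JSON document.
records = [json.loads(line) for line in raw if line.strip()]

# json_normalize flattens nested dicts into dotted column names,
# so properties -> msg becomes the column "properties.msg".
df_flat = pd.json_normalize(records)
print(df_flat.columns.tolist())
```

With a real file, `raw` would be replaced by the handle returned from open(...).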

Your file contains multiple JSON objects, one per line. The error is raised by json.loads, which expects a single JSON document.

import json
import pandas as pd

# load data using Python JSON module
with open('test_json.json') as f:
    data = [json.loads(line) for line in f]
# Flatten data
pd.DataFrame([j['properties'] for j in data])

Prints:

                                                  msg
0   TCP request from 172.16.1.218:54652 to 172.17....
1   UDP request from 172.16.1.218:53067 to 8.8.8.8...
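
Once "msg" is in a column, the grouping described in the question (source IP against allowed or denied ports) could be sketched as follows. This is only an illustration. The regular expression is an assumption based on the two sample messages in the question, and other message shapes in a real log may not match it. The column names (`src_ip`, `dst_port`, and so on) are made up for the example.

```python
import pandas as pd

# Sample messages copied from the two log lines in the question.
df = pd.DataFrame({
    "msg": [
        "TCP request from 172.16.1.218:54652 to 172.17.1.219:8080. Action: Allow",
        "UDP request from 172.16.1.218:53067 to 8.8.8.8:53. Action: Allow",
    ]
})

# Assumed message shape:
# "<PROTO> request from <ip>:<port> to <ip>:<port>. Action: <action>"
pattern = (
    r"(?P<protocol>\w+) request from "
    r"(?P<src_ip>[\d.]+):(?P<src_port>\d+) to "
    r"(?P<dst_ip>[\d.]+):(?P<dst_port>\d+)\. "
    r"Action: (?P<action>\w+)"
)

# str.extract returns one column per named group; non-matching rows become NaN.
parsed = df["msg"].str.extract(pattern)

# Count log entries per source IP, destination port, and action.
summary = (
    parsed.groupby(["src_ip", "dst_port", "action"])
    .size()
    .reset_index(name="count")
)
print(summary)
```

The extracted ports are strings. Convert them with astype(int) if numeric sorting or filtering is needed.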

