Convert nested JSON from a CSV into a Pandas dataframe
I am desperately trying to convert a nested JSON field in a CSV into dataframe rows. Can you help?
Sample CSV row
2021-09-26T08:25:43.021051958Z,"{""level"":""info"",""message"":""Success (Cached)"",""request"":""GET /api/v1/settings?id=3"",""httpCode"":200,""service"":""stats-vis-backend"",""timestamp"":""2021-09-26 08:25:43""}",ip-10-xxx-xxx-18.eu-central-1.compute.internal,podname-75ffdf6b-gns8v
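Desired output (using only the JSON part):
ID | message | request | httpCode | service | timestamp |
---|---|---|---|---|---|
0 | Success (Cached) | GET /api/v1/settings?id=3 | 200 | stats-vis-backend | 2021-09-26 08:25:43 |
I would be very happy if this were the structure of the resulting dataframe. I tried JSON normalize and similar approaches, but I am still far from a solution.
Thanks a lot!!
Best, David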
Full code attempt (based on SeaBean's answer):
import csv
import ast
import pandas as pd
# read CSV
df = pd.read_csv('/Users/David/xaa.csv',sep=',', header=None)
print(df.head(1))
# convert string of JSON/dict to real JSON/dict
# the JSON/dict is at column `1` (second column from left)
df[1] = df[1].apply(ast.literal_eval)
# Create dataframe from the JSON part
df_json = pd.DataFrame(df[1].tolist())
print(df_json.head(1))
Full output dump
File "/Users/David/opt/anaconda3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3437, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-7-86e494aa8f0c>", line 12, in <module>
df[1] = df[1].apply(ast.literal_eval)
File "/Users/David/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py", line 4138, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas/_libs/lib.pyx", line 2467, in pandas._libs.lib.map_infer
File "/Users/David/opt/anaconda3/lib/python3.8/ast.py", line 59, in literal_eval
node_or_string = parse(node_or_string, mode='eval')
File "/Users/David/opt/anaconda3/lib/python3.8/ast.py", line 47, in parse
return compile(source, filename, mode, flags,
File "<unknown>", line 1
> next start
^
SyntaxError: invalid syntax
Sample output of df1
0 {"level":"info","message":"Success (Cached)","...
1 {"level":"info","message":"Success (Cached)","...
2 {"level":"info","message":"Success (Cached)","...
3 {"level":"info","message":"Success","request":...
4 {"level":"info","message":"Success (Cached)","...
...
249995 {"level":"info","message":"Success (Cached)","...
249996 {"level":"info","message":"Success (Cached)","...
249997 {"level":"info","message":"Success (Cached)","...
249998 {"level":"info","message":"Success","request":...
249999 {"level":"info","message":"Success (Cached)","...
Name: 1, Length: 250000, dtype: object
Sample of the to_dict() output of df1
{0: '{"level":"info","message":"Success (Cached)","request":"GET /api/v1/settings?id=3","httpCode":200,"service":"stats-vis-backend","timestamp":"2021-09-26 08:25:43"}',
1: '{"level":"info","message":"Success (Cached)","request":"GET /api/v1/settings?id=3","httpCode":200,"service":"stats-vis-backend","timestamp":"2021-09-26 08:26:17"}',
Output of print(df.iloc[[4480]])
0 1 \
4480 2021-09-26T12:00:58.983344643Z > next start
2 \
4480 ip-10-xxx-xxxx-30.eu-central-1.compute.internal
3
4480 xxxx-converter-75ffxf6b-jq2w7
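After converting the JSON strings into real JSON objects (not strings), you can call pd.DataFrame on the list of values from the second column (the one holding the JSON), as follows: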
import ast
import pandas as pd
# read CSV
df = pd.read_csv(r'mycsv.csv', sep=',', header=None)
# convert string of JSON/dict to real JSON/dict
# the JSON/dict is at column `1` (second column from left)
df[1] = df[1].apply(ast.literal_eval)
# Create dataframe from the JSON part
df_json = pd.DataFrame(df[1].tolist())
If you have already read the CSV into a dataframe with column headers, you can use the label of the second column instead of 1 in the code above.
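The header-based variant could look roughly like the sketch below. The column names `time`, `log`, `host` and `pod` are hypothetical (the question's file has no header row), and `json.loads` is swapped in for `ast.literal_eval` as an assumption on my part: it parses JSON text directly and also accepts `true`, `false` and `null`, which `ast.literal_eval` rejects. The last step keeps only the columns listed in the desired output.

```python
import io
import json

import pandas as pd

# Hypothetical one-row CSV with a header line; the column names are assumptions.
csv_text = (
    'time,log,host,pod\n'
    '2021-09-26T08:25:43.021051958Z,"{""level"":""info"",""message"":""Success (Cached)"",'
    '""request"":""GET /api/v1/settings?id=3"",""httpCode"":200,'
    '""service"":""stats-vis-backend"",""timestamp"":""2021-09-26 08:25:43""}",'
    'ip-10-xxx-xxx-18.eu-central-1.compute.internal,podname-75ffdf6b-gns8v\n'
)

df = pd.read_csv(io.StringIO(csv_text))

# Decode each JSON string in the labelled column into a Python dict.
records = df["log"].apply(json.loads)

# Build one dataframe row per dict, then keep only the wanted columns.
df_json = pd.DataFrame(records.tolist())
df_json = df_json[["message", "request", "httpCode", "service", "timestamp"]]
print(df_json)
```

For the real file, `io.StringIO(csv_text)` would be replaced by the path to the CSV.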
Result:
print(df_json)
level message request httpCode service timestamp
0 info Success (Cached) GET /api/v1/settings?id=3 200 stats-vis-backend 2021-09-26 08:25:43
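The traceback in the question ends on the text `> next start`, and row 4480 holds exactly that string in column 1, so the `SyntaxError` most likely comes from log lines that are not JSON at all. One possible way around it, sketched here under that assumption, is to parse defensively and drop the rows that fail. The helper `parse_json_or_none` is a hypothetical name, and the small `Series` stands in for column 1 of the real dataframe.

```python
import json

import pandas as pd


def parse_json_or_none(text):
    """Return the decoded dict, or None when text is not a JSON object."""
    try:
        value = json.loads(text)
    except (TypeError, ValueError):  # json.JSONDecodeError is a ValueError
        return None
    return value if isinstance(value, dict) else None


# Stand-in for column 1 of the real frame; the second entry mimics row 4480.
col = pd.Series([
    '{"level":"info","message":"Success (Cached)","request":"GET /api/v1/settings?id=3","httpCode":200,"service":"stats-vis-backend","timestamp":"2021-09-26 08:25:43"}',
    '> next start',
    '{"level":"info","message":"Success (Cached)","request":"GET /api/v1/settings?id=3","httpCode":200,"service":"stats-vis-backend","timestamp":"2021-09-26 08:26:17"}',
])

parsed = col.apply(parse_json_or_none)
valid = parsed.dropna()  # discard rows that were not JSON objects

df_json = pd.json_normalize(valid.tolist())
df_json.index = valid.index  # keep the original row numbers
print(df_json)
```

Keeping the original index makes it easy to join `df_json` back to the other CSV columns or to inspect which rows were skipped.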