Convert nested JSON from a CSV into a Pandas dataframe
I am desperately trying to convert a nested JSON field within a CSV into data frame rows. Could you help?
Sample CSV row
2021-09-26T08:25:43.021051958Z,"{""level"":""info"",""message"":""Success (Cached)"",""request"":""GET /api/v1/settingsid=3"",""httpCode"":200,""service"":""stats-vis-backend"",""timestamp"":""2021-09-26 08:25:43""}",ip-10-xxx-xxx-18.eu-central-1.compute.internal,podname-75ffdf6b-gns8v
Desired output (using JSON part only):
id | message | request | httpCode | service | timestamp |
---|---|---|---|---|---|
0 | Success (Cached) | GET /api/v1/settings?id=3 | 200 | stats-vis-backend | 2021-09-26 08:25:43 |
I would be more than happy with a data frame structured like this. I tried JSON normalize and similar approaches, but I am still far from a solution.
Thanks so much!!
Best, David
Full code attempt (based on SeaBean's answer):
import csv
import ast
import pandas as pd
# read CSV
df = pd.read_csv('/Users/David/xaa.csv',sep=',', header=None)
print(df.head(1))
# convert string of JSON/dict to real JSON/dict
# the JSON/dict is at column `1` (second column from left)
df[1] = df[1].apply(ast.literal_eval)
# Create dataframe from the JSON part
df_json = pd.DataFrame(df[1].tolist())
print(df_json.head(1))
Full output dump
File "/Users/David/opt/anaconda3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3437, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-7-86e494aa8f0c>", line 12, in <module>
df[1] = df[1].apply(ast.literal_eval)
File "/Users/David/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py", line 4138, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas/_libs/lib.pyx", line 2467, in pandas._libs.lib.map_infer
File "/Users/David/opt/anaconda3/lib/python3.8/ast.py", line 59, in literal_eval
node_or_string = parse(node_or_string, mode='eval')
File "/Users/David/opt/anaconda3/lib/python3.8/ast.py", line 47, in parse
return compile(source, filename, mode, flags,
File "<unknown>", line 1
> next start
^
SyntaxError: invalid syntax
Sample output of df1
0 {"level":"info","message":"Success (Cached)","...
1 {"level":"info","message":"Success (Cached)","...
2 {"level":"info","message":"Success (Cached)","...
3 {"level":"info","message":"Success","request":...
4 {"level":"info","message":"Success (Cached)","...
...
249995 {"level":"info","message":"Success (Cached)","...
249996 {"level":"info","message":"Success (Cached)","...
249997 {"level":"info","message":"Success (Cached)","...
249998 {"level":"info","message":"Success","request":...
249999 {"level":"info","message":"Success (Cached)","...
Name: 1, Length: 250000, dtype: object
Sample to_dict() output of df1
{0: '{"level":"info","message":"Success (Cached)","request":"GET /api/v1/settings?id=3","httpCode":200,"service":"stats-vis-backend","timestamp":"2021-09-26 08:25:43"}',
1: '{"level":"info","message":"Success (Cached)","request":"GET /api/v1/settings?id=3","httpCode":200,"service":"stats-vis-backend","timestamp":"2021-09-26 08:26:17"}',
Output of print(df.iloc[[4480]])
0 1 \
4480 2021-09-26T12:00:58.983344643Z > next start
2 \
4480 ip-10-xxx-xxxx-30.eu-central-1.compute.internal
3
4480 xxxx-converter-75ffxf6b-jq2w7
You can use pd.DataFrame on the list of values from the second column (the one holding the JSON), after converting each JSON string into a real dict, as follows:
import ast
import pandas as pd
# read CSV
df = pd.read_csv(r'mycsv.csv', sep=',', header=None)
# convert each JSON/dict string into a real dict
# the JSON/dict is in column `1` (second column from left)
df[1] = df[1].apply(ast.literal_eval)
# create a dataframe from the JSON part
df_json = pd.DataFrame(df[1].tolist())
If you have already read the CSV into a dataframe with a column header, use the label of the second column instead of 1 in the code above.
Result:
print(df_json)
level message request httpCode service timestamp
0 info Success (Cached) GET /api/v1/settingsid=3 200 stats-vis-backend 2021-09-26 08:25:43
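The traceback in the question points at a row whose second column holds `> next start` rather than JSON (row 4480 in the dump), and `ast.literal_eval` cannot parse that. Below is a minimal sketch of one way around it, assuming such rows can simply be set aside. It swaps in `json.loads` for `ast.literal_eval`, since JSON literals such as `true`, `false` and `null` are not Python literals. The helper `parse_json_or_none`, the variable names, the shortened JSON payload and the `host-a`/`pod-a` values are hypothetical, made up for illustration only.

```python
import io
import json

import pandas as pd

# Two made-up rows in the same shape as the question's CSV:
# one with a JSON payload, one with the "> next start" marker.
csv_text = (
    '2021-09-26T08:25:43.021051958Z,'
    '"{""level"":""info"",""message"":""Success (Cached)"",""httpCode"":200}",'
    'host-a,pod-a\n'
    '2021-09-26T12:00:58.983344643Z,> next start,host-b,pod-b\n'
)


def parse_json_or_none(text):
    """Return the decoded dict, or None when text is not a JSON object."""
    try:
        value = json.loads(text)
    except (TypeError, ValueError):  # JSONDecodeError is a ValueError
        return None
    return value if isinstance(value, dict) else None


df = pd.read_csv(io.StringIO(csv_text), header=None)

# Parse column 1; rows that are not JSON become None instead of raising.
parsed = df[1].apply(parse_json_or_none)

# Rows that failed to parse, kept for inspection.
bad_rows = df[parsed.isna()]

# Build the dataframe from the rows that did parse, keeping their original index.
good = parsed.dropna()
df_json = pd.DataFrame(good.tolist(), index=good.index)

print(df_json)
print(bad_rows)
```

Keeping the original index on `df_json` makes it possible to line the parsed rows up with the other CSV columns later, and `bad_rows` shows which lines were skipped.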