Convert nested JSON from a CSV into a Pandas dataframe

I am desperately trying to convert a nested JSON field within a CSV into data frame rows. Could you help?

Sample CSV row:

2021-09-26T08:25:43.021051958Z,"{""level"":""info"",""message"":""Success (Cached)"",""request"":""GET /api/v1/settingsid=3"",""httpCode"":200,""service"":""stats-vis-backend"",""timestamp"":""2021-09-26 08:25:43""}",ip-10-xxx-xxx-18.eu-central-1.compute.internal,podname-75ffdf6b-gns8v

Desired output (using JSON part only):

id  message           request                    httpCode  service            timestamp
0   Success (Cached)  GET /api/v1/settings?id=3  200       stats-vis-backend  2021-09-26 08:25:43

If the data frame output had this structure, I would be more than happy. I tried JSON normalize etc., but I am far from a solution.

Thanks so much!!

Best, David

Full code attempt (based on SeaBean's answer):


import csv
import ast
import pandas as pd

# read CSV 
df = pd.read_csv('/Users/David/xaa.csv',sep=',', header=None)

print(df.head(1))

# convert string of JSON/dict to real JSON/dict 
# the JSON/dict is at column `1` (second column from left)
df[1] = df[1].apply(ast.literal_eval)

# Create dataframe from the JSON part
df_json = pd.DataFrame(df[1].tolist())

print(df_json.head(1))

Full output dump:

File "/Users/David/opt/anaconda3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3437, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-7-86e494aa8f0c>", line 12, in <module>
    df[1] = df[1].apply(ast.literal_eval)

  File "/Users/David/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py", line 4138, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)

  File "pandas/_libs/lib.pyx", line 2467, in pandas._libs.lib.map_infer

  File "/Users/David/opt/anaconda3/lib/python3.8/ast.py", line 59, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')

  File "/Users/David/opt/anaconda3/lib/python3.8/ast.py", line 47, in parse
    return compile(source, filename, mode, flags,

  File "<unknown>", line 1
    > next start
    ^
SyntaxError: invalid syntax

Sample output of df[1]:

0         {"level":"info","message":"Success (Cached)","...
1         {"level":"info","message":"Success (Cached)","...
2         {"level":"info","message":"Success (Cached)","...
3         {"level":"info","message":"Success","request":...
4         {"level":"info","message":"Success (Cached)","...
                                ...                        
249995    {"level":"info","message":"Success (Cached)","...
249996    {"level":"info","message":"Success (Cached)","...
249997    {"level":"info","message":"Success (Cached)","...
249998    {"level":"info","message":"Success","request":...
249999    {"level":"info","message":"Success (Cached)","...
Name: 1, Length: 250000, dtype: object

Sample to_dict() output of df[1]:

{0: '{"level":"info","message":"Success (Cached)","request":"GET /api/v1/settings?id=3","httpCode":200,"service":"stats-vis-backend","timestamp":"2021-09-26 08:25:43"}',
 1: '{"level":"info","message":"Success (Cached)","request":"GET /api/v1/settings?id=3","httpCode":200,"service":"stats-vis-backend","timestamp":"2021-09-26 08:26:17"}',

Output of print(df.iloc[[4480]]):

                               0             1  \
4480  2021-09-26T12:00:58.983344643Z  > next start   

                                                   2  \
4480  ip-10-xxx-xxxx-30.eu-central-1.compute.internal   

                                  3  
4480  xxxx-converter-75ffxf6b-jq2w7  

You can pass the list of values from the second column (the one holding JSON) to pd.DataFrame, after first converting each JSON string into a real Python dict, as follows:

import ast
import pandas as pd

# read CSV (no header row, so the columns are labelled 0, 1, 2, ...)
df = pd.read_csv(r'mycsv.csv', sep=',', header=None)

# convert string of JSON/dict to real JSON/dict
# the JSON/dict is at column `1` (second column from left)
df[1] = df[1].apply(ast.literal_eval)

# Create dataframe from the JSON part
df_json = pd.DataFrame(df[1].tolist())

If you have already read the CSV into a dataframe with a header row, use the label of the second column instead of 1 in the code above.
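
As a minimal sketch of the labelled-column case: the example below assumes a hypothetical header row that names the second column `payload` (the names `time`, `payload`, `host`, and `pod` are illustrative and do not come from the original file). It reads the sample row from the question out of an in-memory string rather than a file.

```python
import ast
import io

import pandas as pd

# Hypothetical header row; the column names are assumptions for illustration.
csv_text = (
    'time,payload,host,pod\n'
    '2021-09-26T08:25:43.021051958Z,'
    '"{""level"":""info"",""message"":""Success (Cached)"",'
    '""request"":""GET /api/v1/settingsid=3"",""httpCode"":200,'
    '""service"":""stats-vis-backend"",""timestamp"":""2021-09-26 08:25:43""}",'
    'ip-10-xxx-xxx-18.eu-central-1.compute.internal,podname-75ffdf6b-gns8v\n'
)

# The first line is used as the header, so columns are addressed by name.
df = pd.read_csv(io.StringIO(csv_text))

# Use the column label 'payload' where the code above used 1.
df['payload'] = df['payload'].apply(ast.literal_eval)

# Build the dataframe from the parsed dicts, one row per dict.
df_json = pd.DataFrame(df['payload'].tolist())
print(df_json)
```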

Result:

print(df_json)


  level           message                   request  httpCode            service            timestamp
0  info  Success (Cached)  GET /api/v1/settingsid=3       200  stats-vis-backend  2021-09-26 08:25:43
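
The traceback in the question comes from a row whose second column holds `> next start` rather than JSON, which `ast.literal_eval` cannot parse. One possible way to cope, sketched below, is to parse each value with `json.loads`, return `None` for anything that is not a JSON object, and drop those rows before building the dataframe. The helper name `parse_json_or_none` and the in-memory two-row sample are assumptions for illustration. The last step is just one way to approximate the layout asked for in the question (an `id` column and no `level` column).

```python
import io
import json

import pandas as pd


def parse_json_or_none(value):
    """Return the parsed JSON object as a dict, or None if parsing fails."""
    try:
        parsed = json.loads(value)
    except (TypeError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError;
        # TypeError covers non-string values such as NaN.
        return None
    return parsed if isinstance(parsed, dict) else None


# Two rows modelled on the question: one valid JSON row, one marker row.
csv_text = (
    '2021-09-26T08:25:43.021051958Z,'
    '"{""level"":""info"",""message"":""Success (Cached)"",'
    '""request"":""GET /api/v1/settings?id=3"",""httpCode"":200,'
    '""service"":""stats-vis-backend"",""timestamp"":""2021-09-26 08:25:43""}",'
    'ip-10-xxx-xxx-18.eu-central-1.compute.internal,podname-75ffdf6b-gns8v\n'
    '2021-09-26T12:00:58.983344643Z,> next start,'
    'ip-10-xxx-xxxx-30.eu-central-1.compute.internal,xxxx-converter-75ffxf6b-jq2w7\n'
)

df = pd.read_csv(io.StringIO(csv_text), sep=',', header=None)

# Parse column 1; rows that are not JSON objects become None.
parsed = df[1].apply(parse_json_or_none)

# Keep the rejected rows around so they can be inspected.
bad_rows = df[parsed.isna()]

# Build the dataframe from the rows that parsed successfully.
df_json = pd.DataFrame(parsed.dropna().tolist())

# Drop 'level' and expose the row index as an 'id' column.
result = df_json.drop(columns=['level']).rename_axis('id').reset_index()
print(result)
```

Keeping `bad_rows` instead of silently discarding them makes it easier to check whether the skipped lines are harmless markers or real data that needs different handling.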
