I am desperately trying to convert a nested JSON feature within a CSV into data frame rows. Could you help?
Sample CSV row
2021-09-26T08:25:43.021051958Z,"{""level"":""info"",""message"":""Success (Cached)"",""request"":""GET /api/v1/settingsid=3"",""httpCode"":200,""service"":""stats-vis-backend"",""timestamp"":""2021-09-26 08:25:43""}",ip-10-xxx-xxx-18.eu-central-1.compute.internal,podname-75ffdf6b-gns8v
Desired output (using JSON part only):
id | message | request | httpCode | service | timestamp |
---|---|---|---|---|---|
0 | Success (Cached) | GET /api/v1/settings?id=3 | 200 | stats-vis-backend | 2021-09-26 08:25:43 |
If this would be the data frame output structure, I would be more than happy. I tried JSON normalize etc., but I am far away from a solution.
Thanks so much!!
Best David
Full Code trial (based on SeaBean):
import csv
import ast
import pandas as pd
# read CSV
df = pd.read_csv('/Users/David/xaa.csv',sep=',', header=None)
print(df.head(1))
# convert string of JSON/dict to real JSON/dict
# the JSON/dict is at column `1` (second column from left)
df[1] = df[1].apply(ast.literal_eval)
# Create dataframe from the JSON part
df_json = pd.DataFrame(df[1].tolist())
print(df_json.head(1))
Full output dump
File "/Users/David/opt/anaconda3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3437, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-7-86e494aa8f0c>", line 12, in <module>
df[1] = df[1].apply(ast.literal_eval)
File "/Users/David/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py", line 4138, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas/_libs/lib.pyx", line 2467, in pandas._libs.lib.map_infer
File "/Users/David/opt/anaconda3/lib/python3.8/ast.py", line 59, in literal_eval
node_or_string = parse(node_or_string, mode='eval')
File "/Users/David/opt/anaconda3/lib/python3.8/ast.py", line 47, in parse
return compile(source, filename, mode, flags,
File "<unknown>", line 1
> next start
^
SyntaxError: invalid syntax
Sample Output df1
0 {"level":"info","message":"Success (Cached)","...
1 {"level":"info","message":"Success (Cached)","...
2 {"level":"info","message":"Success (Cached)","...
3 {"level":"info","message":"Success","request":...
4 {"level":"info","message":"Success (Cached)","...
...
249995 {"level":"info","message":"Success (Cached)","...
249996 {"level":"info","message":"Success (Cached)","...
249997 {"level":"info","message":"Success (Cached)","...
249998 {"level":"info","message":"Success","request":...
249999 {"level":"info","message":"Success (Cached)","...
Name: 1, Length: 250000, dtype: object
Sample toDict() Output of df1
{0: '{"level":"info","message":"Success (Cached)","request":"GET /api/v1/settings?id=3","httpCode":200,"service":"stats-vis-backend","timestamp":"2021-09-26 08:25:43"}',
1: '{"level":"info","message":"Success (Cached)","request":"GET /api/v1/settings?id=3","httpCode":200,"service":"stats-vis-backend","timestamp":"2021-09-26 08:26:17"}',
Output print(df.iloc[[4480]])
0 1 \
4480 2021-09-26T12:00:58.983344643Z > next start
2 \
4480 ip-10-xxx-xxxx-30.eu-central-1.compute.internal
3
4480 xxxx-converter-75ffxf6b-jq2w7
You can use pd.DataFrame
on the list of column values of the second column (with JSON) after converting the string of JSON to real JSON (not in string), as follows:
# read CSV
df = pd.read_csv(r'mycsv.csv', sep=',', header=None)
# convert string of JSON/dict to real JSON/dict
import ast
# the JSON/dict is at column `1` (second column from left)
df[1] = df[1].apply(ast.literal_eval)
# Create dataframe from the JSON part
df_json = pd.DataFrame(df[1].tolist())
If you have already read the CSV into dataframe with column header, you can also use the column label of the second column instead of 1
for the column label for second column in the codes above.
Result:
print(df_json)
level message request httpCode service timestamp
0 info Success (Cached) GET /api/v1/settingsid=3 200 stats-vis-backend 2021-09-26 08:25:43
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.