Convert nested JSON to CSV in Python
I am trying to convert a complex (nested) JSON to CSV.
{
"caudal": [
{"ts": 1612746051248, "value": "0.0"},
{"ts": 1612745450856, "value": "0.0"},
{"ts": 1612744250898, "value": "0.0"},
{"ts": 1612743650861, "value": "0.0"},
{"ts": 1612743050821, "value": "0.0"}
],
"FreeHeap": [
{"ts": 1612746051248, "value": "247564"},
{"ts": 1612745450856, "value": "247564"},
{"ts": 1612744250898, "value": "247564"},
{"ts": 1612743650861, "value": "247564"},
{"ts": 1612743050821, "value": "247564"}
],
"MinimoFreeHeap": [
{"ts": 1612746051248, "value": "237440"},
{"ts": 1612745450856, "value": "237440"},
{"ts": 1612744250898, "value": "237440"},
{"ts": 1612743650861, "value": "237440"},
{"ts": 1612743050821, "value": "237440"}
]
}
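The JSON files my program has to process contain many more records, but I trimmed this one down to simplify the analysis. I tried using the pandas library like this: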
import pandas as pd

with open('read.json') as f_input:
    df = pd.read_json(f_input)
df.to_csv('out.csv', encoding='utf-8', index=False)
I get the following result:
caudal,FreeHeap,MinimoFreeHeap
"{'ts': 1612746051248, 'value': '0.0'}","{'ts': 1612746051248, 'value': '247564'}","{'ts': 1612746051248, 'value': '237440'}"
"{'ts': 1612745450856, 'value': '0.0'}","{'ts': 1612745450856, 'value': '247564'}","{'ts': 1612745450856, 'value': '237440'}"
"{'ts': 1612744250898, 'value': '0.0'}","{'ts': 1612744250898, 'value': '247564'}","{'ts': 1612744250898, 'value': '237440'}"
"{'ts': 1612743650861, 'value': '0.0'}","{'ts': 1612743650861, 'value': '247564'}","{'ts': 1612743650861, 'value': '237440'}"
"{'ts': 1612743050821, 'value': '0.0'}","{'ts': 1612743050821, 'value': '247564'}","{'ts': 1612743050821, 'value': '237440'}"
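As you can see, each cell holds something like:
"{'ts': 1612743050821, 'value': '247564'}"
which, as I understand it, is another JSON object. Is there a simple way to add a column named timestamp (ts) and put only the value in the cell where this JSON currently sits? I believe this is the right approach: my goal is to convert the information in the JSON to CSV so that it is easier for third parties (databases or AI algorithms) to consume. But if you can think of a more convenient approach or format, I'm willing to change my original idea. I have to admit I'm very new to this world.
I thought about walking through the JSON and doing the conversion manually, but it is hard to correlate measurements that share the same timestamp.
Nicolas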
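Correlating by timestamp by hand is manageable if a dict keyed on ts does the bookkeeping. A minimal standard-library sketch, assuming the same shape as the sample above (the to_wide_rows helper is hypothetical, and the inline data dict replaces read.json): each measurement is filed under its timestamp, and csv.DictWriter leaves a cell empty (its default restval) when a key has no reading for that timestamp.

```python
import csv
import io

# Inline stand-in for read.json (same shape as the question, two records per key)
data = {
    "caudal": [{"ts": 1612746051248, "value": "0.0"}, {"ts": 1612745450856, "value": "0.0"}],
    "FreeHeap": [{"ts": 1612746051248, "value": "247564"}, {"ts": 1612745450856, "value": "247564"}],
    "MinimoFreeHeap": [{"ts": 1612746051248, "value": "237440"}, {"ts": 1612745450856, "value": "237440"}],
}


def to_wide_rows(data):
    """Hypothetical helper: group every measurement under its timestamp."""
    by_ts = {}
    for key, records in data.items():
        for record in records:
            # setdefault creates the row the first time a timestamp is seen
            by_ts.setdefault(record["ts"], {"ts": record["ts"]})[key] = record["value"]
    # sort by timestamp so the CSV is in chronological order
    return [by_ts[ts] for ts in sorted(by_ts)]


rows = to_wide_rows(data)
buffer = io.StringIO()  # swap for open('out.csv', 'w', newline='') to write a real file
writer = csv.DictWriter(buffer, fieldnames=["ts", *data.keys()])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Because the rows are keyed on ts rather than on list position, the records under each key can arrive in any order.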
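You didn't say how you want the data laid out, so the code below converts it to a table with one column each for the machine (not sure that's the right term), ts and value.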
import json

import pandas as pd

with open('read.json') as f_input:
    data = json.load(f_input)

df = pd.DataFrame.from_dict(data, orient='columns')

# collect one record per (machine, ts, value)
data = []
for col in df.columns:
    # Series.iteritems() was removed in pandas 2.0; .items() works in all versions
    for index, row in df[col].items():
        data.append({'machine': col, 'ts': row['ts'], 'value': row['value']})

# DataFrame.append() was also removed in pandas 2.0, so build the frame directly
df_new = pd.DataFrame(data, columns=['machine', 'ts', 'value'])
df_new.to_csv('out.csv', encoding='utf-8', index=False)
If you want the timestamps as rows and the machines as columns, change the last two lines to this:
# one row per timestamp, one column per machine (index=True keeps ts in the CSV)
df_new = pd.DataFrame(data).pivot(index='ts', columns='machine', values='value')
df_new.to_csv('out.csv', encoding='utf-8')
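One caveat with .pivot: it raises a ValueError when the same (ts, machine) pair appears more than once, which can happen with real device logs. A hedged sketch of the usual workaround, DataFrame.pivot_table with an aggfunc; choosing aggfunc='first' (keep the first reading) is an assumption, and the duplicate row below is made up for illustration.

```python
import pandas as pd

# Long-format records like the ones built above, with one (ts, machine) pair repeated on purpose
df_long = pd.DataFrame(
    [
        {"machine": "caudal", "ts": 1612746051248, "value": "0.0"},
        {"machine": "FreeHeap", "ts": 1612746051248, "value": "247564"},
        {"machine": "FreeHeap", "ts": 1612746051248, "value": "247564"},  # duplicate
        {"machine": "caudal", "ts": 1612745450856, "value": "0.0"},
    ]
)

pivot_failed = False
try:
    df_long.pivot(index="ts", columns="machine", values="value")
except ValueError as err:
    pivot_failed = True  # duplicate (ts, machine) pairs cannot be reshaped
    print("pivot failed:", err)

# pivot_table aggregates duplicates instead; 'first' keeps the first value seen
wide = df_long.pivot_table(index="ts", columns="machine", values="value", aggfunc="first")
print(wide)
```

Timestamps with no reading for a machine (FreeHeap at the second ts here) come out as NaN.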
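pd.DataFrame(df[col].values.tolist()) is the fastest way to normalize a column of single-level dicts, but this answer shows how to handle problematic columns (e.g. ones that cause an error when trying .values.tolist()).
import pandas as pd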
# read the json file
with open('read.json') as f_input:
    df = pd.read_json(f_input)

# normalize each column from df into its own frame
frames = []
for col in df.columns:
    normed = pd.DataFrame(df[col].values.tolist())  # expand the dicts into ts / value columns
    normed['type'] = col  # add the original column name as a new column so the associated values can be identified
    frames.append(normed)

# DataFrame.append() was removed in pandas 2.0; concatenate the frames instead
normed_df = pd.concat(frames)

# convert ts to a datetime dtype
normed_df.ts = pd.to_datetime(normed_df.ts, unit='ms')

# reset the index
normed_df = normed_df.reset_index(drop=True)

# save this long form to a csv
normed_df.to_csv('long.csv', index=False)

# display(normed_df)
ts value type
0 2021-02-08 01:00:51.248 0.0 caudal
1 2021-02-08 00:50:50.856 0.0 caudal
2 2021-02-08 00:30:50.898 0.0 caudal
3 2021-02-08 00:20:50.861 0.0 caudal
4 2021-02-08 00:10:50.821 0.0 caudal
5 2021-02-08 01:00:51.248 247564 FreeHeap
6 2021-02-08 00:50:50.856 247564 FreeHeap
7 2021-02-08 00:30:50.898 247564 FreeHeap
8 2021-02-08 00:20:50.861 247564 FreeHeap
9 2021-02-08 00:10:50.821 247564 FreeHeap
10 2021-02-08 01:00:51.248 237440 MinimoFreeHeap
11 2021-02-08 00:50:50.856 237440 MinimoFreeHeap
12 2021-02-08 00:30:50.898 237440 MinimoFreeHeap
13 2021-02-08 00:20:50.861 237440 MinimoFreeHeap
14 2021-02-08 00:10:50.821 237440 MinimoFreeHeap
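Note that value is still text at this point: the source JSON stores "0.0" and "247564" as strings, so a database or model would see strings rather than numbers. A hedged sketch of converting the column with pd.to_numeric; errors='coerce' turns anything unparseable into NaN, and whether that is the right treatment for bad readings is an assumption to check. The three rows and the 'n/a' entry are made up for illustration.

```python
import pandas as pd

# A few long-format rows in the same shape as normed_df; 'value' is still text
normed_df = pd.DataFrame(
    {
        "ts": pd.to_datetime([1612746051248, 1612746051248, 1612746051248], unit="ms"),
        "value": ["0.0", "247564", "n/a"],  # 'n/a' is a made-up bad reading
        "type": ["caudal", "FreeHeap", "MinimoFreeHeap"],
    }
)

print(normed_df["value"].dtype)  # object: the values are strings

# Convert to numbers; entries that cannot be parsed become NaN instead of raising
normed_df["value"] = pd.to_numeric(normed_df["value"], errors="coerce")
print(normed_df.dtypes)
```

Doing this before the pivot below means the wide CSV also carries numeric columns.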
Use .pivot with ts as the index to align the data:
# pivot normed_df to a wide format
dfp = normed_df.pivot(index='ts', columns='type', values='value')
# display(dfp)
type FreeHeap MinimoFreeHeap caudal
ts
2021-02-08 00:10:50.821 247564 237440 0.0
2021-02-08 00:20:50.861 247564 237440 0.0
2021-02-08 00:30:50.898 247564 237440 0.0
2021-02-08 00:50:50.856 247564 237440 0.0
2021-02-08 01:00:51.248 247564 237440 0.0
# save this wide form to a csv
dfp.reset_index().to_csv('wide.csv', index=False)
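For comparison, pd.json_normalize can do the same per-column expansion straight from the parsed JSON, without going through read_json first. A minimal sketch, assuming every top-level key maps to a list of flat {ts, value} records as in the question; the inline data dict stands in for read.json and the to_long helper is hypothetical.

```python
import pandas as pd

# Inline stand-in for read.json (same shape as the question, trimmed to two keys)
data = {
    "caudal": [{"ts": 1612746051248, "value": "0.0"}, {"ts": 1612745450856, "value": "0.0"}],
    "FreeHeap": [{"ts": 1612746051248, "value": "247564"}, {"ts": 1612745450856, "value": "247564"}],
}


def to_long(data):
    """Hypothetical helper: one row per (ts, value, type)."""
    frames = []
    for key, records in data.items():
        frame = pd.json_normalize(records)  # list of dicts -> columns 'ts' and 'value'
        frame["type"] = key  # remember which measurement each row came from
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


long_df = to_long(data)
print(long_df)
```

The result has the same long layout as normed_df (before the datetime conversion), so the pivot step above applies unchanged.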
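In the end I found a solution... there is a very interesting library called "cherrypicker". Using its examples together with pandas DataFrames, I figured out how to make it work. Here is the code: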
import json

import pandas as pd
from cherrypicker import CherryPicker

# A list (not a set) keeps the output column order predictable
keys = ['FreeHeap', 'MinimoFreeHeap', 'caudal']  # In the future there will be more keys

with open('read.json') as f_input:
    data = json.load(f_input)

picker = CherryPicker(data)
fin = None
for colum in keys:
    flat = picker[colum].flatten().get()
    df = pd.DataFrame(flat)
    df.columns = ['TimeStamp', colum]  # Rename
    if fin is None:
        fin = df
    else:
        # Copy only the values (TimeStamp would be repeated); assumes every key
        # lists the same timestamps in the same order
        fin[colum] = df[colum]
    print(fin)

fin.to_csv('out.csv', encoding='utf-8', index=False)
I hope it is useful to someone in the future. I'm not sure it's the simplest way, but it works for me! Regards
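The cherrypicker version copies each column by position, so rows would silently misalign if one key were missing a timestamp or listed them in a different order. A sketch of an alternative that matches rows on the timestamp itself, using plain pandas rather than cherrypicker: build a two-column frame per key and combine them with DataFrame.merge(..., how='outer'), so a timestamp present under only some keys gets blanks. The inline data, with one deliberately missing record and one reordered list, replaces read.json.

```python
from functools import reduce

import pandas as pd

# Inline stand-in for read.json; MinimoFreeHeap is missing the second timestamp on purpose
data = {
    "caudal": [{"ts": 1612746051248, "value": "0.0"}, {"ts": 1612745450856, "value": "0.0"}],
    "FreeHeap": [{"ts": 1612745450856, "value": "247564"}, {"ts": 1612746051248, "value": "247564"}],
    "MinimoFreeHeap": [{"ts": 1612746051248, "value": "237440"}],
}

# One frame per key with columns TimeStamp and <key>
frames = [
    pd.DataFrame(records).rename(columns={"ts": "TimeStamp", "value": key})
    for key, records in data.items()
]

# Outer-merge on the timestamp so rows line up by value, not by position
fin = reduce(lambda left, right: left.merge(right, on="TimeStamp", how="outer"), frames)
fin = fin.sort_values("TimeStamp").reset_index(drop=True)
print(fin)
```

fin.to_csv('out.csv', index=False) then writes the same layout as the code above, with missing readings as empty cells.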