
Convert Nested Json to CSV Python

I am trying to convert a complex JSON (nested format) to CSV:

{
"caudal": [
{"ts": 1612746051248, "value": "0.0"}, 
{"ts": 1612745450856, "value": "0.0"}, 
{"ts": 1612744250898, "value": "0.0"}, 
{"ts": 1612743650861, "value": "0.0"}, 
{"ts": 1612743050821, "value": "0.0"} 
], 
"FreeHeap": [
{"ts": 1612746051248, "value": "247564"}, 
{"ts": 1612745450856, "value": "247564"}, 
{"ts": 1612744250898, "value": "247564"}, 
{"ts": 1612743650861, "value": "247564"}, 
{"ts": 1612743050821, "value": "247564"} 
], 
"MinimoFreeHeap": [
{"ts": 1612746051248, "value": "237440"}, 
{"ts": 1612745450856, "value": "237440"}, 
{"ts": 1612744250898, "value": "237440"}, 
{"ts": 1612743650861, "value": "237440"}, 
{"ts": 1612743050821, "value": "237440"} 
]
} 

The JSONs my program has to handle contain many more records, but I trimmed this one down to simplify the analysis. I tried using the pandas library, as follows:

import pandas as pd

with open('read.json') as f_input:
    df = pd.read_json(f_input)

df.to_csv('out.csv', encoding='utf-8', index=False)

I get the following result:

caudal,FreeHeap,MinimoFreeHeap
"{'ts': 1612746051248, 'value': '0.0'}","{'ts': 1612746051248, 'value': '247564'}","{'ts': 1612746051248, 'value': '237440'}"
"{'ts': 1612745450856, 'value': '0.0'}","{'ts': 1612745450856, 'value': '247564'}","{'ts': 1612745450856, 'value': '237440'}"
"{'ts': 1612744250898, 'value': '0.0'}","{'ts': 1612744250898, 'value': '247564'}","{'ts': 1612744250898, 'value': '237440'}"
"{'ts': 1612743650861, 'value': '0.0'}","{'ts': 1612743650861, 'value': '247564'}","{'ts': 1612743650861, 'value': '237440'}"
"{'ts': 1612743050821, 'value': '0.0'}","{'ts': 1612743050821, 'value': '247564'}","{'ts': 1612743050821, 'value': '237440'}"

As you can see, the information in each cell is something like:

"{'ts': 1612743050821, 'value': '247564'}"

which, as I understand it, is another JSON. Is there a simple way to add a column named timestamp (ts) and put only the value in the cell where this JSON currently sits? I believe this is the right approach; my goal is to convert the information contained in the JSON into CSV so that it is easier for a third party (a database or an AI algorithm) to consume. But if you can think of another, more convenient way or format, I am open to changing my initial idea. I have to admit I am very new to this world.

I thought about walking through the JSON and doing the conversion manually, but it is hard to correlate the measurements that share the same timestamp.
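For reference, one way to correlate measurements that share a timestamp is to key a plain dict by ts and fill in one column per series. This is only a minimal sketch of that idea, assuming every series uses the same set of timestamps as in the sample above:

import csv
import json

with open('read.json') as f_input:
    data = json.load(f_input)

# one dict of {series name: value} per timestamp
rows = {}
for name, series in data.items():
    for entry in series:
        rows.setdefault(entry['ts'], {})[name] = entry['value']

with open('out.csv', 'w', newline='') as f_output:
    writer = csv.DictWriter(f_output, fieldnames=['ts'] + list(data))
    writer.writeheader()
    for ts in sorted(rows):
        writer.writerow({'ts': ts, **rows[ts]})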

Nicolás

You did not say exactly how you want the data, so the code posted below converts it into a tabular format with one column each for the machine (not sure if that is the right name), the ts, and the value.

import pandas as pd
import json

# load the nested json
with open('read.json') as f_input:
    data = json.load(f_input)

df = pd.DataFrame.from_dict(data, orient='columns')

# collect one row per (machine, ts, value) in long form
df_new = pd.DataFrame(columns=['machine', 'ts', 'value'])
data = []

for col in df.columns:
    for index, row in df[col].iteritems():
        ts, value = row.values()
        data.append({'machine': col, 'ts': ts, 'value': value})

df_new = df_new.append(data)

df_new.to_csv('out.csv', encoding='utf-8', index=False)

If you want the timestamps as the index and the machines as the columns, change the last two lines to this:

df_new = df_new.append(data).pivot(index='ts', columns='machine', values='value')

df_new.to_csv('out.csv', encoding='utf-8')
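Note that Series.iteritems and DataFrame.append were removed in pandas 2.0, so the snippet above only runs on older releases. The same long table can also be built straight from the nested dict without iterating a DataFrame; a minimal sketch under that assumption, reusing read.json and the machine/ts/value layout from this answer:

import json
import pandas as pd

with open('read.json') as f_input:
    data = json.load(f_input)

# one row per (machine, ts, value), built directly from the nested structure
rows = [
    {'machine': name, 'ts': entry['ts'], 'value': entry['value']}
    for name, series in data.items()
    for entry in series
]
df_new = pd.DataFrame(rows, columns=['machine', 'ts', 'value'])

# long form
df_new.to_csv('out.csv', encoding='utf-8', index=False)

# or wide form: ts as the index, one column per machine
df_new.pivot(index='ts', columns='machine', values='value').to_csv('out.csv', encoding='utf-8')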
  • According to the timing analysis in this question, pd.DataFrame(df[col].values.tolist()) is the fastest way to normalize a single-level dict from a column, but this answer shows how to deal with problematic columns (e.g. ones that raise an error when trying .values.tolist()).
import pandas as pd

# read the json file
with open('read.json') as f_input:
    df = pd.read_json(f_input)

# create a new dataframe for the normalized columns from df
normed_df = pd.DataFrame()

# iterate through each column, normalize it, and append it to normed_df
for col in df.columns:
    normed = pd.DataFrame(df[col].values.tolist())  # normalize the column from df
    normed['type'] = col  # add the original column name as a new column so the associated values can be identified
    normed_df = normed_df.append(normed)  # append to normed_df

# convert ts to a datetime dtype
normed_df.ts = pd.to_datetime(normed_df.ts, unit='ms')

# reset the index
normed_df = normed_df.reset_index(drop=True)

# save this long form to a csv
normed_df.to_csv('long.csv', index=False)

# display(normed_df)
                        ts   value            type
0  2021-02-08 01:00:51.248     0.0          caudal
1  2021-02-08 00:50:50.856     0.0          caudal
2  2021-02-08 00:30:50.898     0.0          caudal
3  2021-02-08 00:20:50.861     0.0          caudal
4  2021-02-08 00:10:50.821     0.0          caudal
5  2021-02-08 01:00:51.248  247564        FreeHeap
6  2021-02-08 00:50:50.856  247564        FreeHeap
7  2021-02-08 00:30:50.898  247564        FreeHeap
8  2021-02-08 00:20:50.861  247564        FreeHeap
9  2021-02-08 00:10:50.821  247564        FreeHeap
10 2021-02-08 01:00:51.248  237440  MinimoFreeHeap
11 2021-02-08 00:50:50.856  237440  MinimoFreeHeap
12 2021-02-08 00:30:50.898  237440  MinimoFreeHeap
13 2021-02-08 00:20:50.861  237440  MinimoFreeHeap
14 2021-02-08 00:10:50.821  237440  MinimoFreeHeap
  • Use .pivot with ts as the index to align the data.
# pivot normed_df to a wide format
dfp = normed_df.pivot(index='ts', columns='type', values='value')

# display(dfp)
type                    FreeHeap MinimoFreeHeap caudal
ts                                                    
2021-02-08 00:10:50.821   247564         237440    0.0
2021-02-08 00:20:50.861   247564         237440    0.0
2021-02-08 00:30:50.898   247564         237440    0.0
2021-02-08 00:50:50.856   247564         237440    0.0
2021-02-08 01:00:51.248   247564         237440    0.0

# save this wide form to a csv
dfp.reset_index().to_csv('wide.csv', index=False)
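If you are on pandas 2.0 or later, DataFrame.append is no longer available; the loop above can instead collect the normalized frames in a list and combine them once with pd.concat. A sketch of only that change, with the rest of the answer unchanged:

import pandas as pd

with open('read.json') as f_input:
    df = pd.read_json(f_input)

# normalize each column and concatenate once, instead of repeated .append calls
frames = []
for col in df.columns:
    normed = pd.DataFrame(df[col].values.tolist())  # normalize the column from df
    normed['type'] = col                            # keep the original column name
    frames.append(normed)

normed_df = pd.concat(frames, ignore_index=True)
normed_df.ts = pd.to_datetime(normed_df.ts, unit='ms')
normed_df.to_csv('long.csv', index=False)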

In the end I found a solution... There is a very interesting library called "cherrypicker". With its examples and the dataframes in pandas, I figured out how to make it work. The code is as follows:

import pandas as pd
from cherrypicker import CherryPicker
import json

keys = {'FreeHeap', 'MinimoFreeHeap', 'caudal'}  # In the future there will be more keys

with open('read.json') as f_input:
    data = json.load(f_input)

picker = CherryPicker(data)
pos = 0
for colum in keys:
    flat = picker[colum].flatten().get()
    df = pd.DataFrame(flat)
    df.columns = ['TimeStamp', colum]   # Rename
    if pos == 0:
        fin = df
        print(fin)
        pos = 1
    else:
        del df['TimeStamp']             # Remove timestamp because it is repeated
        fin[colum] = df
        print(fin)

fin.to_csv('out.csv', encoding='utf-8', index=False)

I hope it will be useful to someone in the future. I am not sure this is the simplest way, but it works for me! Regards
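One caveat: deleting TimeStamp and assigning df[colum] to fin relies on every key having exactly the same timestamps in the same order. If that cannot be guaranteed, merging the per-key frames on TimeStamp keeps the rows aligned; a sketch of that variation, reusing the same CherryPicker calls as above:

import json
from functools import reduce

import pandas as pd
from cherrypicker import CherryPicker

keys = {'FreeHeap', 'MinimoFreeHeap', 'caudal'}

with open('read.json') as f_input:
    data = json.load(f_input)

picker = CherryPicker(data)

# one two-column frame (TimeStamp, <key>) per key, exactly as in the solution above
frames = []
for colum in keys:
    df = pd.DataFrame(picker[colum].flatten().get())
    df.columns = ['TimeStamp', colum]
    frames.append(df)

# outer-merge on TimeStamp so rows stay aligned even if a key misses a timestamp
fin = reduce(lambda left, right: left.merge(right, on='TimeStamp', how='outer'), frames)
fin.to_csv('out.csv', encoding='utf-8', index=False)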
