Need help formatting a pandas DataFrame from a JSON file

Hi, I need help formatting a JSON file that I converted to a pandas DataFrame.

The JSON looks like this:

{
  "test":
    { 
       "1":["test1_a", "test1_b", "test1_c"],
       "2":["test2_a", "test2_b", "test2_c"],
       "3":["test3_a", "test3_b", "test3_c"]
     }
}

I need this JSON to be converted to a pandas DataFrame and printed like this:

col1     col2     col3
test1_a  test1_b  test1_c
test2_a  test2_b  test2_c
test3_a  test3_b  test3_c

How would I do this? The result needs to be a pandas DataFrame, and I need to define the column names.

So far I have tried:

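import json
import pandas as pd

# Read the JSON file; the with-block closes the file afterwards
with open(json_file_path, 'r') as json_file:
    data = json.load(json_file)
pandasDataframe = pd.DataFrame.from_dict(data)
print(pandasDataframe)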

And it prints this, which I don't want:

1 ["test1_a", "test1_b", "test1_c"]
2 ["test2_a", "test2_b", "test2_c"]
3 ["test3_a", "test3_b", "test3_c"]

Update: when I do

pd.DataFrame(data['test'])

it looks like this (not quite what I want, but it's getting there):

     1        2        3
0 test1_a   test2_a  test3_a
1 test1_b   test2_b  test3_b
2 test1_c   test2_c  test3_c

Update #2: when I transpose it, it looks like this:

        0               2
1 test1_a test1_b test1_c
2 test2_a test2_b test2_c
3 test3_a test3_b test3_c

How would I get rid of the 0 and 2 at the top, and what do they mean? Also, how do I get rid of the 1, 2, 3 (i.e. the first column altogether)?

Desired output (the column names col1, col2, col3 need to be added, but I don't know how):

col1     col2     col3
test1_a  test1_b  test1_c
test2_a  test2_b  test2_c
test3_a  test3_b  test3_c

IIUC, you need add_prefix:

import pandas as pd

pd.DataFrame(data['test']).add_prefix('col')

      col1     col2     col3
0  test1_a  test2_a  test3_a
1  test1_b  test2_b  test3_b
2  test1_c  test2_c  test3_c
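
Note that in the frame above each list runs down a column (test1_a, test1_b, test1_c all sit under col1), whereas the question asks for each list to run across a row. A minimal sketch of one way to combine add_prefix with a transpose is below. It assumes data has the structure shown in the question; the inline dict and the name df are only illustrative.

```python
import pandas as pd

# Stand-in for the parsed JSON file from the question (assumed structure)
data = {
    "test": {
        "1": ["test1_a", "test1_b", "test1_c"],
        "2": ["test2_a", "test2_b", "test2_c"],
        "3": ["test3_a", "test3_b", "test3_c"],
    }
}

# Transpose so each JSON list becomes a row; the columns are then 0, 1, 2
df = pd.DataFrame(data["test"]).T

# Shift the integer labels to 1, 2, 3, then prepend "col"
df.columns = df.columns + 1
df = df.add_prefix("col")

print(df)
```

The row labels are still the JSON keys "1", "2", "3" at this point; they are the frame's index, not a data column.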

You could try:

pd.DataFrame(data['test']).T.rename(columns={0:'col1',1:'col2',2:'col3'})

Output:

      col1     col2     col3
1  test1_a  test1_b  test1_c
2  test2_a  test2_b  test2_c
3  test3_a  test3_b  test3_c
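
On the remaining parts of the question: the numbers across the top of the transposed frame appear to be the default integer column labels (the position of each element in its list), and the 1, 2, 3 on the left are the index built from the JSON keys rather than a real column. A minimal sketch of an alternative that names the columns while building the frame and then drops the key-based index, assuming the same data structure as in the question (the inline dict and the name df are illustrative):

```python
import pandas as pd

# Stand-in for the parsed JSON file from the question (assumed structure)
data = {
    "test": {
        "1": ["test1_a", "test1_b", "test1_c"],
        "2": ["test2_a", "test2_b", "test2_c"],
        "3": ["test3_a", "test3_b", "test3_c"],
    }
}

# orient="index" treats each dict key as a row, so no transpose is needed;
# the columns argument is only accepted together with orient="index"
df = pd.DataFrame.from_dict(
    data["test"], orient="index", columns=["col1", "col2", "col3"]
)

# Replace the "1", "2", "3" index with the default 0, 1, 2
df = df.reset_index(drop=True)

# A DataFrame always has an index; index=False only hides it when printing
print(df.to_string(index=False))
```

The index cannot be removed from the DataFrame itself, so hiding it is a display choice made at print or export time.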
