pandas join not working in for loop after first iteration

I am trying to explode a nested list inside a dict with the help of pandas. In a loop, I join the list with every row. Strangely, in the second iteration the join does not seem to work properly. Perhaps there is something fundamental about pandas that I do not understand, but I can't figure out why the join works only in the first iteration and fails in the following ones. The end result looks like this:

     key  amount   id  key_r   code   name  key_l
0.0    0    12.0  1.0    0.0    NaN    NaN    NaN
1.0    0    23.0  NaN    0.0    NaN    NaN    NaN
NaN    1     NaN  NaN    NaN  test2  test2    0.0

instead of the output from the first iteration (strangely, a key of 1 shows up in the output above):

   key  amount   id  key_r   code   name  key_l
0    0      12  1.0      0  test1  test1      0
1    0      23  NaN      0  test1  test1      0

Code:

import pandas as pd

data = [
    {
        "code": "test1",
        "name": "test1",
        "sub_list": [
            {"amount": 10, "id": 2},
            {"amount": 20},
        ],
    },
    {
        "code": "test2",
        "name": "test2",
        "sub_list": [
            {"amount": 12, "id": 1},
            {"amount": 23},
        ],
    },
]
data_df = pd.DataFrame(data)
for ix, row in data_df.iterrows():
    # one row per element of the nested list
    sub_list_df = pd.DataFrame(row['sub_list'])
    # the parent row as a one-row DataFrame, without the nested list
    row_df = row.to_frame().transpose()
    # .copy() avoids a SettingWithCopyWarning when adding 'key' below
    main_df = row_df.loc[:, row_df.columns != 'sub_list'].copy()
    main_df['key'] = 0
    sub_list_df['key'] = 0
    print(main_df)
    print(sub_list_df)
    tmp_df = sub_list_df.join(main_df, on=['key'], how="outer", lsuffix="_r", rsuffix="_l")
    print(tmp_df)

Any advice?
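A quick diagnostic sketch, reusing the data from the question, narrows this down. It records which index label each one-row main_df ends up with, and whether the key values (all 0) can be found in that index. pandas documents `DataFrame.join(other, on=...)` as matching the caller's `on` column against the index of `other`, so this is what the join compares. The names `main_indexes` and `key_found` are illustrative only.

```python
import pandas as pd

data = [
    {"code": "test1", "name": "test1", "sub_list": [{"amount": 10, "id": 2}, {"amount": 20}]},
    {"code": "test2", "name": "test2", "sub_list": [{"amount": 12, "id": 1}, {"amount": 23}]},
]
data_df = pd.DataFrame(data)

main_indexes = []  # index labels of each one-row main_df
key_found = []     # whether key 0 exists in that index

for ix, row in data_df.iterrows():
    sub_list_df = pd.DataFrame(row["sub_list"])
    sub_list_df["key"] = 0
    # Same construction as in the question: the transposed row keeps
    # the iterrows() label (0, then 1) as its index.
    main_df = row.to_frame().transpose().drop(columns="sub_list")
    main_indexes.append(list(main_df.index))
    # join(main_df, on="key") looks up these key values in main_df.index
    key_found.append(bool(sub_list_df["key"].isin(main_df.index).any()))

print(main_indexes)
print(key_found)
```

If the printed lists show the index moving from 0 to 1 while the key lookup flips from True to False, the join itself is behaving as documented; only the first row's label happens to equal the constant key.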

Here's a simpler way to do this without an explicit for loop:

# explode the list: one row per element of sub_list
f = data_df.explode('sub_list')

# convert exploded dict into separate columns
f = pd.concat([f, f['sub_list'].apply(pd.Series)], axis=1).drop('sub_list', axis=1)

print(f)

    code   name  amount   id
0  test1  test1    10.0  2.0
0  test1  test1    20.0  NaN
1  test2  test2    12.0  1.0
1  test2  test2    23.0  NaN
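For data that is still in its raw list-of-dicts form, another loop-free option (a different technique from the explode answer above, not part of it) is `pd.json_normalize` with `record_path` and `meta`. A minimal sketch, assuming the same data as in the question; unlike the explode version, the result gets a fresh 0..3 index instead of repeated parent labels.

```python
import pandas as pd

data = [
    {"code": "test1", "name": "test1", "sub_list": [{"amount": 10, "id": 2}, {"amount": 20}]},
    {"code": "test2", "name": "test2", "sub_list": [{"amount": 12, "id": 1}, {"amount": 23}]},
]

# record_path turns each element of "sub_list" into its own row;
# meta copies the parent-level fields onto every one of those rows.
flat = pd.json_normalize(data, record_path="sub_list", meta=["code", "name"])
print(flat)
```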

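The problem is that the value 1 is not coming from your "key" column. sub_list_df.join(main_df, on=['key']) matches the "key" column of sub_list_df against the index of main_df, not against main_df's own "key" column. Because each main_df is built from a row yielded by iterrows(), its index is that row's label: 0 in the first iteration and 1 in the second. In the first iteration the key 0 happens to match index 0; in the second it does not, so the outer join produces unmatched rows filled with NaN, and the unmatched index value 1 ends up in the key column. One solution is to set the key column as the index in both dataframes, for example: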

main_df['_key'] = 0
sub_list_df['_key'] = 0
tmp_df = sub_list_df.set_index('_key').join(main_df.set_index('_key'), on='_key', how="outer")
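If you want to keep the loop, another option (a swapped-in technique, not the set_index approach above) is `DataFrame.merge` with `on="key"`, which matches column against column, so the row labels of main_df play no role. A minimal sketch reusing the question's data; `pieces` and `result` are illustrative names.

```python
import pandas as pd

data = [
    {"code": "test1", "name": "test1", "sub_list": [{"amount": 10, "id": 2}, {"amount": 20}]},
    {"code": "test2", "name": "test2", "sub_list": [{"amount": 12, "id": 1}, {"amount": 23}]},
]
data_df = pd.DataFrame(data)

pieces = []
for ix, row in data_df.iterrows():
    sub_list_df = pd.DataFrame(row["sub_list"])
    # one-row frame of the parent fields (index label is still ix)
    main_df = row.drop("sub_list").to_frame().transpose()
    main_df["key"] = 0
    sub_list_df["key"] = 0
    # merge(on="key") compares the "key" column of both frames,
    # unlike join(on=...), which compares against main_df's index.
    merged = sub_list_df.merge(main_df, on="key", how="outer")
    pieces.append(merged.drop(columns="key"))

result = pd.concat(pieces, ignore_index=True)
print(result)
```

Since every key is 0, how="left" would give the same rows here; "outer" is kept only to mirror the question.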

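Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM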