Append new row when using pandas iterrows()?

I have the following code, where I create df['var2'] and alter df['var1']. After performing these changes, I would like to append newrow (with df['var2']) to the dataframe while keeping the original (though now altered) row (which has df['var1']).

for i, row in df.iterrows():
    while row['var1'] > 30: 
        newrow = row
        newrow['var2'] = 30
        row['var1'] = row['var1']-30
        df.append(newrow)

I understand that when using iterrows(), row variables are copies instead of views, which is why the changes are not being updated in the original dataframe. So, how would I alter this code to actually append newrow to the dataframe?

Thank you!

Appending rows to a dataframe in a loop is generally inefficient, because every append returns a new copy of the dataframe rather than modifying it in place. You are better off storing the intermediate results in a list and concatenating everything together at the end.

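In the example that follows, row.loc['var1'] = row['var1'] - 30 changes the original dataframe in place, because both columns share one dtype and iterrows() hands back a view of the data. pandas does not guarantee this: depending on the dtypes and the pandas version, the row may be a copy, in which case the write has no effect on the dataframe.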
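To illustrate the copy case, here is a minimal sketch with a made-up mixed-dtype frame (the 'label' column and the values are my own, not from the question). With mixed dtypes, writing to row leaves the dataframe untouched, whereas writing through df.at with the row label does update it.

```python
import pandas as pd

# Hypothetical mixed-dtype frame: a numeric column next to a string column.
df = pd.DataFrame({'var1': [40, 10], 'label': ['a', 'b']})

# With mixed dtypes, each row from iterrows() is a copy,
# so this assignment only changes the temporary Series.
for i, row in df.iterrows():
    row['var1'] = row['var1'] - 30

unchanged = df['var1'].tolist()  # the dataframe still holds [40, 10]

# Writing through df.at with the row label updates the dataframe itself.
for i in df.index:
    if df.at[i, 'var1'] > 30:
        df.at[i, 'var1'] = df.at[i, 'var1'] - 30

updated = df['var1'].tolist()  # now [10, 10]
```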

import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 2) * 100, columns=['var1', 'var2'])

>>> df
         var1        var2
0  176.405235   40.015721
1   97.873798  224.089320
2  186.755799  -97.727788
3   95.008842  -15.135721
4  -10.321885   41.059850

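new_rows = []
for i, row in df.iterrows():
    while row['var1'] > 30:
        newrow = row  # another name for the same row object, not a copy
        newrow['var2'] = 30
        row.loc['var1'] = row['var1'] - 30
        new_rows.append(newrow.values)
# Build the combined frame once, after the loop.
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0.
df_new = pd.concat([df, pd.DataFrame(new_rows, columns=df.columns)]).reset_index(drop=True)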

>>> df
    var1      var2
0  26.405235  30.00000
1   7.873798  30.00000
2   6.755799  30.00000
3   5.008842  30.00000
4 -10.321885  41.05985

>>> df_new
         var1      var2
0   26.405235  30.00000
1    7.873798  30.00000
2    6.755799  30.00000
3    5.008842  30.00000
4  -10.321885  41.05985
5   26.405235  30.00000
6   26.405235  30.00000
7   26.405235  30.00000
8   26.405235  30.00000
9   26.405235  30.00000
10   7.873798  30.00000
11   7.873798  30.00000
12   7.873798  30.00000
13   6.755799  30.00000
14   6.755799  30.00000
15   6.755799  30.00000
16   6.755799  30.00000
17   6.755799  30.00000
18   6.755799  30.00000
19   5.008842  30.00000
20   5.008842  30.00000
21   5.008842  30.00000

EDIT (per request below):

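new_rows = []
for i, row in df.iterrows():
    while row['var1'] > 30:
        # Reduce var1 by 30 and record the reduced value as a new row
        row.loc['var1'] = var1 = row['var1'] - 30
        new_rows.append([var1, 30])
# Build the combined frame once, after the loop.
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0.
df_new = pd.concat([df, pd.DataFrame(new_rows, columns=df.columns)]).reset_index()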

>>> df_new
    index        var1        var2
0       0   26.405235   40.015721
1       1    7.873798  224.089320
2       2    6.755799  -97.727788
3       3    5.008842  -15.135721
4       4  -10.321885   41.059850
5       0  146.405235   30.000000
6       1  116.405235   30.000000
7       2   86.405235   30.000000
8       3   56.405235   30.000000
9       4   26.405235   30.000000
10      5   67.873798   30.000000
11      6   37.873798   30.000000
12      7    7.873798   30.000000
13      8  156.755799   30.000000
14      9  126.755799   30.000000
15     10   96.755799   30.000000
16     11   66.755799   30.000000
17     12   36.755799   30.000000
18     13    6.755799   30.000000
19     14   65.008842   30.000000
20     15   35.008842   30.000000
21     16    5.008842   30.000000
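For reference, here is a minimal sketch of the same list-then-concatenate idea that does not write to row during iteration, so it does not depend on iterrows() returning a view. The function name split_rows and the step parameter are my own, not part of the answer above, and the sketch assumes the frame has exactly the columns var1 and var2.

```python
import pandas as pd

def split_rows(df, step=30):
    """Sketch: reduce var1 in steps of `step`, collecting one new row per step.

    Returns (updated, combined). The input dataframe is left unchanged.
    Assumes the columns are exactly 'var1' and 'var2'.
    """
    updated = df.copy()
    new_rows = []
    for i, var1 in df['var1'].items():
        while var1 > step:
            var1 = var1 - step
            new_rows.append({'var1': var1, 'var2': step})
        # Write the final reduced value back by row label.
        updated.at[i, 'var1'] = var1
    extra = pd.DataFrame(new_rows, columns=df.columns)
    # One concatenation at the end instead of one append per new row.
    combined = pd.concat([updated, extra], ignore_index=True)
    return updated, combined

# Small made-up example: 95 is reduced to 65, 35, then 5, giving three new rows.
example = pd.DataFrame({'var1': [95.0, -10.0], 'var2': [1.0, 2.0]})
updated, combined = split_rows(example)
```

Here updated keeps the original var2 values and the reduced var1 values, while combined has the new rows stacked underneath with a fresh 0..n-1 index.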
