简体   繁体   中英

Altering one row at a time in a Pandas dataframe

I'm attempting to write a for loop that will iterate over a subset of the indices in a dataframe, with each loop returning a dataframe with only one row altered.

Here's some dummy code to demonstrate what I mean:

# Two columns of random numbers
df = pd.DataFrame(np.random.randn(10,2),columns=list('ab'))
# The index values where row 'a' > 0
indices = df.loc[df['a'] > 0].index

This is how I'm trying to do it:

for index in indices:
    dummy = df
    dummy.loc[index,'a'] = 'Hello'
    dummy.loc[index,'b'] = 'World'
    print(dummy)

Which returns:

         a         b
0     -1.30278  0.592978
1        Hello     World
2    0.0113196  0.441662
3      1.59222 -0.152032
4    -0.293761 -0.519106
5    -0.402177   1.27412
6      1.24692 -0.203043
7     0.232682  -1.29515
8     -1.03781   0.89598
9  0.000474012  0.572173
         a         b
0     -1.30278  0.592978
1        Hello     World
2        Hello     World
3      1.59222 -0.152032
4    -0.293761 -0.519106
5    -0.402177   1.27412
6      1.24692 -0.203043
7     0.232682  -1.29515
8     -1.03781   0.89598
9  0.000474012  0.572173
         a         b
0     -1.30278  0.592978
1        Hello     World
2        Hello     World
3        Hello     World
4    -0.293761 -0.519106
5    -0.402177   1.27412
6      1.24692 -0.203043
7     0.232682  -1.29515
8     -1.03781   0.89598
9  0.000474012  0.572173

etc...

I'm trying to reset the a and b values with each iteration on the line dummy = df , but it's not working in the way I'd expect.

But what I'd like it to produce is:

         a         b
0     -1.30278  0.592978
1        Hello     World
2    0.0113196  0.441662
3      1.59222 -0.152032
4    -0.293761 -0.519106
5    -0.402177   1.27412
6      1.24692 -0.203043
7     0.232682  -1.29515
8     -1.03781   0.89598
9  0.000474012  0.572173
         a         b
0     -1.30278  0.592978
1      0.74578  0.482945
2        Hello     World
3      1.59222 -0.152032
4    -0.293761 -0.519106
5    -0.402177   1.27412
6      1.24692 -0.203043
7     0.232682  -1.29515
8     -1.03781   0.89598
9  0.000474012  0.572173
         a         b
0     -1.30278  0.592978
1      0.74578  0.482945
2      0.01131  0.441662
3        Hello     World
4    -0.293761 -0.519106
5    -0.402177   1.27412
6      1.24692 -0.203043
7     0.232682  -1.29515
8     -1.03781   0.89598
9  0.000474012  0.572173

etc...

Any help would be greatly appreciated!

You should add .copy() in your loop

for key,index in enumerate(indices):
    dummy = df.copy()
    dummy.loc[index,'a'] = 'Hello'
    dummy.loc[index,'b'] = 'World'
    print(dummy)

You are probably expecting dummy = df to make a copy of df . dummy actually points to the same underlying object as df , so any changes made to dummy are made to df as well. You could fix this by copying df , but an easier and more efficient way it is be to save the original values before printing and then restore them after printing.

for index in indices: 
    orig_values = df.loc[index, ['a', 'b']] 
    df.loc[index, ['a', 'b']] = ['Hello', 'World'] 
    print(df) 
    df.loc[index, ['a', 'b']] = orig_values            

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM