
Modifying multiple columns in a subset of rows in pandas DataFrame

I've got a pandas DataFrame in which I want to modify several columns for a subset of rows. These are the approaches I've attempted.

df[['finalA', 'finalB']] = df[['A', 'B']]
exceptions = df.loc[df.normal == False]

Which works like a charm, but now I want to set the exceptions:

df.loc[exceptions.index, ['finalA', 'finalB']] = \
  df.loc[exceptions.index, ['A_except', 'B_except']]

Which doesn't work. So I tried using .ix from this answer .

df.ix[exceptions.index, ['finalA', 'finalB']] = \
  df.ix[exceptions.index, ['A_except', 'B_except']]

Which doesn't work either. Both methods give me NaN in both finalA and finalB for the exceptional rows.

The only way that seems to work is doing it one column at a time:

df.ix[exceptions.index, 'finalA'] = \
  df.ix[exceptions.index, 'A_except']
df.ix[exceptions.index, 'finalB'] = \
  df.ix[exceptions.index, 'B_except']
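On current pandas the .ix calls above no longer run, because .ix was removed in pandas 1.0. Below is a minimal sketch of the same one-column-at-a-time workaround without the repetition. It assumes the sample data from this question and uses .loc with a boolean mask in place of exceptions.index. The `pairs` list is an illustrative name and is not part of the original code:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "A_except": [0, 0, 9, 9],
    "B": [5, 6, 7, 8],
    "B_except": [0, 0, 10, 10],
    "normal": [True, True, False, False],
})
df[["finalA", "finalB"]] = df[["A", "B"]]

# Boolean mask of the exceptional rows (equivalent to df.normal == False)
mask = ~df["normal"]

# Hypothetical list pairing each target column with its source column.
# Each right-hand side is a Series, so only the row index is aligned and
# the differing column labels do not matter.
pairs = [("finalA", "A_except"), ("finalB", "B_except")]
for target, source in pairs:
    df.loc[mask, target] = df.loc[mask, source]

print(df)
```

Adding another column pair then only means adding one tuple to the list.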

What's going on here in pandas? How do I avoid setting the values to the copy that is apparently made by selecting multiple columns? Is there a way to avoid this kind of code repetition?

Some more musings: it doesn't actually set the values on a copy of the DataFrame; it sets them to NaN. In other words, it really does overwrite them, just with the wrong value.


Sample DataFrame after the failed assignment (note the NaN values in finalA and finalB for the exceptional rows):

    A   A_except    B   B_except    normal  finalA  finalB
0   1   0           5   0           True    1.0     5.0
1   2   0           6   0           True    2.0     6.0
2   3   9           7   10          False   NaN     NaN
3   4   9           8   10          False   NaN     NaN
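Here is a sketch that rebuilds the sample data from the table above and runs the failing multi-column .loc assignment on current pandas. It reproduces the NaN values; the .ix variant is left out because .ix no longer exists:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "A_except": [0, 0, 9, 9],
    "B": [5, 6, 7, 8],
    "B_except": [0, 0, 10, 10],
    "normal": [True, True, False, False],
})
df[["finalA", "finalB"]] = df[["A", "B"]]
exceptions = df.loc[~df["normal"]]  # equivalent to df.normal == False

# .loc aligns a DataFrame right-hand side on BOTH its index and its columns.
# 'A_except'/'B_except' never match 'finalA'/'finalB', so the selected
# cells are overwritten with NaN (which also turns the columns into floats).
df.loc[exceptions.index, ["finalA", "finalB"]] = \
    df.loc[exceptions.index, ["A_except", "B_except"]]

print(df)
```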

Expected result:

    A   A_except    B   B_except    normal  finalA  finalB
0   1   0           5   0           True    1       5
1   2   0           6   0           True    2       6
2   3   9           7   10          False   9       10
3   4   9           8   10          False   9       10

You can rename the columns so that they align:

d = {'A_except':'finalA', 'B_except':'finalB'}
df.loc[exceptions.index, ['finalA', 'finalB']] = \
  df.loc[exceptions.index, ['A_except', 'B_except']].rename(columns=d)

print(df)
   A  A_except  B  B_except normal  finalA  finalB
0  1         0  5         0   True       1       5
1  2         0  6         0   True       2       6
2  3         9  7        10  False       9      10
3  4         9  8        10  False       9      10
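With many column pairs, you can build the rename mapping from two lists instead of writing it by hand. Here is a sketch using the question's sample data; `targets`, `sources`, and `mapping` are illustrative names:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "A_except": [0, 0, 9, 9],
    "B": [5, 6, 7, 8],
    "B_except": [0, 0, 10, 10],
    "normal": [True, True, False, False],
})
df[["finalA", "finalB"]] = df[["A", "B"]]
exceptions = df.loc[~df["normal"]]

targets = ["finalA", "finalB"]
sources = ["A_except", "B_except"]

# Map each source label to its target label ({'A_except': 'finalA', ...})
# so the right-hand side carries the same column labels as the target.
mapping = dict(zip(sources, targets))
df.loc[exceptions.index, targets] = (
    df.loc[exceptions.index, sources].rename(columns=mapping)
)

print(df)
```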

Another solution is to convert the right-hand side to a NumPy array, so that no column alignment takes place:

df.loc[exceptions.index, ['finalA', 'finalB']] = \
  df.loc[exceptions.index, ['A_except', 'B_except']].values

print(df)
   A  A_except  B  B_except normal  finalA  finalB
0  1         0  5         0   True       1       5
1  2         0  6         0   True       2       6
2  3         9  7        10  False       9      10
3  4         9  8        10  False       9      10

If you look at both sides of the assignment, you will notice that the column labels differ. Pandas aligns the right-hand side on its column labels as well as its index, and since the labels don't match, it won't insert the values; the target cells are filled with NaN instead.
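You can make the alignment step visible with DataFrame.reindex. This sketch only illustrates the idea and is not the literal code path pandas runs internally. Conforming the right-hand side to the target labels turns every cell into NaN, while the renamed version keeps its values:

```python
import pandas as pd

# Subset of the question's sample data
df = pd.DataFrame({
    "A_except": [0, 0, 9, 9],
    "B_except": [0, 0, 10, 10],
    "finalA": [1, 2, 3, 4],
    "finalB": [5, 6, 7, 8],
})
rows = [2, 3]
targets = ["finalA", "finalB"]

rhs = df.loc[rows, ["A_except", "B_except"]]

# Conform the right-hand side to the target's row and column labels.
# None of the target column labels exist in rhs, so every cell is NaN.
aligned = rhs.reindex(index=rows, columns=targets)
print(aligned)

# After renaming, the labels match and the values survive alignment.
renamed = rhs.rename(columns={"A_except": "finalA", "B_except": "finalB"})
print(renamed.reindex(index=rows, columns=targets))
```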

It works for a single column because you are then assigning a Series, which is aligned only on its index, so the column label no longer applies.
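The following sketch shows that difference using hypothetical float data, so that no dtype changes get in the way. A Series assigned to a single column lands there even though its name differs. Its row index is still aligned, however, so a Series with non-matching labels produces NaN:

```python
import pandas as pd

# Hypothetical data (floats, so inserting NaN needs no dtype change)
df = pd.DataFrame({
    "finalA": [1.0, 2.0, 3.0, 4.0],
    "A_except": [0.0, 0.0, 9.0, 9.0],
})
rows = [2, 3]

# The Series is named 'A_except', but only its index [2, 3] is aligned.
df.loc[rows, "finalA"] = df.loc[rows, "A_except"]
print(df["finalA"].tolist())  # [1.0, 2.0, 9.0, 9.0]

# Index alignment still applies: labels 10 and 11 don't match rows 2 and 3,
# so the selected cells become NaN even though the length is right.
other = df.copy()
other.loc[rows, "finalA"] = pd.Series([100.0, 200.0], index=[10, 11])
print(other["finalA"].tolist())  # [1.0, 2.0, nan, nan]
```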

A quick solution is to strip the DataFrame down to a bare array; then both the loc and ix methods work (note that .ix was later deprecated and has been removed as of pandas 1.0):

df.loc[exceptions.index, ['finalA', 'finalB']] = \
  df.loc[exceptions.index, ['A_except', 'B_except']].values

But keep in mind that doing this disables Pandas' attempt to match column and index labels; it's basically a 'hard' insert. That makes you, as the user, responsible for proper alignment. In this case that is not a problem, but it is something to be aware of in general.
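To see why this matters, consider this sketch with hypothetical data, where the right-hand side is selected in a different row order than the target. The label-based rename approach puts each value on its matching row. The bare array from .to_numpy() (the newer spelling of .values) is inserted purely by position:

```python
import pandas as pd

# Hypothetical data with distinct values in the exceptional rows
df = pd.DataFrame({
    "A_except": [0, 0, 9, 8],
    "finalA": [1, 2, 3, 4],
})
rows = [2, 3]

# Right-hand side deliberately selected in reverse row order.
rhs = df.loc[[3, 2], ["A_except"]]

# Label-based: matching column labels, and rows are re-ordered by index label.
by_label = df.copy()
by_label.loc[rows, ["finalA"]] = rhs.rename(columns={"A_except": "finalA"})

# Positional: .to_numpy() drops all labels, so values go in as they come.
by_position = df.copy()
by_position.loc[rows, ["finalA"]] = rhs.to_numpy()

print(by_label["finalA"].tolist())     # [1, 2, 9, 8]
print(by_position["finalA"].tolist())  # [1, 2, 8, 9]
```

In the question both sides are selected with the same exceptions.index, so the row order matches and the positional insert is safe there.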
