I've got a pandas DataFrame. In this DataFrame I want to modify several columns of some rows. These are the approaches I've attempted.
df[['finalA', 'finalB']] = df[['A', 'B']]
exceptions = df.loc[df.normal == False]
Which works like a charm, but now I want to set the exceptions:
df.loc[exceptions.index, ['finalA', 'finalB']] = \
df.loc[exceptions.index, ['A_except', 'B_except']]
Which doesn't work. So I tried using .ix from this answer.
df.ix[exceptions.index, ['finalA', 'finalB']] = \
df.ix[exceptions.index, ['A_except', 'B_except']]
Which doesn't work either. Both methods give me NaN in both finalA and finalB for the exceptional rows.
The only way that seems to work is doing it one column at a time:
df.ix[exceptions.index, 'finalA'] = \
df.ix[exceptions.index, 'A_except']
df.ix[exceptions.index, 'finalB'] = \
df.ix[exceptions.index, 'B_except']
What's going on here in pandas? How do I avoid setting the values to the copy that is apparently made by selecting multiple columns? Is there a way to avoid this kind of code repetition?
Some more musings: It doesn't actually set the values on a copy of the DataFrame. It overwrites the values in the original DataFrame with NaN.
Result:
A A_except B B_except normal finalA finalB
0 1 0 5 0 True 1.0 5.0
1 2 0 6 0 True 2.0 6.0
2 3 9 7 10 False NaN NaN
3 4 9 8 10 False NaN NaN
Expected result:
A A_except B B_except normal finalA finalB
0 1 0 5 0 True 1 5
1 2 0 6 0 True 2 6
2 3 9 7 10 False 9 10
3 4 9 8 10 False 9 10
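For anyone who wants to reproduce this, here is a minimal sketch of a frame matching the tables above. The values are copied from the tables, but the constructor call and the `is_exception` mask name are assumptions, since the question does not show how the frame was built. It uses the column-by-column assignment, which is the variant reported to work.

```python
import pandas as pd

# Values copied from the tables in the question; the construction is assumed.
df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "A_except": [0, 0, 9, 9],
    "B": [5, 6, 7, 8],
    "B_except": [0, 0, 10, 10],
    "normal": [True, True, False, False],
})

# Start from the regular values.
df[["finalA", "finalB"]] = df[["A", "B"]]

# Boolean mask selecting the same rows as exceptions.index (hypothetical name).
is_exception = ~df["normal"]

# One column at a time: each right-hand side is a Series,
# so only the row index takes part in alignment.
df.loc[is_exception, "finalA"] = df.loc[is_exception, "A_except"]
df.loc[is_exception, "finalB"] = df.loc[is_exception, "B_except"]

print(df)
```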
You can rename the columns so that the labels align:
d = {'A_except':'finalA', 'B_except':'finalB'}
df.loc[exceptions.index, ['finalA', 'finalB']] = \
df.loc[exceptions.index, ['A_except', 'B_except']].rename(columns=d)
print (df)
A A_except B B_except normal finalA finalB
0 1 0 5 0 True 1 5
1 2 0 6 0 True 2 6
2 3 9 7 10 False 9 10
3 4 9 8 10 False 9 10
Another solution is to convert the output to a NumPy array, but then the columns are not aligned by label:
df.loc[exceptions.index, ['finalA', 'finalB']] = \
df.loc[exceptions.index, ['A_except', 'B_except']].values
print (df)
A A_except B B_except normal finalA finalB
0 1 0 5 0 True 1 5
1 2 0 6 0 True 2 6
2 3 9 7 10 False 9 10
3 4 9 8 10 False 9 10
If you look at both sides of the assignment, you will notice that the column labels differ. Pandas takes the column labels into account, and since they don't match, it won't insert the values.
It works for a single column because then you are extracting a Series, and then the column label no longer applies.
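The effect of label matching can be illustrated with `reindex`. This is only a rough sketch of the alignment idea, not a description of what pandas does internally during assignment, and the frame and variable names are made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "A_except": [0, 0, 9, 9],
    "B_except": [0, 0, 10, 10],
    "normal": [True, True, False, False],
})
is_exception = ~df["normal"]
target_cols = ["finalA", "finalB"]

# The right-hand side of the failing assignment.
rhs = df.loc[is_exception, ["A_except", "B_except"]]

# Looking up the target labels in rhs finds nothing, so everything is NaN.
unmatched = rhs.reindex(columns=target_cols)

# After renaming, the same lookup finds the values.
renamed = rhs.rename(columns={"A_except": "finalA", "B_except": "finalB"})
matched = renamed.reindex(columns=target_cols)

print(unmatched)
print(matched)
```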
A quick solution would be to simply strip the DataFrame down to a bare array. Then both the loc and ix methods work:
df.loc[exceptions.index, ['finalA', 'finalB']] = \
df.loc[exceptions.index, ['A_except', 'B_except']].values
But keep in mind that doing this bypasses pandas' attempt to match column and index labels. It's basically a 'hard' insert, which makes you as a user responsible for the proper alignment. In this case that is not a problem, but it is something to be aware of in general.
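To make that caveat concrete, here is a small sketch in which the source columns are deliberately listed in the wrong order. The frame and the `is_exception` and `swapped` names are assumptions for illustration, and `to_numpy()` is used in place of `.values` (both return the underlying array). Because the insert is positional, the B values end up in finalA without any error.

```python
import pandas as pd

df = pd.DataFrame({
    "A_except": [0, 0, 9, 9],
    "B_except": [0, 0, 10, 10],
    "normal": [True, True, False, False],
    "finalA": [1, 2, 3, 4],
    "finalB": [5, 6, 7, 8],
})
is_exception = ~df["normal"]

# Source columns listed in the wrong order on purpose.
swapped = df.loc[is_exception, ["B_except", "A_except"]].to_numpy()

# Positional insert: the first array column goes to finalA, the second to finalB.
df.loc[is_exception, ["finalA", "finalB"]] = swapped

print(df[["finalA", "finalB"]])
```

Keeping the two column lists next to each other, in the same order, is a simple way to guard against this.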