I have the following dataframe called Utilidad
       Argentina  Bolivia  Chile  España  Uruguay
2004           3        6      1       3        2
2005           5        1      4       1        5
And I calculate the difference between 2004 and 2005 using
Utilidad.ix['resta']=Utilidad.ix[2005]-Utilidad.ix[2004]
Now I'm trying to create two additional rows: one holding the difference where it is positive and the other holding it where it is negative, something like this:
          Argentina  Bolivia  Chile  España  Uruguay
2004              3        6      1       3        2
2005              5        1      4       1        5
resta             2       -5      3      -2        3
positive          2        0      3       0        3
negative          0       -5      0      -2        0
The only thing I have managed to do is to add an additional column that tells me whether "resta" is positive or not, using
Utilidad.ix['boleano'][Utilidad.ix['resta']>0]
Can someone help me create these two additional rows?
Thanks
You can use numpy.where:
df.ix['positive'] = np.where(df.ix['resta'] > 0, df.ix['resta'], 0)
df.ix['negative'] = np.where(df.ix['resta'] < 0, df.ix['resta'], 0)
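For readers on a recent pandas release, here is a minimal self-contained sketch of the same numpy.where idea. It assumes pandas 1.0 or later, where .ix is no longer available, so .loc is used instead; the frame is rebuilt by hand from the values in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (values copied from the post).
Utilidad = pd.DataFrame(
    {
        "Argentina": [3, 5],
        "Bolivia": [6, 1],
        "Chile": [1, 4],
        "España": [3, 1],
        "Uruguay": [2, 5],
    },
    index=[2004, 2005],
)

# Compute the difference once, before any extra rows are added.
resta = Utilidad.loc[2005] - Utilidad.loc[2004]

# Assigning to a new label with .loc appends a row.
Utilidad.loc["resta"] = resta
# Keep the value where the condition holds, otherwise use 0.
Utilidad.loc["positive"] = np.where(resta > 0, resta, 0)
Utilidad.loc["negative"] = np.where(resta < 0, resta, 0)

print(Utilidad)
```

Computing `resta` as a separate Series first avoids looking the row up repeatedly.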
numpy.clip will be handy here, or you can just calculate it directly.
In [35]:
Utilidad.ix['positive']=np.clip(Utilidad.ix['resta'], 0, np.inf)
Utilidad.ix['negative']=np.clip(Utilidad.ix['resta'], -np.inf, 0)
#or
Utilidad.ix['positive']=(Utilidad.ix['resta']+Utilidad.ix['resta'].abs())/2
Utilidad.ix['negative']=(Utilidad.ix['resta']-Utilidad.ix['resta'].abs())/2
print(Utilidad)
          Argentina  Bolivia  Chile  España  Uruguay
id
2004              3        6      1       3        2
2005              5        1      4       1        5
resta             2       -5      3      -2        3
positive          2        0      3       0        3
negative          0       -5      0      -2        0
[5 rows x 5 columns]
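The clipping and arithmetic approaches above can also be sketched on a standalone Series. This is an illustration under assumptions, not the answerer's exact code: it uses the Series.clip method with its lower and upper keywords, and integer division so the result stays integer under Python 3 (plain / would give floats there). The `resta` values are typed in by hand from the table above.

```python
import pandas as pd

# The "resta" row from the table above, entered by hand.
resta = pd.Series(
    [2, -5, 3, -2, 3],
    index=["Argentina", "Bolivia", "Chile", "España", "Uruguay"],
    name="resta",
)

# Clipping: values below 0 become 0 (positive part),
# values above 0 become 0 (negative part).
positive = resta.clip(lower=0)
negative = resta.clip(upper=0)

# Arithmetic alternative: (x + |x|) / 2 is x when x > 0, else 0;
# (x - |x|) / 2 is x when x < 0, else 0. Both sums are always even,
# so integer division is exact.
positive_alt = (resta + resta.abs()) // 2
negative_alt = (resta - resta.abs()) // 2

print(positive.tolist(), negative.tolist())
```

Both variants give the same values; which one is faster can depend on the pandas version and the size of the data.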
Some speed comparisons:
%timeit (Utilidad.ix['resta']-Utilidad.ix['resta'].abs())/2
1000 loops, best of 3: 627 µs per loop
In [36]:
%timeit Utilidad.ix['positive'] = np.where(Utilidad.ix['resta'] > 0, Utilidad.ix['resta'], 0)
1000 loops, best of 3: 647 µs per loop
In [38]:
%timeit Utilidad.ix['positive']=np.clip(Utilidad.ix['resta'], 0, 100)
100 loops, best of 3: 2.6 ms per loop
In [45]:
%timeit Utilidad.ix['resta'].clip_upper(0)
1000 loops, best of 3: 1.32 ms per loop
The observation to make here is that the negative row is the element-wise minimum of 0 and the resta row, and the positive row is the element-wise maximum:
In [11]: np.minimum(df.loc['resta'], 0) # negative
Out[11]:
Argentina 0
Bolivia -5
Chile 0
España -2
Uruguay 0
Name: resta, dtype: int64
In [12]: np.maximum(df.loc['resta'], 0) # positive
Out[12]:
Argentina 2
Bolivia 0
Chile 3
España 0
Uruguay 3
Name: resta, dtype: int64
Note: If you are concerned about speed then it would make sense to transpose the DataFrame, since appending columns is much cheaper than appending rows.
You can append a row using loc:
df.loc['negative'] = np.minimum(df.loc['resta'], 0)
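The transposed layout mentioned in the note could look roughly like the sketch below. It assumes the countries become the index and the years become columns; the name `wide` is only illustrative and is not from the original answer.

```python
import numpy as np
import pandas as pd

# Transposed layout: one row per country, one column per year.
df = pd.DataFrame(
    {2004: [3, 6, 1, 3, 2], 2005: [5, 1, 4, 1, 5]},
    index=["Argentina", "Bolivia", "Chile", "España", "Uruguay"],
)

# Adding columns instead of rows.
df["resta"] = df[2005] - df[2004]
df["positive"] = np.maximum(df["resta"], 0)  # element-wise max with 0
df["negative"] = np.minimum(df["resta"], 0)  # element-wise min with 0

# Transpose back only if the original orientation is needed for display.
wide = df.T
print(wide)
```

For a frame this small the difference is negligible; the layout only matters when many rows or columns are appended.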