import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = {'state': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
        'year': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
        'pop': [11, 22, 0, 33, 44, 32, 45, 66, 34, 12, 32, 0],
        'gdp': [123, 341, 554, 654, 245, 665, 332, 321, 344, 232, 542, 221]}
frame = pd.DataFrame(data)

def treat(group):
    if group.ix[group.year==3, 'pop']!=0:
        group['Treated']=1
    else:
        group['Treated']=0

frame.groupby('state').apply(treat)
I am trying to create a variable frame['Treated'] based on a condition: if a state has 'pop' != 0 in 'year' == 3, I consider that state to be in the treated group (hence the variable name 'Treated').
Unfortunately I end up with an error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
What's wrong with my code? Do you know how I could solve this problem?
Re-edit: Thank you for your kind answers, and I'm sorry for not having described my problem clearly.
Let me try to describe it again. For state 1, pop is 0 in year 3, so state 1 is not in the treated group (as shown below, frame['Treated'] = 0 for state 1 in every year). For state 2, pop is not equal to 0 in year 3, so state 2 is in the treated group (as shown below, frame['Treated'] = 1 for state 2 in every year). The other states are handled the same way. The final result should look like this:
    state  year  pop  gdp  Treated
0       1     1   11  123        0
1       1     2   22  341        0
2       1     3    0  554        0
3       2     1   33  654        1
4       2     2   44  245        1
5       2     3   32  665        1
6       3     1   45  332        1
7       3     2   66  321        1
8       3     3   34  344        1
9       4     1   12  232        0
10      4     2   32  542        0
11      4     3    0  221        0
groupby is not needed here, you just need np.where
frame['Treated']=np.where((frame.year==3)&(frame['pop']!=0),1,0)
frame
Out[429]:
    gdp  pop  state  year  Treated
0   123   11      1     1        0
1   341   22      1     2        0
2   554    0      1     3        0
3   654   33      2     1        0
4   245   44      2     2        0
5   665   32      2     3        1
6   332   45      3     1        0
7   321   66      3     2        0
8   344   34      3     3        1
9   232   12      4     1        0
10  542   32      4     2        0
11  221    0      4     3        0
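One thing to watch with this data: the column name pop collides with the DataFrame.pop method, so frame.pop is the bound method and not the column. Comparing it with 0 does not produce a per-row mask, and every year-3 row ends up flagged. A minimal sketch of the difference (the toy frame df and the names bad and good are made up for illustration):

```python
import pandas as pd

# Hypothetical toy frame with a column that shares its name with DataFrame.pop
df = pd.DataFrame({'year': [1, 3, 3], 'pop': [7, 0, 5]})

# Attribute access resolves to the method, not the column
print(callable(df.pop))

# Method != 0 is a single True, so this mask only tests the year
bad = (df['year'] == 3) & (df.pop != 0)

# Bracket access selects the column, giving the intended row-wise mask
good = (df['year'] == 3) & (df['pop'] != 0)

print(bad.tolist())
print(good.tolist())
```

Bracket indexing is the safer habit for any column whose name matches an existing DataFrame attribute or method.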
An alternative to np.where would be to convert the appropriate boolean mask to integer type.
frame['Treated'] = (frame.year.eq(3) & frame['pop'].ne(0)).astype(int)
Your current code does not work because group.ix[group.year==3, 'pop']!=0 still leaves you with a pandas Series, which you can't safely use in an if statement. In any case, using apply like this is bad form when you can solve your issue with a boolean mask.
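If the goal is the per-state flag from the question's expected table (every row of a state gets 1 when that state has a non-zero pop in year 3), the mask idea can be extended as in the sketch below. This is only one possible approach, not taken from the original answers; the names row_mask and treated_states are made up for illustration.

```python
import pandas as pd

frame = pd.DataFrame({
    'state': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    'year':  [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
    'pop':   [11, 22, 0, 33, 44, 32, 45, 66, 34, 12, 32, 0],
    'gdp':   [123, 341, 554, 654, 245, 665, 332, 321, 344, 232, 542, 221],
})

# A comparison on a Series yields a Series of booleans, not a single bool,
# which is why using it directly in `if` raises the ValueError.
row_mask = frame['year'].eq(3) & frame['pop'].ne(0)

# States that have at least one row satisfying the condition
treated_states = frame.loc[row_mask, 'state'].unique()

# Broadcast the flag to every row of those states
frame['Treated'] = frame['state'].isin(treated_states).astype(int)
print(frame)
```

Here states 2 and 3 are flagged in all three years, while states 1 and 4 stay at 0. The sketch uses .loc because .ix was removed in pandas 1.0.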
Using pandas.DataFrame.assign and pandas.DataFrame.eval:
frame.assign(Treated=frame.eval('pop != 0 & year == 3') * 1)
    gdp  pop  state  year  Treated
0   123   11      1     1        0
1   341   22      1     2        0
2   554    0      1     3        0
3   654   33      2     1        0
4   245   44      2     2        0
5   665   32      2     3        1
6   332   45      3     1        0
7   321   66      3     2        0
8   344   34      3     3        1
9   232   12      4     1        0
10  542   32      4     2        0
11  221    0      4     3        0
I multiply by one to force an integer. It is shorter code, but not as efficient as @miradulo's astype(int).