I have to use survey data from IPUMS to get the average number of people who are unemployed in two successive periods. I wrote a function that takes an index and a dataframe as input:
def u1(x,df):
    if df.loc[x]['LABFORCE']==2 and df.loc[x]['CPSIDP']==df.loc[x+1]['CPSIDP']:
        if df.loc[x]['EMPSTAT']==21 or df.loc[x]['EMPSTAT']==22:
            return True
        else:
            return False
where x is the index and df is the dataframe. CPSIDP identifies the survey respondent, LABFORCE checks that the respondent is in the labor force, and EMPSTAT is what I need to use to check the employment status of the respondent.
I then planned to use apply as follows:
result = df.apply(u1, axis=1)
It is not clear what arguments I should pass to my function (and please let me know if this approach is just philosophically wrong). Passing a number or a variable for the index gives me a "'bool' object is not callable" error.
The smallest dataframe subset that generates the error (the leftmost column is the observation number, i.e. the x I need to pass to u1):
YEAR MONTH CPSIDP EMPSTAT LABFORCE
15285896 2018 7 20180707096701 10 2
15285926 2018 7 20180707098301 10 2
15285927 2018 7 20180707098302 10 2
15285928 2018 7 20180707098303 0 0
15285929 2018 7 20180707098304 0 0
15285930 2018 7 20180707098305 10 2
15286095 2018 7 20180707108203 21 2
IIUC, it would be more efficient to create a boolean Series using the logic from your function. Here & is the AND operator.
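# Note: shift() compares each row with the previous row; u1 looks at the
# next row (x+1), which would correspond to shift(-1).
result = (df['LABFORCE'].eq(2) &
          df['CPSIDP'].eq(df['CPSIDP'].shift()) &
          df['EMPSTAT'].isin([21,22]))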
result
15285896 False
15285926 False
15285927 False
15285928 False
15285929 False
15285930 False
15286095 False
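Every row in the sample above comes out False because no CPSIDP repeats in adjacent rows. As a minimal sketch of the same idea on data where a match does occur, the snippet below uses a small, made-up dataframe (the CPSIDP values 101, 102, 103 are hypothetical, not IPUMS data) and compares each row with the next one via shift(-1), which is an assumption about what the x+1 lookup in u1 was meant to do. It also assumes the rows are already sorted so that successive observations of a respondent are adjacent.

```python
import pandas as pd

# Hypothetical data: respondent 101 appears in two adjacent rows.
df = pd.DataFrame(
    {
        "CPSIDP": [101, 101, 102, 103, 103],
        "EMPSTAT": [21, 22, 21, 10, 21],
        "LABFORCE": [2, 2, 2, 2, 2],
    }
)

# Each condition is a boolean Series aligned on the dataframe index.
in_labor_force = df["LABFORCE"].eq(2)
unemployed = df["EMPSTAT"].isin([21, 22])
# shift(-1) moves the next row's CPSIDP up by one position,
# so this is True where the following row is the same respondent.
same_person_next = df["CPSIDP"].eq(df["CPSIDP"].shift(-1))

result = in_labor_force & unemployed & same_person_next

print(result.tolist())    # [True, False, False, False, False]
print(int(result.sum()))  # 1
```

As for the error in the question: apply expects a function, so something like df.apply(u1(x, df), axis=1) hands it the bool that u1 returned, which is the likely source of "'bool' object is not callable". With axis=1, apply also passes each row as a Series rather than an index label, so a row-wise function cannot look at the neighbouring row, which is why the shifted comparison is the simpler route here.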