How do I avoid creating so many variables as I add columns together? I have certain conditions that need to be met, and each new statement wipes out the old information wherever the condition is not met. How do I preserve the old value and add in the new one?
Take this DataFrame:
import pandas as pd
import datetime as DT
import numpy as np
d = {'case' : pd.Series([1,1,1,1,2]),
'open' : pd.Series([DT.datetime(2014, 3, 2), DT.datetime(2014, 3, 2),DT.datetime(2014, 3, 2),DT.datetime(2014, 3, 2),DT.datetime(2014, 3, 2)]),
'change' : pd.Series([DT.datetime(2014, 3, 8), DT.datetime(2014, 4, 8),DT.datetime(2014, 5, 8),DT.datetime(2014, 6, 8),DT.datetime(2014, 6, 8)]),
'StartEvent' : pd.Series(['Homeless','Homeless','Homeless','Homeless','Jail']),
'ChangeEvent' : pd.Series(['Homeless','Jail','Homeless','Jail','Jail']),
'close' : pd.Series([DT.datetime(2015, 3, 2), DT.datetime(2015, 3, 2),DT.datetime(2015, 3, 2),DT.datetime(2015, 3, 2),DT.datetime(2015, 3, 2)])}
df=pd.DataFrame(d)
This gives me part of the information I need:
# group_keys=False keeps the original row index on pandas >= 2.0, so the boolean mask can align
df['homeless']=(df.groupby('case', group_keys=False)['change'].apply(lambda x: x - x.shift(1))[(df.ChangeEvent.shift(1)=='Homeless')])/np.timedelta64(1,'D')
df['jail']=(df.groupby('case', group_keys=False)['change'].apply(lambda x: x - x.shift(1))[(df.ChangeEvent.shift(1)=='Jail')])/np.timedelta64(1,'D')
df.homeless=df.homeless.fillna(0)
df.jail=df.jail.fillna(0)
df.loc[df.groupby(['case']).apply(lambda x: x['change'].idxmin()), 'first']=1
df.loc[df.groupby(['case']).apply(lambda x: x['change'].idxmax()), 'last']=1
Ideally I could take the next part and have it land in the same variables, 'homeless' and 'jail', but whatever I try overwrites the current values wherever the condition is not met:
df['homeless2']=(df['homeless']+(df['change']-df['open'])/np.timedelta64(1,'D'))[(df['ChangeEvent']=='Homeless') & (df['first']==1)]
For example, the next line produces NaN wherever the condition is not met. How do I preserve the old value and add in the new one?
#df['homeless2']=(df['homeless']+(df['change']-df['open'])/np.timedelta64(1,'D'))[(df['ChangeEvent']=='Homeless') & (df['first']==1)]
df['jail2']=(df['jail']+(df['change']-df['open'])/np.timedelta64(1,'D'))[(df['ChangeEvent']=='Jail') & (df['first']==1)]
df.homeless2=df.homeless2.fillna(0)
df.jail2=df.jail2.fillna(0)
df['homeless3']=(df['homeless']+(df['close']-df['change'])/np.timedelta64(1,'D'))[(df['ChangeEvent']=='Homeless') & (df['last']==1)]
df['jail3']=(df['jail']+(df['close']-df['change'])/np.timedelta64(1,'D'))[(df['ChangeEvent']=='Jail') & (df['last']==1)]
df.homeless3=df.homeless3.fillna(0)
df.jail3=df.jail3.fillna(0)
df['realjail']=df.jail+df.jail2+df.jail3
df['realhomeless']=df.homeless+df.homeless2+df.homeless3
This works, but it is far from efficient. Thank you.
Here is the first part of what you are doing, slightly cleaned up:
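Why the NaNs appear: expr[condition] returns a shorter Series containing only the rows where the condition holds, and assigning it to a column aligns on the index, so every missing row becomes NaN. A minimal sketch on a hypothetical three-row frame (val, flag and filtered are made-up names, not from the question) shows the effect and one standard way to keep the old values, Series.where with the existing column as the fallback:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, not the question's data
df = pd.DataFrame({'val': [10.0, 20.0, 30.0],
                   'flag': [True, False, True]})

# Boolean indexing keeps rows 0 and 2 only; assigning that shorter Series
# to a column re-aligns on the index, so row 1 gets NaN.
df['filtered'] = (df['val'] + 1)[df['flag']]

# Series.where keeps the new value where the condition is True and
# falls back to the existing value (the second argument) elsewhere.
df['val'] = (df['val'] + 1).where(df['flag'], df['val'])

print(df)
```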
In [51]: df=pd.DataFrame(d)
In [52]: changes = df.groupby('case')['change']
In [53]: df['jail'] = (changes.diff()[df.ChangeEvent.shift(1)=='Jail']/np.timedelta64(1,'D'))
In [54]: df['homeless'] = (changes.diff()[df.ChangeEvent.shift(1)=='Homeless']/np.timedelta64(1,'D'))
In [55]: df['homeless'] = df['homeless'].fillna(0)
In [56]: df['jail'] = df['jail'].fillna(0)
In [57]: df.loc[changes.idxmax(), 'last']=1
In [58]: df.loc[changes.idxmin(), 'first']=1
In [59]: df
Out[59]:
   ChangeEvent  StartEvent  case      change       close        open  jail  homeless  last  first
0     Homeless    Homeless     1  2014-03-08  2015-03-02  2014-03-02     0         0   NaN      1
1         Jail    Homeless     1  2014-04-08  2015-03-02  2014-03-02     0        31   NaN    NaN
2     Homeless    Homeless     1  2014-05-08  2015-03-02  2014-03-02    30         0   NaN    NaN
3         Jail    Homeless     1  2014-06-08  2015-03-02  2014-03-02     0        31     1    NaN
4         Jail        Jail     2  2014-06-08  2015-03-02  2014-03-02     0         0     1      1
[5 rows x 10 columns]
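changes.diff() computes the same per-case difference as the question's lambda x: x - x.shift(1). A quick check on the question's case and change values is sketched below; it compares diff() with an explicit groupby(...).shift(1), chosen here instead of apply because shift always keeps the original row index.

```python
import datetime as DT

import numpy as np
import pandas as pd

# Same 'case' and 'change' values as the question's DataFrame
df = pd.DataFrame({
    'case': [1, 1, 1, 1, 2],
    'change': [DT.datetime(2014, 3, 8), DT.datetime(2014, 4, 8),
               DT.datetime(2014, 5, 8), DT.datetime(2014, 6, 8),
               DT.datetime(2014, 6, 8)],
})

# Per-case difference via diff(): the first row of each case is NaT
by_diff = df.groupby('case')['change'].diff()

# The same thing spelled out: subtract the previous value within each case
by_shift = df['change'] - df.groupby('case')['change'].shift(1)

# Convert the timedeltas to a float number of days
days = by_diff / np.timedelta64(1, 'D')
print(days.tolist())
```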
You don't have to create this as a new column, but IMHO it is a bit cleaner:
In [62]: df['homeless_change'] = df['homeless']+(df['change']-df['open'])/np.timedelta64(1,'D')
This is the key: the mask tells loc which rows to set.
In [63]: homeless_mask = (df['ChangeEvent']=='Homeless') & (df['first']==1)
The right-hand side is aligned on the index, so only the rows selected by the mask, in the column you specify, are overwritten; every other row keeps its old value.
In [64]: df.loc[homeless_mask,'homeless'] = df['homeless_change']
In [65]: df
Out[65]:
   ChangeEvent  StartEvent  case      change       close        open  jail  homeless  last  first  homeless_change
0     Homeless    Homeless     1  2014-03-08  2015-03-02  2014-03-02     0         6   NaN      1                6
1         Jail    Homeless     1  2014-04-08  2015-03-02  2014-03-02     0        31   NaN    NaN               68
2     Homeless    Homeless     1  2014-05-08  2015-03-02  2014-03-02    30         0   NaN    NaN               67
3         Jail    Homeless     1  2014-06-08  2015-03-02  2014-03-02     0        31     1    NaN              129
4         Jail        Jail     2  2014-06-08  2015-03-02  2014-03-02     0         0     1      1               98
[5 rows x 11 columns]
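Applying the same mask idea to every step of the question, a sketch that accumulates straight into homeless and jail (so homeless2, jail3 and the rest are not needed) might look like this. It assumes the rows of each case are already sorted by change, as in the sample data; the helper names gap, prev_event, lead_in, tail, is_first and is_last are illustrative, not from the original. It also uses a per-case shift for the previous event and Series.where for the first step, which are choices made here rather than the answer's exact code.

```python
import datetime as DT

import numpy as np
import pandas as pd

d = {'case': [1, 1, 1, 1, 2],
     'open': [DT.datetime(2014, 3, 2)] * 5,
     'change': [DT.datetime(2014, 3, 8), DT.datetime(2014, 4, 8),
                DT.datetime(2014, 5, 8), DT.datetime(2014, 6, 8),
                DT.datetime(2014, 6, 8)],
     'StartEvent': ['Homeless', 'Homeless', 'Homeless', 'Homeless', 'Jail'],
     'ChangeEvent': ['Homeless', 'Jail', 'Homeless', 'Jail', 'Jail'],
     'close': [DT.datetime(2015, 3, 2)] * 5}
df = pd.DataFrame(d)

one_day = np.timedelta64(1, 'D')
changes = df.groupby('case')['change']

# Days since the previous change within the same case (NaN on each case's first row);
# assumes rows are sorted by 'change' within each case
gap = changes.diff() / one_day
# State the person was in before this change (NaN on each case's first row)
prev_event = df.groupby('case')['ChangeEvent'].shift(1)

# Boolean flags for the earliest and latest change of each case
is_first = df.index.isin(changes.idxmin())
is_last = df.index.isin(changes.idxmax())

lead_in = (df['change'] - df['open']) / one_day   # open -> first change
tail = (df['close'] - df['change']) / one_day     # last change -> close

for event, col in [('Homeless', 'homeless'), ('Jail', 'jail')]:
    # Gap since the previous change, credited to the previous state; 0 elsewhere
    df[col] = gap.where(prev_event == event, 0.0)
    is_event = df['ChangeEvent'] == event
    # Add the opening/closing segments only on matching rows;
    # every other row keeps the value it already has
    df.loc[is_event & is_first, col] += lead_in[is_event & is_first]
    df.loc[is_event & is_last, col] += tail[is_event & is_last]

print(df[['case', 'ChangeEvent', 'homeless', 'jail']])
```

On the sample data this matches the question's realhomeless and realjail columns. Note that the question's realhomeless = homeless + homeless2 + homeless3 counts homeless twice on any row where homeless2 or homeless3 is set (those columns already include homeless); this sketch assumes each segment should be counted once, as the answer's homeless_change column does.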