pandas reindexing error while using loc to create a new data frame

There are many questions on reindexing. I tried their solutions, but they didn't work for my code; maybe I got something wrong. I have a data set with two variables, patnum (ID) and vrddat (date), and I'm using the code below to get a data frame after applying certain conditions.

data_3 = data_2.loc[(((data_2.groupby('patnum').first()['vrddat']> datetime.date(2012,1,1)) & 
     (data_2.groupby('patnum').first()['vrddat']> datetime.date(2012,3,31)))),['patnum','vrddat','drug']].reset_index(drop = True)

The code above throws the following error.

IndexingError

IndexingError: Unalignable boolean Series key provided

How do I get a new data frame with all the input variables after applying the conditions? The conditions themselves work, but when I use loc to build the new data frame with all the variables, it throws an IndexingError. I tried reset_index as well, but it didn't work.

Thanks.

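The problem is that you are applying boolean indexing to the DataFrame data_2 with a mask built from the Series s. That mask is indexed by patnum, not by the rows of data_2, so the two cannot be aligned. Instead, use isin to check the values in column vrddat against vals: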

import datetime
import pandas as pd

data_2 = pd.DataFrame({'patnum':[1,2,3,3,1],
                   # month-end frequency; pandas 2.2+ spells this alias 'ME'
                   'vrddat':pd.date_range('2012-01-10', periods=5, freq='1m'),
                   'drug':[7,8,9,7,5],
                   'zzz ':[1,3,5,6,7]})

print (data_2)
   drug  patnum     vrddat  zzz 
0     7       1 2012-01-31     1
1     8       2 2012-02-29     3
2     9       3 2012-03-31     5
3     7       3 2012-04-30     6
4     5       1 2012-05-31     7

s = data_2.groupby('patnum')['vrddat'].first()
print (s)
patnum
1   2012-01-31
2   2012-02-29
3   2012-03-31
Name: vrddat, dtype: datetime64[ns]

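# datetime.datetime rather than datetime.date: current pandas versions
# will not order-compare a datetime64 Series with a plain date object
mask = (s > datetime.datetime(2012,1,1)) & (s < datetime.datetime(2012,3,31))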
print (mask)
patnum
1     True
2     True
3    False
Name: vrddat, dtype: bool

vals = s[mask]
print (vals)
patnum
1   2012-01-31
2   2012-02-29
Name: vrddat, dtype: datetime64[ns]

# parentheses let the method chain continue on the next line
data_3 = (data_2.loc[data_2['vrddat'].isin(vals), ['patnum','vrddat','drug']]
                .reset_index(drop = True))
print (data_3)
   patnum     vrddat  drug
0       1 2012-01-31     7
1       2 2012-02-29     8
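
The isin approach keeps only the rows whose date equals a patient's first date. If the goal is instead to keep every row of each qualifying patient (an assumption about what the question intends), one possible sketch is to build the mask with transform('first'), which returns a Series aligned with the rows of data_2, so loc accepts it directly. The sample dates are written out explicitly here only to avoid depending on a frequency alias.

```python
import pandas as pd

# Same sample values as above, with the dates listed explicitly.
data_2 = pd.DataFrame({
    'patnum': [1, 2, 3, 3, 1],
    'vrddat': pd.to_datetime(['2012-01-31', '2012-02-29', '2012-03-31',
                              '2012-04-30', '2012-05-31']),
    'drug': [7, 8, 9, 7, 5],
})

# transform('first') broadcasts each patient's first date back onto every
# row, so the result has the same index as data_2 (0..4), not patnum.
first_date = data_2.groupby('patnum')['vrddat'].transform('first')

# Row-aligned boolean mask: True for every row of a qualifying patient.
mask = (first_date > pd.Timestamp('2012-01-01')) & (first_date < pd.Timestamp('2012-03-31'))

data_3 = data_2.loc[mask, ['patnum', 'vrddat', 'drug']].reset_index(drop=True)
print(data_3)
```

With this sample, patients 1 and 2 qualify, so both rows of patient 1 and the single row of patient 2 are kept.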

Another, faster way to get s is drop_duplicates:

s = data_2.drop_duplicates(['patnum'])['vrddat']
print (s)
0   2012-01-31
1   2012-02-29
2   2012-03-31
Name: vrddat, dtype: datetime64[ns]
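
As a quick check, the two ways of building s can be compared side by side. This is only a sketch on the sample data: the values agree, but the index differs (patnum labels for groupby, original row labels for drop_duplicates). That does not matter for isin, which looks only at the values.

```python
import pandas as pd

# Same sample values as above, with the dates listed explicitly.
data_2 = pd.DataFrame({
    'patnum': [1, 2, 3, 3, 1],
    'vrddat': pd.to_datetime(['2012-01-31', '2012-02-29', '2012-03-31',
                              '2012-04-30', '2012-05-31']),
})

by_group = data_2.groupby('patnum')['vrddat'].first()
by_dedup = data_2.drop_duplicates(['patnum'])['vrddat']

print(by_group.index.tolist())   # patnum labels
print(by_dedup.index.tolist())   # row labels of each patient's first row

# groupby sorts by patnum while drop_duplicates keeps order of appearance;
# in this sample the two orders happen to coincide, so the values line up.
same_values = list(by_group) == list(by_dedup)
print(same_values)
```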

