
Iterate over unique values in pandas

I have a dataset in the following format:

Patient  Date       colA  colB
1        1/3/2015   .     5
1        2/5/2015   3     10
1        3/5/2016   8     .
2        4/5/2014   2     .
2        etc

I am trying to define a function in pandas that treats each unique patient as an item and iterates over these items to keep only the most recent observation per column (replacing all other values with missing or null). For example, for patient 1 the output would be:

Patient  Date       colA  colB
1        1/3/2015   .     .
1        2/5/2015   .     10
1        3/5/2016   8     .

I understand that I can use something like the following with .apply(), but this does not account for duplicate patient IDs...

def getrecentobs():
    for i in df['Patient']:
        etc

Any help or direction is much appreciated.

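I think you can use to_numeric to convert the . values to NaN, then build a mask with groupby and rank, and finally apply the mask. Note that rank orders by value rather than by Date, so the mask keeps the largest value per patient; in this sample that is also the most recent one: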

print(df)
   Patient      Date colA colB
0        1  1/3/2015    .    5
1        1  2/5/2015    3   10
2        1  3/5/2016    8    .
3        2  4/5/2014    2    .
4        2  5/5/2014    4    .

df['colA'] = pd.to_numeric(df['colA'], errors='coerce')
df['colB'] = pd.to_numeric(df['colB'], errors='coerce')
print(df)
   Patient      Date  colA  colB
0        1  1/3/2015   NaN     5
1        1  2/5/2015     3    10
2        1  3/5/2016     8   NaN
3        2  4/5/2014     2   NaN
4        2  5/5/2014     4   NaN
print(df.groupby('Patient')[['colA','colB']].rank(method='max', ascending=False))
   colA  colB
0   NaN     2
1     2     1
2     1   NaN
3     2   NaN
4     1   NaN

mask = df.groupby('Patient')[['colA','colB']].rank(method='max', ascending=False) == 1
print(mask)
    colA   colB
0  False  False
1  False   True
2   True  False
3  False  False
4   True  False

df[['colA','colB']] = df[['colA','colB']][mask]
print(df)
   Patient      Date  colA  colB
0        1  1/3/2015   NaN   NaN
1        1  2/5/2015   NaN    10
2        1  3/5/2016     8   NaN
3        2  4/5/2014   NaN   NaN
4        2  5/5/2014     4   NaN
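Because rank compares values, the mask above would pick the wrong row whenever a patient's latest value is not also the largest. A minimal sketch of a date-based variant follows. The function name keep_most_recent and its parameters are hypothetical, and it assumes the dates are month/day/year strings; adjust the format if they are day/month/year.

```python
import pandas as pd

def keep_most_recent(df, value_cols, id_col="Patient", date_col="Date"):
    """Blank out everything except the latest non-missing value per patient and column."""
    out = df.copy()
    # Assumption: dates are month/day/year strings such as "1/3/2015".
    dates = pd.to_datetime(out[date_col], format="%m/%d/%Y")
    for col in value_cols:
        # "." marks a missing value in the sample data; coerce it to NaN.
        out[col] = pd.to_numeric(out[col], errors="coerce")
        # Dates of the rows that actually hold a value in this column (NaT elsewhere).
        observed = dates.where(out[col].notna())
        # Latest such date within each patient, broadcast back to every row.
        latest = observed.groupby(out[id_col]).transform("max")
        # Keep the value only on the row carrying that latest date.
        out[col] = out[col].where(observed == latest)
    return out

df = pd.DataFrame({
    "Patient": [1, 1, 1, 2, 2],
    "Date": ["1/3/2015", "2/5/2015", "3/5/2016", "4/5/2014", "5/5/2014"],
    "colA": [".", "3", "8", "2", "4"],
    "colB": ["5", "10", ".", ".", "."],
})
result = keep_most_recent(df, ["colA", "colB"])
print(result)
```

On the sample data this keeps 8 and 4 in colA and 10 in colB, the same cells as the rank-based mask. If a patient has two rows with the same date and both hold a value, both are kept.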

pandas has a groupby method called last, which returns the last value of each column within each group. I'm not sure why you need the blank rows, but if you do, you can join the groupby result back onto the original DataFrame. The sort is there only because the dates in my sample data were not sorted. Hope that helps.

Example:

DataFrame

     id        date     amount  code
  0  3107  2010-10-20   136.4004   290
  1  3001  2010-10-08   104.1800   290
  2  3109  2010-10-08   276.0629   165
  3  3001  2010-10-08  -177.9800   290
  4  3002  2010-10-08  1871.1094   290
  5  3109  2010-10-08   225.7038   155
  6  3109  2010-10-08    98.5578   170
  7  3107  2010-10-08   231.3949   165
  8  3203  2010-10-08   333.6636   290
  9 -9100  2010-10-08  3478.7500   290

If the earlier rows are not needed:

  b.sort_values("date").groupby(["id","date"]).last().reset_index()

Here groupby aggregates with last, i.e. it keeps the last value of each column within each (id, date) group.

Output only latest rows with values:

   id        date     amount  code
0 -9100  2010-10-08  3478.7500   290
1  3001  2010-10-08  -177.9800   290 
2  3002  2010-10-08  1871.1094   290
3  3107  2010-10-08   231.3949   165
4  3107  2010-10-20   136.4004   290
5  3109  2010-10-08    98.5578   170
6  3203  2010-10-08   333.6636   290

I think you are looking for pandas groupby.

For example, df.groupby('Patient').last() returns a DataFrame with the last observation for each patient. If the rows are not sorted by date, you can find each patient's latest record date with the max function.

df.groupby('Patient').last()
             Date colA colB
Patient                    
1        3/5/2016    8    .
2             etc    2    .
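In the output above colB still shows "." because the placeholder is an ordinary string, not a missing value. GroupBy.last skips NaN by default, so converting the placeholders first gives the last non-missing value of each column. A small sketch under the same assumptions as the question's sample (month/day/year dates, "." meaning missing):

```python
import pandas as pd

df = pd.DataFrame({
    "Patient": [1, 1, 1, 2, 2],
    "Date": ["1/3/2015", "2/5/2015", "3/5/2016", "4/5/2014", "5/5/2014"],
    "colA": [".", "3", "8", "2", "4"],
    "colB": ["5", "10", ".", ".", "."],
})
value_cols = ["colA", "colB"]

# Turn the "." placeholders into NaN so that last() can skip them.
df[value_cols] = df[value_cols].apply(pd.to_numeric, errors="coerce")
# Assumption: month/day/year; parse so that sorting is chronological.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# One row per patient holding the most recent non-missing value of each column.
latest = df.sort_values("Date").groupby("Patient")[value_cols].last()
print(latest)
```

For patient 1 this yields colA 8 and colB 10; for patient 2 it yields colA 4 and a missing colB, since that patient has no colB observation at all.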

You can also write your own function and pass it to the apply() method of the groupby object.
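As a rough illustration of the apply() route, the sketch below keeps the original rows and blanks every value except the last non-missing one per patient and column. The helper name blank_all_but_last is hypothetical, and the code assumes a unique index and month/day/year dates.

```python
import pandas as pd

def blank_all_but_last(group):
    """Keep only the last non-missing value in each column of one patient's rows."""
    kept = group.copy()
    for col in kept.columns:
        idx = kept[col].last_valid_index()  # None if the column is entirely missing
        keep = pd.Series(False, index=kept.index)
        if idx is not None:
            keep.loc[idx] = True
        kept[col] = kept[col].where(keep)
    return kept

df = pd.DataFrame({
    "Patient": [1, 1, 1, 2, 2],
    "Date": ["1/3/2015", "2/5/2015", "3/5/2016", "4/5/2014", "5/5/2014"],
    "colA": [".", "3", "8", "2", "4"],
    "colB": ["5", "10", ".", ".", "."],
})
value_cols = ["colA", "colB"]
df[value_cols] = df[value_cols].apply(pd.to_numeric, errors="coerce")
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")  # assumed format

# Sort so that "last" within each patient means "most recent".
df = df.sort_values(["Patient", "Date"])
df[value_cols] = (
    df.groupby("Patient", group_keys=False)[value_cols].apply(blank_all_but_last)
)
print(df)
```

Selecting the value columns before calling apply() means the helper only ever sees colA and colB, and group_keys=False keeps the original index so the result can be assigned straight back.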
