How to enumerate rows in pandas with nonunique values in groups

Question

I am working with expedition geodata. Could you help me number the stations, and the records within each station, based on expedition ID (ID), date (Date), latitude (Lat) and longitude (Lon)? There is also a value column (Val), which should not be used for the numbering. Assume that a station is a group of rows with the same (ID, Date, Lat, Lon) and an expedition is a group of rows with the same ID. The dataframe is sorted by these 4 columns, as in the example.

Dataset and required columns

import pandas as pd
data = [[1, '2017/10/10', 70.1, 30.4, 10],
        [1, '2017/10/10', 70.1, 31.4, 20],
        [1, '2017/10/10', 70.1, 31.4, 10],
        [1, '2017/10/10', 70.1, 31.4, 10],
        [1, '2017/10/12', 70.1, 31.4, 20],
        [2, '2017/12/10', 70.1, 30.4, 20],
        [2, '2017/12/10', 70.1, 31.4, 20]]

df = pd.DataFrame(data, columns=['ID', 'Date', 'Lat', 'Lon', 'Val'])

Additional columns I need: St for the station number within an expedition and Rec for the record number within the same station. The expected output for the example above:

df['St'] = [1, 2, 2, 2, 3, 1, 2]
df['Rec'] = [1, 1, 2, 3, 1, 1, 1]
print(df)

I tried groupby/cumcount/agg/factorize but have not been able to solve the problem.

Any help is appreciated. Thanks!

Answer 1

To create 'St', you can group by 'ID' and check where any of the columns 'Date', 'Lat', 'Lon' differs from the previous row using shift, then use cumsum to turn those change points into the numbers you want:

df['St'] = (df.groupby(['ID'])
              .apply(lambda x: (x[['Date','Lat','Lon']].shift() != x[['Date','Lat','Lon']])
                               .any(axis=1).cumsum())).values
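
As a possible variation on the same shift/cumsum idea, here is a minimal sketch that avoids apply by shifting within each 'ID' group and then taking a grouped cumulative sum. The names keys and is_new_station are only illustrative, and the sketch assumes the frame is already sorted as in the question:

```python
import pandas as pd

data = [[1, '2017/10/10', 70.1, 30.4, 10],
        [1, '2017/10/10', 70.1, 31.4, 20],
        [1, '2017/10/10', 70.1, 31.4, 10],
        [1, '2017/10/10', 70.1, 31.4, 10],
        [1, '2017/10/12', 70.1, 31.4, 20],
        [2, '2017/12/10', 70.1, 30.4, 20],
        [2, '2017/12/10', 70.1, 31.4, 20]]
df = pd.DataFrame(data, columns=['ID', 'Date', 'Lat', 'Lon', 'Val'])

keys = ['Date', 'Lat', 'Lon']  # illustrative name for the station columns

# Shift the station columns by one row inside each expedition. The first row
# of every ID gets NaN, so it always compares as "different" below.
previous = df.groupby('ID')[keys].shift()

# A row starts a new station when any station column differs from the row above.
is_new_station = (previous != df[keys]).any(axis=1)

# Counting the change points per expedition gives the station number.
df['St'] = is_new_station.astype(int).groupby(df['ID']).cumsum()

print(df['St'].tolist())
```

Because the comparison is against the previous row only, this numbers consecutive runs of identical (Date, Lat, Lon) rows, which is why the sorted order matters.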

To create 'Rec', you also need groupby, but on all of the columns 'ID', 'Date', 'Lat', 'Lon', followed by cumcount and add(1), because cumcount starts counting at 0:

df['Rec'] = df.groupby(['ID','Date','Lat','Lon']).cumcount().add(1)
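
To make the cumcount step easier to follow, here is a small self-contained sketch on the question's data that keeps the zero-based counter in a separate variable before shifting it to start at 1. The names station_keys and zero_based are only illustrative:

```python
import pandas as pd

data = [[1, '2017/10/10', 70.1, 30.4, 10],
        [1, '2017/10/10', 70.1, 31.4, 20],
        [1, '2017/10/10', 70.1, 31.4, 10],
        [1, '2017/10/10', 70.1, 31.4, 10],
        [1, '2017/10/12', 70.1, 31.4, 20],
        [2, '2017/12/10', 70.1, 30.4, 20],
        [2, '2017/12/10', 70.1, 31.4, 20]]
df = pd.DataFrame(data, columns=['ID', 'Date', 'Lat', 'Lon', 'Val'])

station_keys = ['ID', 'Date', 'Lat', 'Lon']  # illustrative name

# cumcount numbers the rows of each group 0, 1, 2, ... in their original order.
zero_based = df.groupby(station_keys).cumcount()

# Shift to a 1-based record number, as requested in the question.
df['Rec'] = zero_based + 1

print(df[['ID', 'Date', 'Lat', 'Lon', 'Rec']])
```

Note that groupby collects all rows with equal keys, adjacent or not, so if the same (ID, Date, Lat, Lon) combination appeared again later in an unsorted frame, the count would continue rather than restart.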

and you get:

   ID        Date   Lat   Lon  Val  St  Rec
0   1  2017/10/10  70.1  30.4   10   1    1
1   1  2017/10/10  70.1  31.4   20   2    1
2   1  2017/10/10  70.1  31.4   10   2    2
3   1  2017/10/10  70.1  31.4   10   2    3
4   1  2017/10/12  70.1  31.4   20   3    1
5   2  2017/12/10  70.1  30.4   20   1    1
6   2  2017/12/10  70.1  31.4   20   2    1

solution1
2 ACCEPTED 2018-08-09 14:23:14