I am working with expedition geodata. Could you help me enumerate stations, and records within the same station, based on expedition ID (ID), date (Date), latitude (Lat) and longitude (Lon)? There is also a value column (Val), but it should not be used for enumeration. A station is a group of rows with the same (ID, Date, Lat, Lon), and an expedition is a group of rows with the same ID. The dataframe is sorted by these four columns, as in the example.
import pandas as pd
data = [[1, '2017/10/10', 70.1, 30.4, 10],
        [1, '2017/10/10', 70.1, 31.4, 20],
        [1, '2017/10/10', 70.1, 31.4, 10],
        [1, '2017/10/10', 70.1, 31.4, 10],
        [1, '2017/10/12', 70.1, 31.4, 20],
        [2, '2017/12/10', 70.1, 30.4, 20],
        [2, '2017/12/10', 70.1, 31.4, 20]]
df = pd.DataFrame(data, columns=['ID', 'Date', 'Lat', 'Lon', 'Val'])
Additionally, I need two new columns: St, the station number within each expedition, and Rec, the record number within each station. For the example above, the expected output is:
df['St'] = [1, 2, 2, 2, 3, 1, 2]
df['Rec'] = [1, 1, 2, 3, 1, 1, 1]
print(df)
I tried groupby, cumcount, agg and factorize, but have not been able to solve the problem.
Any help is appreciated. Thanks!
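To create 'St', you can group by 'ID' and use shift to check whether any of the columns 'Date', 'Lat', 'Lon' differs from the previous row; cumsum then turns those change flags into the numbers you want. The first row of each expedition is always flagged, because shift leaves it as NaN, so the numbering restarts at 1 for every ID: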
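cols = ['Date', 'Lat', 'Lon']
# Selecting the key columns keeps 'ID' out of the applied function;
# .values relies on df being sorted by ID, so group order matches row order.
df['St'] = (df.groupby('ID')[cols]
              .apply(lambda x: (x.shift() != x).any(axis=1).cumsum())).values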
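The same shift/cumsum idea can also be written without groupby.apply, by computing the "new station" flag for all rows at once and then taking a cumulative sum per expedition. This is a minimal sketch that rebuilds the sample data from the question; the helper names key, prev and new_station are only illustrative:

```python
import pandas as pd

data = [[1, '2017/10/10', 70.1, 30.4, 10],
        [1, '2017/10/10', 70.1, 31.4, 20],
        [1, '2017/10/10', 70.1, 31.4, 10],
        [1, '2017/10/10', 70.1, 31.4, 10],
        [1, '2017/10/12', 70.1, 31.4, 20],
        [2, '2017/12/10', 70.1, 30.4, 20],
        [2, '2017/12/10', 70.1, 31.4, 20]]
df = pd.DataFrame(data, columns=['ID', 'Date', 'Lat', 'Lon', 'Val'])

key = ['Date', 'Lat', 'Lon']
# Previous row's key values within the same expedition; the first row of each ID gets NaN.
prev = df.groupby('ID')[key].shift()
# A row starts a new station if any key column differs from the previous row.
# NaN never compares equal, so the first row of every expedition is a start.
new_station = df[key].ne(prev).any(axis=1)
# Running count of station starts, restarted for every expedition.
df['St'] = new_station.astype(int).groupby(df['ID']).cumsum()
print(df['St'].tolist())  # [1, 2, 2, 2, 3, 1, 2]
```

The astype(int) is there only to make the per-group cumulative sum operate on integers rather than booleans.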
And to create 'Rec', you also need groupby, but on all four columns 'ID', 'Date', 'Lat', 'Lon'; then use cumcount, which numbers the rows of each group starting from 0, and add 1:
df['Rec'] = df.groupby(['ID','Date','Lat','Lon']).cumcount().add(1)
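A different technique, not the shift-based one above, gets both columns from a single grouping on the four key columns: ngroup with sort=False numbers the stations in order of first appearance, and subtracting each expedition's smallest number makes St restart at 1. This is a sketch that assumes, as the question states, that the frame is sorted so each station's rows are contiguous; unlike the shift approach, it would give a station that reappears later in the same expedition its original number rather than a new one.

```python
import pandas as pd

data = [[1, '2017/10/10', 70.1, 30.4, 10],
        [1, '2017/10/10', 70.1, 31.4, 20],
        [1, '2017/10/10', 70.1, 31.4, 10],
        [1, '2017/10/10', 70.1, 31.4, 10],
        [1, '2017/10/12', 70.1, 31.4, 20],
        [2, '2017/12/10', 70.1, 30.4, 20],
        [2, '2017/12/10', 70.1, 31.4, 20]]
df = pd.DataFrame(data, columns=['ID', 'Date', 'Lat', 'Lon', 'Val'])

g = df.groupby(['ID', 'Date', 'Lat', 'Lon'], sort=False)
# Global station number in order of first appearance (0-based across all expeditions).
station = g.ngroup()
# Shift the numbering so it starts at 1 inside each expedition.
df['St'] = station - station.groupby(df['ID']).transform('min') + 1
# Position of each row within its station, counted from 1.
df['Rec'] = g.cumcount() + 1
print(df)
```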
and you get:
   ID        Date   Lat   Lon  Val  St  Rec
0   1  2017/10/10  70.1  30.4   10   1    1
1   1  2017/10/10  70.1  31.4   20   2    1
2   1  2017/10/10  70.1  31.4   10   2    2
3   1  2017/10/10  70.1  31.4   10   2    3
4   1  2017/10/12  70.1  31.4   20   3    1
5   2  2017/12/10  70.1  30.4   20   1    1
6   2  2017/12/10  70.1  31.4   20   2    1