I need to group a data frame and apply a filter, but I'm not sure how to do that.
Assume there are 3 columns: group, distance, value. group is the column to group by, distance is the column I want to apply the filter to, and value is the column I want to take from when the filter returns true.
Here is what I did:
from numpy import around
from numpy.random import uniform
from pandas import DataFrame
data = around(a=uniform(low=1.0, high=50.0, size=(20, 3)), decimals=3)
df = DataFrame(data=data, columns=['group', 'distance', 'value'], dtype='float64')
rows, columns = df.shape
df.loc[:rows // 2, 'group'] = 1.0
df.loc[rows // 2:, 'group'] = 2.0
print(df)
df.loc[:, 'next_distance'] = df.groupby(by='group')['distance'].shift(periods=-1)
df.loc[:, 'next_value'] = df.groupby(by='group')['value'].shift(periods=-1)
distance_filter = df.loc[:, 'next_distance'] - df.loc[:, 'distance'] > 10.0
df.loc[distance_filter, 'new_value'] = df.loc[distance_filter, 'next_value']
print(df)
The first print of df is:
group distance value
0 1.0 3.757 30.593
1 1.0 14.770 13.313
2 1.0 12.594 38.865
3 1.0 47.806 36.357
4 1.0 7.930 28.235
5 1.0 6.133 42.323
6 1.0 23.422 4.883
7 1.0 12.706 1.606
8 1.0 29.787 48.096
9 1.0 41.889 24.148
10 2.0 15.712 28.568
11 2.0 38.143 20.496
12 2.0 24.282 9.562
13 2.0 25.148 26.535
14 2.0 44.163 42.303
15 2.0 38.116 17.947
16 2.0 4.716 17.259
17 2.0 11.980 4.369
18 2.0 35.533 20.866
19 2.0 11.921 47.971
The second print of df is:
group distance value next_distance next_value new_value
0 1.0 3.757 30.593 14.770 13.313 30.593
1 1.0 14.770 13.313 12.594 38.865 NaN
2 1.0 12.594 38.865 47.806 36.357 38.865
3 1.0 47.806 36.357 7.930 28.235 NaN
4 1.0 7.930 28.235 6.133 42.323 NaN
5 1.0 6.133 42.323 23.422 4.883 42.323
6 1.0 23.422 4.883 12.706 1.606 NaN
7 1.0 12.706 1.606 29.787 48.096 1.606
8 1.0 29.787 48.096 41.889 24.148 48.096
9 1.0 41.889 24.148 NaN NaN NaN
10 2.0 15.712 28.568 38.143 20.496 28.568
11 2.0 38.143 20.496 24.282 9.562 NaN
12 2.0 24.282 9.562 25.148 26.535 NaN
13 2.0 25.148 26.535 44.163 42.303 26.535
14 2.0 44.163 42.303 38.116 17.947 NaN
15 2.0 38.116 17.947 4.716 17.259 NaN
16 2.0 4.716 17.259 11.980 4.369 NaN
17 2.0 11.980 4.369 35.533 20.866 4.369
18 2.0 35.533 20.866 11.921 47.971 NaN
19 2.0 11.921 47.971 NaN NaN NaN
All I need is the new_value column. Is there a better way to do it?
You can call groupby once, shift both columns together, and then subtract df1['distance'] - df['distance']:
df1 = df.groupby(by='group')[['distance','value']].shift(periods=-1)
distance_filter = df1['distance'] - df['distance'] > 10.0
df.loc[distance_filter, 'new_value'] = df1.loc[distance_filter, 'value']
print(df)
group distance value new_value
0 1.0 26.097 16.973 16.973
1 1.0 36.866 28.804 NaN
2 1.0 28.644 17.779 NaN
3 1.0 19.339 44.409 NaN
4 1.0 5.768 28.003 28.003
5 1.0 40.646 3.632 NaN
6 1.0 20.141 8.516 NaN
7 1.0 17.949 46.639 NaN
8 1.0 23.825 45.374 NaN
9 1.0 11.013 33.044 NaN
10 2.0 42.859 39.162 NaN
11 2.0 45.025 17.099 NaN
12 2.0 7.124 19.366 19.366
13 2.0 22.728 23.045 23.045
14 2.0 34.603 46.527 46.527
15 2.0 45.901 40.602 NaN
16 2.0 20.585 11.294 NaN
17 2.0 27.979 24.360 NaN
18 2.0 15.374 5.726 5.726
19 2.0 27.611 17.011 NaN
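Since the random data changes on every run, a small fixed frame makes the logic easier to check by hand. This is only a minimal sketch of the same shift-and-compare steps; the toy values and the name nxt are hypothetical and not taken from the data above.

```python
import pandas as pd

# Hypothetical toy data, chosen so the result can be verified by hand.
df = pd.DataFrame({
    'group':    [1.0, 1.0, 1.0, 2.0, 2.0, 2.0],
    'distance': [1.0, 15.0, 20.0, 5.0, 8.0, 30.0],
    'value':    [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# Shift both columns up by one row inside each group;
# the last row of every group has no successor and becomes NaN.
nxt = df.groupby('group')[['distance', 'value']].shift(periods=-1)

# True where the next row in the same group is more than 10.0 further away.
# Comparisons against NaN evaluate to False, so group boundaries are excluded.
distance_filter = nxt['distance'] - df['distance'] > 10.0

# Copy the next row's value where the filter holds; all other rows stay NaN.
df.loc[distance_filter, 'new_value'] = nxt.loc[distance_filter, 'value']
print(df)
```

With these values only rows 0 and 4 pass the filter (differences of 14.0 and 22.0), so they receive the value of the following row in their group.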
If you need the same output as in the question, only a small change is needed:
df = df.join(df.groupby('group')[['distance', 'value']].shift(periods=-1).add_prefix('next_'))
distance_filter = df['next_distance'] - df['distance'] > 10.0
df.loc[distance_filter, 'new_value'] = df.loc[distance_filter, 'next_value']
print(df)
group distance value next_distance next_value new_value
0 1.0 12.253 29.438 28.814 38.660 29.438
1 1.0 28.814 38.660 20.756 24.588 NaN
2 1.0 20.756 24.588 16.776 11.183 NaN
3 1.0 16.776 11.183 7.214 47.655 NaN
4 1.0 7.214 47.655 17.083 17.805 NaN
5 1.0 17.083 17.805 24.074 4.120 NaN
6 1.0 24.074 4.120 40.108 48.605 4.120
7 1.0 40.108 48.605 40.571 1.591 NaN
8 1.0 40.571 1.591 30.987 36.448 NaN
9 1.0 30.987 36.448 NaN NaN NaN
10 2.0 37.585 13.128 9.864 18.969 NaN
11 2.0 9.864 18.969 46.241 39.490 18.969
12 2.0 46.241 39.490 40.612 7.873 NaN
13 2.0 40.612 7.873 39.053 16.816 NaN
14 2.0 39.053 16.816 13.665 32.730 NaN
15 2.0 13.665 32.730 35.349 43.783 32.730
16 2.0 35.349 43.783 11.412 19.120 NaN
17 2.0 11.412 19.120 40.855 41.502 19.120
18 2.0 40.855 41.502 16.973 40.430 NaN
19 2.0 16.973 40.430 NaN NaN NaN
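If only the new_value column is wanted, the helper columns can be skipped altogether. One possible variation, which is not part of the answer above, builds the column with Series.where on the shifted values. The toy data and the names grouped, next_distance and next_value are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data, chosen so the result can be verified by hand.
df = pd.DataFrame({
    'group':    [1.0, 1.0, 1.0, 2.0, 2.0, 2.0],
    'distance': [1.0, 15.0, 20.0, 5.0, 8.0, 30.0],
    'value':    [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

grouped = df.groupby('group')

# Next row's distance and value within each group (NaN on each group's last row).
next_distance = grouped['distance'].shift(-1)
next_value = grouped['value'].shift(-1)

# Series.where keeps next_value where the condition is True and puts NaN elsewhere,
# so no temporary columns are ever added to df.
df['new_value'] = next_value.where(next_distance - df['distance'] > 10.0)
print(df)
```

This leaves df with just group, distance, value and new_value, at the cost of two intermediate Series that are discarded afterwards.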
EDIT: If you also need the group column next to the shifted columns, join it back:
df1 = df[['group']].join(df.groupby(by='group')[['distance', 'value']].shift(periods=-1))
print(df1)
group distance value
0 1.0 44.142 10.032
1 1.0 14.315 30.959
2 1.0 31.881 44.687
3 1.0 25.850 2.651
4 1.0 40.928 9.444
5 1.0 2.230 18.175
6 1.0 22.793 21.242
7 1.0 2.378 19.381
8 1.0 10.907 29.599
9 1.0 NaN NaN
10 2.0 32.876 24.147
11 2.0 38.133 41.621
12 2.0 39.026 39.042
13 2.0 19.474 5.325
14 2.0 31.824 6.052
15 2.0 46.525 49.705
16 2.0 17.858 48.050
17 2.0 14.817 9.273
18 2.0 24.547 16.233
19 2.0 NaN NaN
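The join works because the shifted frame keeps the original row index but leaves out the grouping column. A short sketch on hypothetical toy data (the values and the name shifted are assumptions) shows both properties:

```python
import pandas as pd

# Hypothetical toy data with two groups of two rows each.
df = pd.DataFrame({
    'group':    [1.0, 1.0, 2.0, 2.0],
    'distance': [3.0, 7.0, 4.0, 9.0],
    'value':    [10.0, 20.0, 30.0, 40.0],
})

# The shifted result contains only the selected columns, indexed like df.
shifted = df.groupby(by='group')[['distance', 'value']].shift(periods=-1)
print(list(shifted.columns))

# Joining on the shared index puts the group label back beside the shifted data.
df1 = df[['group']].join(shifted)
print(df1)
```

The last row of each group ends up with NaN in distance and value, matching the pattern in the output above.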