How do I perform computations over grouped data by accessing a particular column of the grouped data, and then convert the result back to a DataFrame?
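I am trying to do some data analysis on vehicle data. I need to group the data by vehicle ID and then, for each ID, take a reference value from that vehicle's first row (in the code below, the first distance_to_stopbar), subtract it from distance_along_path, and then take a cumulative sum over the result.
Basically, the steps for a single vehicle ID are: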
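# keep only the rows that belong to signal group '4'
df_signal_group = df_broadway[
    df_broadway.trajectory_signal_group == '4']
# select one vehicle; .copy() avoids SettingWithCopyWarning on the assignments below
df_signal_group_1 = df_signal_group[
    df_signal_group.temporaryId == 26].copy()
df_signal_group_1['distance_along_path_change'] = (
    df_signal_group_1['distance_along_path'] - 172.78
)  # 172.78 is distance_to_stopbar in this vehicle's first row
# flag the rows where the change equals -172.78 (the vehicle's first row) and count them cumulatively
df_signal_group_1['groupbydistance'] = (
    df_signal_group_1
    .distance_along_path_change
    .eq(-172.78).cumsum()
)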
I have many such vehicles and need to repeat these steps for every vehicle ID:
df_signal_group = df_broadway[
df_broadway.trajectory_signal_group == '4']
df_grouped = df_signal_group.groupby('temporaryId')
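I am stuck at this step and do not know how to proceed. I know I can use df_signal_group.groupby('temporaryId').first() to get the first-row values for each group, but how do I use that data while iterating over each group? Any pointers would be helpful.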
Sample data is shown below.
This is only sample data; in the real data the vehicle IDs are interleaved, which is why grouping is needed.
temporaryId trajectory_signal_group distance_to_stopbar distance_along_path onmap_status
26 4 172.78 0 True
26 4 170.33 2.459140924298365 True
26 4 167.88 4.883816339797585 True
26 4 165.49 7.274043647721051 True
26 4 164.31 8.456244827531695 True
26 4 161.96 10.794833648650943 True
26 4 159.66 13.099019997543072 True
26 4 158.51 14.238218211441483 True
125 4 173.54 0 True
125 4 172.4 1.179344296415053 True
125 4 170.01 3.5609045873593734 True
125 4 167.61 5.95965979143056 True
125 4 165.2 8.362024854827855 True
125 4 162.79 10.76439000598294 True
125 4 160.38 13.166755196815991 True
125 4 157.98 15.56912041000858 True
125 4 156.77 16.77030301927281 True
125 4 155.57 17.971485632809344 True
125 4 154.36 19.172668245991783 True
125 4 151.96 21.57503347794954 True
125 4 150.76 22.776216095592986 True
125 4 148.34 25.17858133262119 True
125 4 147.14 26.37976395246835 True
125 4 144.73 28.783361992012317 True
125 4 143.52 29.989240517622683 True
125 4 141.09 32.41716300616539 True
Thank you.
Expected output:
temporaryId trajectory_signal_group distance_to_stopbar distance_along_path onmap_status distance_along_path_change groupbydistance
260 4 172.6 0 True -172.6 0
260 4 171.65 0.9526235800176956 True -171.6473764199823 0
260 4 169.7 2.8877960903921576 True -169.71220390960784 0
260 4 167.73 4.862869066444613 True -167.7371309335554 0
260 4 166.72 5.865368230445712 True -166.73463176955428 0
260 4 164.68 7.899986028468888 True -164.70001397153112 0
260 4 163.65 8.930637572963427 True -163.66936242703656 0
260 4 162.61 9.968169381978832 True -162.63183061802116 0
260 4 161.56 11.011111474828203 True -161.5888885251718 0
260 4 159.46 13.108045032255115 True -159.49195496774487 0
26 4 172.78 0 True -172.78 1
26 4 170.33 2.459140924298365 True -170.32085907570163 1
26 4 167.88 4.883816339797585 True -167.8961836602024 1
26 4 165.49 7.274043647721051 True -165.50595635227896 1
26 4 164.31 8.456244827531695 True -164.3237551724683 1
26 4 161.96 10.794833648650945 True -161.98516635134905 1
26 4 159.66 13.099019997543072 True -159.68098000245692 1
26 4 158.51 14.238218211441483 True -158.54178178855852 1
26 4 156.26 16.490836950069347 True -156.28916304993066 1
26 4 154.03 18.70910216437552 True -154.0708978356245 1
26 4 151.84 20.893034435436896 True -151.8869655645631 1
26 4 150.76 21.972132321013312 True -150.8078676789867 1
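The process is to find the first row of each group, grouping by the id and group columns. Next, a cumulative sum is taken to determine the order. We multiply 'distance_to_stopbar' by -1 for the calculation. The new data frame is then merged with the original data frame, and the resulting NA values are forward-filled. Finally, we compute 'distance_along_path_change'.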
df_groups = df_broadway.groupby(['temporaryId','trajectory_signal_group']).first().reset_index()
df_groups['groupbydistance'] = df_groups['onmap_status'].cumsum()
df_groups['distance_along_path'] = df_groups['distance_to_stopbar'] * -1
df_broadway = df_broadway.merge(df_groups, on=['temporaryId','trajectory_signal_group','distance_to_stopbar'], how='outer')
df_broadway.columns = ['temporaryId', 'trajectory_signal_group', 'distance_to_stopbar', 'distance_along_path', 'onmap_status', 'tmp', 'distance_along_path_change', 'groupbydistance']
df_broadway.ffill(inplace=True)  # fillna(method='ffill') is deprecated in recent pandas versions
df_broadway['distance_along_path_change'] = df_broadway['distance_to_stopbar'] + df_broadway['tmp']
df_broadway.drop('tmp', axis=1, inplace=True)
df_broadway.head(10)
temporaryId trajectory_signal_group distance_to_stopbar distance_along_path onmap_status distance_along_path_change groupbydistance
0 26 4 172.78 0.000000 True 0.00 1.0
1 26 4 170.33 2.459141 True -2.45 1.0
2 26 4 167.88 4.883816 True -4.90 1.0
3 26 4 165.49 7.274044 True -7.29 1.0
4 26 4 164.31 8.456245 True -8.47 1.0
5 26 4 161.96 10.794834 True -10.82 1.0
6 26 4 159.66 13.099020 True -13.12 1.0
7 26 4 158.51 14.238218 True -14.27 1.0
8 125 4 173.54 0.000000 True 0.00 2.0
9 125 4 172.40 1.179344 True -1.14 2.0
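As an alternative, the per-vehicle steps from the question can probably be written without a merge, using groupby with transform. The following is only a minimal sketch under some assumptions: the small frame `df` is a hand-copied subset of the sample rows, the change column is taken to be distance_along_path minus the vehicle's first distance_to_stopbar (which is what the expected output in the question appears to show), and groupbydistance is taken to be a zero-based counter of vehicles in order of first appearance.

```python
import pandas as pd

# Hypothetical subset of the sample data from the question
df = pd.DataFrame({
    "temporaryId": [26, 26, 26, 125, 125, 125],
    "trajectory_signal_group": ["4"] * 6,
    "distance_to_stopbar": [172.78, 170.33, 167.88, 173.54, 172.4, 170.01],
    "distance_along_path": [0.0, 2.459140924298365, 4.883816339797585,
                            0.0, 1.179344296415053, 3.5609045873593734],
    "onmap_status": [True] * 6,
})

# sort=False keeps the groups in order of first appearance
grouped = df.groupby("temporaryId", sort=False)

# Broadcast each vehicle's first distance_to_stopbar to all of its rows
first_stopbar = grouped["distance_to_stopbar"].transform("first")

# Zero-based group number per row (assumed meaning of groupbydistance)
group_number = grouped.ngroup()

df["distance_along_path_change"] = df["distance_along_path"] - first_stopbar
df["groupbydistance"] = group_number

print(df)
```

Because transform returns a Series aligned with the original index, the result is already an ordinary DataFrame and no conversion back from the grouped object is needed. This should also work when the rows of different vehicles are interleaved.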