
How do I perform computations over grouped data by accessing a particular column of each group, and then convert the result back to a DataFrame?

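I am trying to do some data analysis on vehicle data. I need to group the data by vehicle ID and then, for each ID, use the vehicle's first row to get a reference value, subtract it from distance_along_path, and then take a cumulative sum over the result.

Basically, the steps for a single vehicle ID are:

Code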

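df_signal_group = df_broadway[
    df_broadway.trajectory_signal_group == '4']
# .copy() so the assignments below modify an independent frame
# rather than a slice of df_broadway (avoids SettingWithCopyWarning)
df_signal_group_1 = df_signal_group[
    df_signal_group.temporaryId == 26].copy()
# 172.78 is distance_to_stopbar in the first row for this vehicle
df_signal_group_1['distance_along_path_change'] = (
    df_signal_group_1['distance_along_path'] - 172.78
)
# running count of rows whose change equals -172.78,
# i.e. rows where distance_along_path is 0
df_signal_group_1['groupbydistance'] = (
    df_signal_group_1
    .distance_along_path_change
    .eq(-172.78).cumsum()
)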

I have many such vehicles, so I need to repeat these steps for every vehicle ID:

df_signal_group = df_broadway[
    df_broadway.trajectory_signal_group == '4']
df_grouped = df_signal_group.groupby('temporaryId')

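This is where I am stuck: how do I proceed from here? I know I can get the first-row values of each group with df_signal_group.groupby('temporaryId').first(), but how do I use those values while iterating over each group? Any pointers would be helpful.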

Sample data is below.

This is only sample data; in reality the vehicle IDs are mixed together, which is why grouping is needed.

temporaryId trajectory_signal_group distance_to_stopbar distance_along_path onmap_status
26  4   172.78  0   True
26  4   170.33  2.459140924298365   True
26  4   167.88  4.883816339797585   True
26  4   165.49  7.274043647721051   True
26  4   164.31  8.456244827531695   True
26  4   161.96  10.794833648650943  True
26  4   159.66  13.099019997543072  True
26  4   158.51  14.238218211441483  True
125 4   173.54  0   True
125 4   172.4   1.179344296415053   True
125 4   170.01  3.5609045873593734  True
125 4   167.61  5.95965979143056    True
125 4   165.2   8.362024854827855   True
125 4   162.79  10.76439000598294   True
125 4   160.38  13.166755196815991  True
125 4   157.98  15.56912041000858   True
125 4   156.77  16.77030301927281   True
125 4   155.57  17.971485632809344  True
125 4   154.36  19.172668245991783  True
125 4   151.96  21.57503347794954   True
125 4   150.76  22.776216095592986  True
125 4   148.34  25.17858133262119   True
125 4   147.14  26.37976395246835   True
125 4   144.73  28.783361992012317  True
125 4   143.52  29.989240517622683  True
125 4   141.09  32.41716300616539   True

Thank you.

Expected output:

temporaryId trajectory_signal_group distance_to_stopbar distance_along_path onmap_status    distance_along_path_change  groupbydistance
260 4   172.6   0   True    -172.6  0
260 4   171.65  0.9526235800176956  True    -171.6473764199823  0
260 4   169.7   2.8877960903921576  True    -169.71220390960784 0
260 4   167.73  4.862869066444613   True    -167.7371309335554  0
260 4   166.72  5.865368230445712   True    -166.73463176955428 0
260 4   164.68  7.899986028468888   True    -164.70001397153112 0
260 4   163.65  8.930637572963427   True    -163.66936242703656 0
260 4   162.61  9.968169381978832   True    -162.63183061802116 0
260 4   161.56  11.011111474828203  True    -161.5888885251718  0
260 4   159.46  13.108045032255115  True    -159.49195496774487 0
26  4   172.78  0   True    -172.78 1
26  4   170.33  2.459140924298365   True    -170.32085907570163 1
26  4   167.88  4.883816339797585   True    -167.8961836602024  1
26  4   165.49  7.274043647721051   True    -165.50595635227896 1
26  4   164.31  8.456244827531695   True    -164.3237551724683  1
26  4   161.96  10.794833648650945  True    -161.98516635134905 1
26  4   159.66  13.099019997543072  True    -159.68098000245692 1
26  4   158.51  14.238218211441483  True    -158.54178178855852 1
26  4   156.26  16.490836950069347  True    -156.28916304993066 1
26  4   154.03  18.70910216437552   True    -154.0708978356245  1
26  4   151.84  20.893034435436896  True    -151.8869655645631  1
26  4   150.76  21.972132321013312  True    -150.8078676789867  1

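The flow of the process is as follows. Find the first row of each group, grouping by the ID and signal-group columns. Next, take a cumulative sum to number the groups in order. Multiply 'distance_to_stopbar' by -1 for use in the calculation. Merge the new data frame with the original one. Forward-fill the resulting NAs. Finally, compute 'distance_along_path_change'.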

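# first row of each (temporaryId, trajectory_signal_group) group
df_groups = df_broadway.groupby(['temporaryId','trajectory_signal_group']).first().reset_index()
# assuming onmap_status is True in every first row (as in the sample),
# the cumulative sum numbers the groups 1, 2, ...
df_groups['groupbydistance'] = df_groups['onmap_status'].cumsum()
# negated first distance_to_stopbar; this column becomes 'tmp' after the merge
df_groups['distance_along_path'] = df_groups['distance_to_stopbar'] * -1
# how='left' keeps the original row order, which the forward fill below relies on
df_broadway = df_broadway.merge(df_groups, on=['temporaryId','trajectory_signal_group','distance_to_stopbar'], how='left')
df_broadway.columns = ['temporaryId', 'trajectory_signal_group', 'distance_to_stopbar', 'distance_along_path', 'onmap_status', 'tmp', 'distance_along_path_change', 'groupbydistance']
# DataFrame.ffill() replaces the deprecated fillna(method='ffill')
df_broadway = df_broadway.ffill()
df_broadway['distance_along_path_change'] = df_broadway['distance_to_stopbar'] + df_broadway['tmp']
df_broadway = df_broadway.drop('tmp', axis=1)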

df_broadway.head(10)
    temporaryId trajectory_signal_group distance_to_stopbar distance_along_path onmap_status    distance_along_path_change  groupbydistance
0   26  4   172.78  0.000000    True    0.00    1.0
1   26  4   170.33  2.459141    True    -2.45   1.0
2   26  4   167.88  4.883816    True    -4.90   1.0
3   26  4   165.49  7.274044    True    -7.29   1.0
4   26  4   164.31  8.456245    True    -8.47   1.0
5   26  4   161.96  10.794834   True    -10.82  1.0
6   26  4   159.66  13.099020   True    -13.12  1.0
7   26  4   158.51  14.238218   True    -14.27  1.0
8   125 4   173.54  0.000000    True    0.00    2.0
9   125 4   172.40  1.179344    True    -1.14   2.0
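
As a possible alternative to the merge and forward-fill steps, the first-row value of each group can be broadcast back onto every row with groupby(...).transform('first'), which avoids iterating over the groups by hand. The sketch below is only an illustration under some assumptions: the small frame is built from a few rows of the sample data in the question, the rows of each vehicle are already in time order, and the result follows the question's expected output (distance_along_path minus the vehicle's first distance_to_stopbar, with groups numbered from 0 in order of first appearance) rather than the output shown above.

```python
import pandas as pd

# A few rows copied from the question's sample data (assumption: the rows
# of each vehicle are already in chronological order).
df = pd.DataFrame({
    "temporaryId": [26, 26, 26, 125, 125, 125],
    "trajectory_signal_group": ["4"] * 6,
    "distance_to_stopbar": [172.78, 170.33, 167.88, 173.54, 172.40, 170.01],
    "distance_along_path": [0.0, 2.459140924298365, 4.883816339797585,
                            0.0, 1.179344296415053, 3.5609045873593734],
    "onmap_status": [True] * 6,
})

keys = ["temporaryId", "trajectory_signal_group"]
# sort=False keeps the groups in order of first appearance
grouped = df.groupby(keys, sort=False)

# Broadcast each group's first distance_to_stopbar onto all of its rows.
first_stopbar = grouped["distance_to_stopbar"].transform("first")
# Number the groups 0, 1, ... in the order they are iterated.
group_number = grouped.ngroup()

df["distance_along_path_change"] = df["distance_along_path"] - first_stopbar
df["groupbydistance"] = group_number

print(df)
```

Because transform returns a Series aligned with the original index, the result stays in the original DataFrame and no conversion back from the grouped object is needed. If the vehicle IDs are interleaved, the grouping still works as long as the first row seen for each vehicle really is its starting row.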

