GroupBy Column1, then get all rows matching the first/last element of Column2 (Python)

Question

import pandas as pd

df = pd.DataFrame({
    'user_id': [1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4],
    'survey_id': [1,1,1,1,2,2,3,4,4,4,5,5,6,6,7,8,8,9,9,9,9,10,10,11,11,12,12],
    'answer': ["no","yes","no","no","yes","no","no","yes","no","yes","no","no","yes","no","no","no","yes","yes","yes","no","no","no","yes","no","yes","no","yes"],
})
df

    user_id     survey_id   answer
0   1   1   no
1   1   1   yes
2   1   1   no
3   1   1   no
4   1   2   yes
5   1   2   no
6   1   3   no
7   2   4   yes
8   2   4   no
9   2   4   yes
10  2   5   no
11  2   5   no
12  2   6   yes
13  2   6   no
14  3   7   no
15  3   8   no
16  3   8   yes
17  3   9   yes
18  3   9   yes
19  3   9   no
20  3   9   no
21  4   10  no
22  4   10  yes
23  4   11  no
24  4   11  yes
25  4   12  no
26  4   12  yes

I want to group by user_id, take the first survey_id for each user, and get all rows associated with that selection:

df_head=
    user_id     survey_id   answer
0   1   1   no
1   1   1   yes
2   1   1   no
3   1   1   no
4   2   4   yes
5   2   4   no
6   2   4   yes
7   3   7   no
8   4   10  no
9   4   10  yes

In the same way, I want to group by user_id, take the last survey_id for each user, and get all rows associated with that selection:

df_tail=
    user_id     survey_id   answer
0   1   3   no
1   2   6   yes
2   2   6   no
3   3   9   yes
4   3   9   yes
5   3   9   no
6   3   9   no
7   4   12  no
8   4   12  yes

Is there a quick groupby command to get this? I can do it by merging dataframes, but I think there is a better way that takes fewer lines of code. Thanks in advance.

Answer 1

A solution without merging:

df_head = df[df.survey_id.eq(df.groupby('user_id')['survey_id'].transform('min'))]

Result:

    user_id  survey_id answer
0         1          1     no
1         1          1    yes
2         1          1     no
3         1          1     no
7         2          4    yes
8         2          4     no
9         2          4    yes
14        3          7     no
21        4         10     no
22        4         10    yes

df_tail = df[df.survey_id.eq(df.groupby('user_id')['survey_id'].transform('max'))]

Result:

    user_id  survey_id answer
6         1          3     no
12        2          6    yes
13        2          6     no
17        3          9    yes
18        3          9    yes
19        3          9     no
20        3          9     no
25        4         12     no
26        4         12    yes

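The idea is to compute the minimum/maximum survey_id for each user_id and compare it with survey_id at the row level of df. Because survey_id increases within each user in the sample data, the minimum and maximum coincide with the first and last values. Note that the original index of the dataframe is kept. If you need a new index, just add at the end: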

df_head = df_head.reset_index(drop=True)
df_tail = df_tail.reset_index(drop=True)
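
If survey_id is not sorted within each user, min/max no longer pick the first/last value. A minimal sketch of a variant that uses the 'first' and 'last' aggregations with transform instead; the small frame below is made up purely for illustration and is not the data from the question:

```python
import pandas as pd

# Hypothetical data: survey_id is NOT increasing within a user.
df = pd.DataFrame({
    'user_id':   [1, 1, 1, 2, 2, 2],
    'survey_id': [5, 5, 2, 9, 3, 3],
    'answer':    ['no', 'yes', 'no', 'yes', 'no', 'yes'],
})

grouped = df.groupby('user_id')['survey_id']

# transform() broadcasts one value per group back onto every row,
# so the result is aligned with df and can be compared element-wise.
first_id = grouped.transform('first')
last_id = grouped.transform('last')

# Keep rows whose survey_id equals the user's first / last survey_id.
df_head = df[df['survey_id'].eq(first_id)].reset_index(drop=True)
df_tail = df[df['survey_id'].eq(last_id)].reset_index(drop=True)

print(df_head)
print(df_tail)
```

Here user 1's first survey_id is 5 while the minimum is 2, so 'min' would select a different set of rows. Note that any row of the same user that repeats the first (or last) survey_id is selected too, even if it is not contiguous with it.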

Solution 1
2 Accepted 2020-07-12 00:23:59
