GroupBy Column1, then get all elements with the first/last element on Column2 (Python)
import pandas as pd
df = pd.DataFrame({'user_id':[1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4],'survey_id':[1,1,1,1,2,2,3,4,4,4,5,5,6,6,7,8,8,9,9,9,9,10,10,11,11,12,12],
'answer':["no","yes","no","no","yes","no","no","yes","no","yes","no","no","yes","no","no","no","yes","yes","yes","no","no","no","yes","no","yes","no","yes"]})
df
user_id survey_id answer
0 1 1 no
1 1 1 yes
2 1 1 no
3 1 1 no
4 1 2 yes
5 1 2 no
6 1 3 no
7 2 4 yes
8 2 4 no
9 2 4 yes
10 2 5 no
11 2 5 no
12 2 6 yes
13 2 6 no
14 3 7 no
15 3 8 no
16 3 8 yes
17 3 9 yes
18 3 9 yes
19 3 9 no
20 3 9 no
21 4 10 no
22 4 10 yes
23 4 11 no
24 4 11 yes
25 4 12 no
26 4 12 yes
I want to group by user_id, take the first survey_id for each user, and get all rows associated with that survey_id:
df_head=
user_id survey_id answer
0 1 1 no
1 1 1 yes
2 1 1 no
3 1 1 no
4 2 4 yes
5 2 4 no
6 2 4 yes
7 3 7 no
8 4 10 no
9 4 10 yes
In the same way, I want to group by user_id, take the last survey_id for each user, and get all rows associated with that survey_id:
df_tail=
user_id survey_id answer
0 1 3 no
1 2 6 yes
2 2 6 no
3 3 9 yes
4 3 9 yes
5 3 9 no
6 3 9 no
7 4 12 no
8 4 12 yes
Is there a quick groupby command to get this? I can do it by merging dataframes, but I think there must be a better way that takes fewer lines. Thanks in advance.
A solution without merging:
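# Select survey_id before transform so the aggregate is not also computed on the answer column
df_head = df[df['survey_id'].eq(df.groupby('user_id')['survey_id'].transform('min'))]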
Result:
user_id survey_id answer
0 1 1 no
1 1 1 yes
2 1 1 no
3 1 1 no
7 2 4 yes
8 2 4 no
9 2 4 yes
14 3 7 no
21 4 10 no
22 4 10 yes
df_tail = df[df['survey_id'].eq(df.groupby('user_id')['survey_id'].transform('max'))]
Result:
user_id survey_id answer
6 1 3 no
12 2 6 yes
13 2 6 no
17 3 9 yes
18 3 9 yes
19 3 9 no
20 3 9 no
25 4 12 no
26 4 12 yes
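The idea is to compute the minimum/maximum survey_id for each user_id and compare it with the survey_id of each row of df. This matches the first/last survey only when survey_id increases within each user, as it does in this data. Note that the original index of the dataframe is preserved. If you need a new index, just add at the end: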
df_head = df_head.reset_index(drop = True)
df_tail = df_tail.reset_index(drop = True)
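If the survey_id values are not sorted within each user, the minimum is not necessarily the first one that appears. The sketch below, which assumes a small made-up dataframe and a hypothetical helper named rows_matching (neither is from the original post), shows how the same transform-and-compare idea might be written with 'first' and 'last' instead of 'min' and 'max':

```python
import pandas as pd

# Made-up data for illustration: survey_id is NOT sorted within user 1
df = pd.DataFrame({
    "user_id":   [1, 1, 1, 2, 2, 2],
    "survey_id": [5, 5, 2, 4, 9, 9],
    "answer":    ["no", "yes", "no", "yes", "no", "yes"],
})

def rows_matching(frame, how):
    """Hypothetical helper: keep rows whose survey_id equals the per-user
    aggregate named by `how` ('min', 'max', 'first' or 'last')."""
    # transform broadcasts the per-group value back onto every row
    target = frame.groupby("user_id")["survey_id"].transform(how)
    return frame[frame["survey_id"].eq(target)].reset_index(drop=True)

df_min = rows_matching(df, "min")      # smallest survey_id per user
df_first = rows_matching(df, "first")  # survey_id of each user's first row
df_last = rows_matching(df, "last")    # survey_id of each user's last row

print(df_first)
print(df_last)
```

For user 1 here, 'min' picks survey 2 while 'first' picks survey 5, so which aggregate to use depends on whether "first" means smallest id or first in row order.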