Combine column values based on other column values in a pandas DataFrame

Question

Let us say I have this data frame.

df

 line  to_line  priority
   10       20         1
   10       30         1
   50       40         3
   60       70         2
   50       80         3

Based on the line and priority column values (when they are the same, i.e. duplicated as shown above), I want to combine the to_line values. The desired result should look like the following.

 line  to_line  priority
   10    20/30         1
   50    40/80         3
   60       70         2

I tried something like this, but I couldn't get what I want.

df.groupBy(col("line")).agg(collect_list(col("to_line")) as "to_line").withColumn("to_line", concat_ws(",", col("to_line")))

Could you please help me figure this out? I appreciate your time and effort.

Answer 1

You can achieve this with a custom aggregation function.

Code

import pandas as pd

df = pd.DataFrame({
    'line': [10, 10, 50, 60, 50],
    'to_line': [20, 30, 40, 70, 80],
    'priority': [1, 1, 3, 2, 3]
})

array_agg = lambda x: '/'.join(x.astype(str))

grp_df = df.groupby(['line', 'priority']).agg({'to_line': array_agg})

Or, if you do not want the grouped columns to become the index, you can pass as_index=False to the groupby method:

grp_df = df.groupby(['line', 'priority'], as_index=False).agg({'to_line': array_agg})

Output

              to_line
line priority        
10   1          20/30
50   3          40/80
60   2             70
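To make the difference between the two variants concrete, here is a minimal sketch that runs both on the question's data and checks where the grouping keys end up. The variable names indexed and flat are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'line': [10, 10, 50, 60, 50],
    'to_line': [20, 30, 40, 70, 80],
    'priority': [1, 1, 3, 2, 3],
})

# Join the values of one group into a single '/'-separated string.
array_agg = lambda x: '/'.join(x.astype(str))

# Default behaviour: the grouping keys become a MultiIndex on the result.
indexed = df.groupby(['line', 'priority']).agg({'to_line': array_agg})

# as_index=False: the grouping keys stay as ordinary columns.
flat = df.groupby(['line', 'priority'], as_index=False).agg({'to_line': array_agg})

print(list(indexed.index.names))  # the keys live in the index
print(list(flat.columns))         # the keys are regular columns
```

If the column order from the question matters, selecting flat[['line', 'to_line', 'priority']] afterwards should rearrange it.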

Answer 2

Maybe something like this:

res = []
df.to_line = df.to_line.astype(str)
for line_priority, df_chunk in df.groupby(['line','priority']):
    df_chunk = df_chunk.reset_index().sort_values('to_line')
    to_line = "/".join(df_chunk.to_line.values)
    res.append({'to_line':to_line,'priority':line_priority[1],'line':line_priority[0]})
pd.DataFrame(res)
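One caveat with the loop above: because to_line is converted to strings before sort_values is called, the values are ordered lexicographically, so '100' would sort before '20'. That makes no difference for the sample data in the question. If numeric order matters, one possible variation is to sort the numbers first and convert them afterwards. This is only a sketch, not the answerer's code; the helper name join_sorted and the three-row data frame are made up for illustration.

```python
import pandas as pd

# Hypothetical data chosen so that string order and numeric order differ.
df = pd.DataFrame({
    'line': [10, 10, 10],
    'to_line': [100, 20, 3],
    'priority': [1, 1, 1],
})

def join_sorted(values):
    # Sort numerically first, then convert each value to a string.
    return '/'.join(str(v) for v in sorted(values))

res = (
    df.groupby(['line', 'priority'])['to_line']
      .agg(join_sorted)
      .reset_index()
)
print(res)
```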

Answer 3

You can use:

df.groupby(['line','priority'])['to_line'].apply(lambda x: '/'.join(str(y) for y in x)).reset_index(name='to_line')

Output

   line  priority to_line
0    10         1   20/30
1    50         3   40/80
2    60         2      70

Answer 1: score 4, accepted, 2019-10-17 14:52:58
Answer 2: score 1, 2019-10-17 14:52:29
Answer 3: score 1, 2019-10-17 14:56:54
