"Nan" after group by and string join on a Pandas dataframe

Question

I have a dataframe like this:

name  | weekday | count 
Peter | Friday  | {16, 17, 9, 10, 15}
Peter | Friday  | {10, 11, 14}  
Peter | Friday  | {16, 17, 11, 12, 15}  
Bob   | Friday  | {10}
Bob   | Friday  | {9, 10, 11, 12, 13}
Bob   | Friday  | {9, 10, 11, 14, 15}

I want to group by name and weekday and add a new column holding the intersection of count, like this:

name  | weekday | intersection 
Peter | Friday  | 
Bob   | Friday  | 10

where an empty string should be returned when there is no intersection. Here's the code I'm using:

df.groupby(['name','weekday']).apply(lambda x: pd.Series({'intersection': ", ".join("{0}".format(n) for n in sorted(list(set.intersection(*x['count']))))})).reset_index()

But I'm getting a result like this:

name  | weekday | intersection 
Peter | Friday  | Nan
Bob   | Friday  | 10

I've tried ''.join() on an empty list and it worked, returning an empty string, but it doesn't work after the group by. I have no idea why this happens or how to solve it.

Answer 1

Find the intersection via reduce, then "stringify" and join the result:

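from functools import reduce

import pandas as pd

def get_intersection(s: pd.Series) -> str:
    # Fold set intersection over the group's sets, starting from the first one.
    intersect = reduce(lambda a, b: a.intersection(b), s.iloc[1:], s.iat[0])
    # sorted() keeps the joined order deterministic; an empty set gives ''.
    return ', '.join(str(x) for x in sorted(intersect))

intersection = (df.groupby(['name', 'weekday'])['count']
                  .agg(get_intersection)
                  .rename('intersection')
                  .reset_index()
                 )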

which gives you:

print(intersection)

    name    weekday intersection
0   Bob     Friday  10
1   Peter   Friday

If you're dealing with large datasets with little overlap, a loop that only continues while len(intersect) > 0 would probably be better than reduce, since it stops as soon as the running intersection is empty and avoids unnecessary work.
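
A minimal sketch of that early-exit idea, written as a plain for loop with a break rather than a literal while loop. The helper name get_intersection_early_exit is hypothetical (it is not part of the answer above), and the sketch assumes every value in the column is a set:

```python
from typing import Iterable, Set


def get_intersection_early_exit(sets: Iterable[Set[int]]) -> str:
    """Intersect sets one at a time, stopping once nothing is left.

    Hypothetical helper for illustration. It accepts any iterable of
    sets, so a pandas Series of sets can be passed in as well.
    """
    iterator = iter(sets)
    # Copy the first set so the caller's data is never mutated.
    intersect = set(next(iterator, set()))
    for current in iterator:
        if not intersect:
            # Already empty: later sets cannot add anything back.
            break
        intersect &= current
    # sorted() makes the output order deterministic; '' for an empty result.
    return ', '.join(str(x) for x in sorted(intersect))


# The sample groups from the question, as plain lists of sets.
peter = [{16, 17, 9, 10, 15}, {10, 11, 14}, {16, 17, 11, 12, 15}]
bob = [{10}, {9, 10, 11, 12, 13}, {9, 10, 11, 14, 15}]
print(repr(get_intersection_early_exit(peter)))  # ''
print(repr(get_intersection_early_exit(bob)))    # '10'
```

Since iterating over a Series yields its values, this should be usable as a drop-in for get_intersection in the .agg(...) call above. Whether it is actually faster than reduce depends on how early the groups become empty.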

Answer 1
0 2020-01-14 02:12:46