'Nan' after group by and string concatenation on pandas dataframe
I have a dataframe like this:
name | weekday | count
Peter | Friday | {16, 17, 9, 10, 15}
Peter | Friday | {10, 11, 14}
Peter | Friday | {16, 17, 11, 12, 15}
Bob | Friday | {10}
Bob | Friday | {9, 10, 11, 12, 13}
Bob | Friday | {9, 10, 11, 14, 15}
I want to group by name and weekday and add a new column holding the intersection of the count sets, like this:
name | weekday | intersection
Peter | Friday |
Bob | Friday | 10
An empty string should be returned when there is no intersection. Here's the code I'm using:
df.groupby(['name','weekday']).apply(lambda x: pd.Series({'intersection': ", ".join("{0}".format(n) for n in sorted(list(set.intersection(*x['count']))))})).reset_index()
But I'm getting a result like this:
name | weekday | intersection
Peter | Friday | Nan
Bob | Friday | 10
I've tried ''.join() on an empty list and it returned an empty string as expected, but it doesn't behave that way after the group by. I have no idea why this happens or how to solve it.
Find the intersection via reduce, "stringify" the elements, and join them:
from functools import reduce

import pandas as pd

def get_intersection(s: pd.Series) -> str:
    # Fold set intersection across the group's sets, starting from the first one.
    intersect = reduce(lambda a, b: a.intersection(b), s.iloc[1:], s.iat[0])
    # An empty intersection joins to an empty string.
    return ', '.join([str(x) for x in intersect])

intersection = (df.groupby(['name', 'weekday'])['count']
                  .agg(get_intersection)
                  .rename('intersection')
                  .reset_index()
               )
which gives you:
print(intersection)
    name weekday intersection
0    Bob  Friday           10
1  Peter  Friday
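For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame construction is an assumption based on the table in the question (each cell of count is taken to be a Python set), and sorted() is added so the joined string has a deterministic order, which the snippet above does not do.

```python
from functools import reduce

import pandas as pd

# Assumed reconstruction of the question's data: each "count" cell is a Python set.
df = pd.DataFrame({
    "name": ["Peter"] * 3 + ["Bob"] * 3,
    "weekday": ["Friday"] * 6,
    "count": [
        {16, 17, 9, 10, 15},
        {10, 11, 14},
        {16, 17, 11, 12, 15},
        {10},
        {9, 10, 11, 12, 13},
        {9, 10, 11, 14, 15},
    ],
})


def get_intersection(s: pd.Series) -> str:
    # Fold set intersection across all sets in the group.
    intersect = reduce(lambda a, b: a.intersection(b), s.iloc[1:], s.iat[0])
    # sorted() is an addition here, to make the output order deterministic.
    return ", ".join(str(x) for x in sorted(intersect))


intersection = (
    df.groupby(["name", "weekday"])["count"]
    .agg(get_intersection)
    .rename("intersection")
    .reset_index()
)
print(intersection)
```

With this data, Peter's three sets share no common element, so his row should hold an empty string rather than NaN.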
If you're dealing with large datasets with little overlap, a while len(intersect) > 0 loop would probably be better than reduce, since it can stop as soon as the intersection is empty and avoid unnecessary work.
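A rough sketch of that early-exit idea follows. The function name intersect_early_exit is hypothetical, and it uses a for loop with break rather than a literal while loop; the stopping condition (an empty intersection) is the same.

```python
from typing import Iterable, Set


def intersect_early_exit(sets: Iterable[Set[int]]) -> str:
    """Intersect sets one at a time, stopping once the result is empty."""
    iterator = iter(sets)
    try:
        # Copy the first set so the caller's data is not mutated.
        intersect = set(next(iterator))
    except StopIteration:
        return ""  # no sets at all
    for current in iterator:
        intersect &= current
        if not intersect:
            # Nothing in common any more; later sets cannot change that.
            break
    return ", ".join(str(x) for x in sorted(intersect))


print(intersect_early_exit([{10}, {9, 10, 11, 12, 13}, {9, 10, 11, 14, 15}]))
```

Since iterating over a pandas Series yields its values, this function should be usable in place of get_intersection in the .agg(...) call, though whether it is actually faster depends on how early the groups become empty.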