Join unique values in a column based on intersection of other columns in pandas
Let's say I have the following DataFrame:
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two", "two"],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large", "large"],
                   "D": [1, 2, 3, 4, 5, 6, 7, 8, 9, 99999]})
I'd like to join (concatenate? or merge?) the values in column "D" wherever the values in "A", "B" and "C" intersect. By intersection, I mean rows that share the same combination of "A", "B" and "C", so that I end up with this DataFrame:
     A    B      C        D
0  foo  one  small        1
1  foo  one  large      2,3
2  foo  two  small      4,5
3  bar  one  large        6
4  bar  one  small        7
5  bar  two  small        8
6  bar  two  large  9,99999
There are aggregation functions like min, max and sum, but I couldn't come up with a solution for this.
Convert column D to strings so that it can be aggregated with join in GroupBy.agg:
df1 = (df.assign(D=df.D.astype(str))
         .groupby(['A', 'B', 'C'], sort=False)['D']
         .agg(','.join)
         .reset_index())
print(df1)
     A    B      C        D
0  foo  one  small        1
1  foo  one  large      2,3
2  foo  two  small      4,5
3  bar  one  large        6
4  bar  one  small        7
5  bar  two  small        8
6  bar  two  large  9,99999
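As a small illustration of why the cast to strings comes first: str.join only accepts string items, so joining the integer values of D directly raises a TypeError. The names values, join_failed and joined below are made up for this sketch and are not part of the original answer.

```python
import pandas as pd

# Illustrative values, mirroring one group of column D.
values = pd.Series([2, 3])

try:
    ",".join(values)            # str.join rejects non-string items
    join_failed = False
except TypeError:
    join_failed = True

# Casting to str first makes the join work.
joined = ",".join(values.astype(str))
print(join_failed, joined)
```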
Or use a lambda function that casts to strings inside each group:
df1 = (df.groupby(['A', 'B', 'C'], sort=False)['D']
         .agg(lambda x: ','.join(x.astype(str)))
         .reset_index())
print(df1)
     A    B      C        D
0  foo  one  small        1
1  foo  one  large      2,3
2  foo  two  small      4,5
3  bar  one  large        6
4  bar  one  small        7
5  bar  two  small        8
6  bar  two  large  9,99999
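Both variants should give the same joined strings. A quick way to check this on a reduced sample might look like the sketch below; the three-row frame and the names keys, by_cast and by_lambda are assumptions made for illustration.

```python
import pandas as pd

# Reduced, illustrative sample (not the full frame from the question).
df = pd.DataFrame({
    "A": ["foo", "foo", "bar"],
    "B": ["one", "one", "two"],
    "C": ["large", "large", "small"],
    "D": [2, 3, 8],
})
keys = ["A", "B", "C"]

# Variant 1: cast D to strings up front, then join per group.
by_cast = (df.assign(D=df["D"].astype(str))
             .groupby(keys, sort=False)["D"]
             .agg(",".join)
             .reset_index())

# Variant 2: cast inside the aggregation function.
by_lambda = (df.groupby(keys, sort=False)["D"]
               .agg(lambda x: ",".join(x.astype(str)))
               .reset_index())

# Compare the joined values as plain Python lists.
same_values = by_cast["D"].tolist() == by_lambda["D"].tolist()
print(same_values)
```

The first variant does the cast once for the whole column, while the second keeps the original df untouched and does the cast per group; which one reads better is largely a matter of taste.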
If D may contain duplicated values within a group and only unique values are needed, add DataFrame.drop_duplicates or Series.unique:
# Option 1: drop duplicated rows before grouping
df2 = (df.assign(D=df.D.astype(str))
         .drop_duplicates(['A', 'B', 'C', 'D'])
         .groupby(['A', 'B', 'C'], sort=False)['D']
         .agg(','.join)
         .reset_index())

# Option 2: keep only unique values inside each group
df2 = (df.groupby(['A', 'B', 'C'], sort=False)['D']
         .agg(lambda x: ','.join(x.astype(str).unique()))
         .reset_index())
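The sample data in the question has no repeated D values within a group, so the effect of the deduplication is not visible there. The sketch below uses a made-up frame with one repeated value to show the difference; the data and the names with_dups and unique_only are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample: the value 2 appears twice in the first group.
df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "bar"],
    "B": ["one", "one", "one", "two"],
    "C": ["large", "large", "large", "small"],
    "D": [2, 3, 2, 8],
})
keys = ["A", "B", "C"]

# Plain join keeps the repeated value.
with_dups = (df.groupby(keys, sort=False)["D"]
               .agg(lambda x: ",".join(x.astype(str)))
               .reset_index())

# Series.unique drops repeats while keeping first-seen order.
unique_only = (df.groupby(keys, sort=False)["D"]
                 .agg(lambda x: ",".join(x.astype(str).unique()))
                 .reset_index())

print(with_dups["D"].tolist())
print(unique_only["D"].tolist())
```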