Grouping by column of lists in pandas GroupBy
I have the following df:
pri_key doc_no c_code
[9001, 7620] 767 0090
[9001, 7620] 767 0090
[9002, 7530] 768 4100
[9002, 7530] 769 3000
[9003, 7730] 777 4000
[9003, 7730] 777 4000
[9003, 7730] 779 4912
I need to hash pri_key, then group by the hashed pri_key, and exclude from df the groups whose rows all have the same doc_no and c_code combination:
df["doc_group"] = df['pri_key'].apply(lambda ls: hash(tuple(sorted(ls))))
grouped = df.groupby("doc_group")
m = grouped[['doc_no', 'c_code']].apply(lambda x: len(np.unique(x.values)) > 1)
df = df.loc[m]
but it did not work:
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match)
I am wondering how to solve this. The result should look like:
pri_key doc_no c_code
[9002, 7530] 768 4100
[9002, 7530] 769 3000
[9003, 7730] 777 4000
[9003, 7730] 777 4000
[9003, 7730] 779 4912
You can tupleize and hash pri_key, then use the hashes to group df:
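# Lists are unhashable, so hash a tuple of each list to get one group key per row
grouper = [hash(tuple(x)) for x in df['pri_key']]
# transform returns a row-aligned result, so the boolean mask can index df directly
df[df.groupby(grouper)[['doc_no', 'c_code']].transform('nunique').gt(1).all(axis=1)]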
pri_key doc_no c_code
2 [9002, 7530] 768 4100
3 [9002, 7530] 769 3000
4 [9003, 7730] 777 4000
5 [9003, 7730] 777 4000
6 [9003, 7730] 779 4912
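The IndexingError in the question arises because m holds one boolean per group (indexed by doc_group), while df.loc expects one boolean per row. As a minimal sketch of an alternative, the per-group result can be mapped back onto the rows. This version counts distinct (doc_no, c_code) pairs per group instead of counting each column separately; the two approaches agree on the sample data. The names n_combos, mask and result are illustrative, and storing c_code as strings (to keep the leading zero) is an assumption about the data.

```python
import pandas as pd

# Sample data from the question; c_code as strings is an assumption
df = pd.DataFrame({
    "pri_key": [[9001, 7620], [9001, 7620], [9002, 7530], [9002, 7530],
                [9003, 7730], [9003, 7730], [9003, 7730]],
    "doc_no": [767, 767, 768, 769, 777, 777, 779],
    "c_code": ["0090", "0090", "4100", "3000", "4000", "4000", "4912"],
})

# Lists are unhashable, so derive a hashable key (sorted, as in the question)
df["doc_group"] = df["pri_key"].apply(lambda ls: hash(tuple(sorted(ls))))

# One entry per group: how many distinct (doc_no, c_code) pairs it contains
n_combos = (
    df.drop_duplicates(["doc_group", "doc_no", "c_code"])
      .groupby("doc_group")
      .size()
)

# Map the per-group counts back to rows so the mask shares df's index
mask = df["doc_group"].map(n_combos).gt(1)

# Keep only groups with more than one distinct pair, then drop the helper column
result = df[mask].drop(columns="doc_group")
print(result)
```

Mapping through doc_group is what aligns the mask with df; indexing df with n_combos directly would reproduce the original error.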