Grouping by column of lists in pandas GroupBy

I have the following df:

pri_key          doc_no    c_code
[9001, 7620]     767       0090
[9001, 7620]     767       0090
[9002, 7530]     768       4100
[9002, 7530]     769       3000
[9003, 7730]     777       4000
[9003, 7730]     777       4000
[9003, 7730]     779       4912

I need to hash pri_key, group by the hashed pri_key, and exclude from df the groups whose rows all have the same doc_no and c_code combination:

 df["doc_group"] = df['pri_key'].apply(lambda ls: hash(tuple(sorted(ls))))

 grouped = df.groupby("doc_group")

 m = grouped[['doc_no', 'c_code']].apply(lambda x: len(np.unique(x.values)) > 1)

 df = df.loc[m]

but it did not work:

pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

I am wondering how to solve this. The result should look like this:

pri_key          doc_no    c_code
[9002, 7530]     768       4100
[9002, 7530]     769       3000
[9003, 7730]     777       4000
[9003, 7730]     777       4000
[9003, 7730]     779       4912    

You can convert each pri_key list to a tuple and hash it, then use the hashes to group df:

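# Lists are unhashable, so hash a tuple of each list to get one group key per row
grouper = [hash(tuple(x)) for x in df['pri_key']]
# Keep rows of groups where both doc_no and c_code take more than one value
df[df.groupby(grouper)[['doc_no', 'c_code']].transform('nunique').gt(1).all(axis=1)]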

        pri_key  doc_no  c_code
2  [9002, 7530]     768    4100
3  [9002, 7530]     769    3000
4  [9003, 7730]     777    4000
5  [9003, 7730]     777    4000
6  [9003, 7730]     779    4912
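
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The attempt in the question fails because apply on a GroupBy returns one value per group, so m is indexed by doc_group rather than by the rows of df; transform returns a result aligned to the original rows, which is what boolean indexing needs. Assumptions on my part: the sample frame is rebuilt by hand with c_code stored as strings (to keep the leading zero), the key is sorted as in the question, and the names key, per_group, per_row, pairs and mask_pairs are only illustrative. The last part is an optional variant that counts distinct (doc_no, c_code) pairs, in case a group should be kept when only one of the two columns varies.

```python
import pandas as pd

# Sample data rebuilt from the question (c_code kept as strings: an assumption).
df = pd.DataFrame({
    "pri_key": [[9001, 7620], [9001, 7620], [9002, 7530], [9002, 7530],
                [9003, 7730], [9003, 7730], [9003, 7730]],
    "doc_no": [767, 767, 768, 769, 777, 777, 779],
    "c_code": ["0090", "0090", "4100", "3000", "4000", "4000", "4912"],
})

# Lists are unhashable, so build one hashable key per row.
# Sorting first makes the key independent of element order, as in the question.
key = df["pri_key"].map(lambda ls: hash(tuple(sorted(ls))))

# An aggregation yields one row per group, indexed by the group key.
# Such a result cannot be aligned with df's row index for boolean indexing.
per_group = df.groupby(key)["doc_no"].nunique()

# transform broadcasts the per-group value back onto the original row index.
per_row = df.groupby(key)[["doc_no", "c_code"]].transform("nunique")
mask = per_row.gt(1).all(axis=1)
result = df[mask]
print(result)

# Variant (assumption about the intended rule): keep a group as soon as it
# contains more than one distinct (doc_no, c_code) pair.
pairs = df["doc_no"].astype(str) + "|" + df["c_code"].astype(str)
mask_pairs = pairs.groupby(key).transform("nunique").gt(1)
result_pairs = df[mask_pairs]
```

On this sample both masks select the same rows. They differ only for a group in which exactly one of doc_no or c_code varies: the all(axis=1) version drops such a group, while the pair-based version keeps it.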
