How to get the number of unique combinations of two columns that occur in a Python pandas DataFrame

Let's say I have this DataFrame in pandas:

     a    b
1    203  487
2    876  111
3    203  487
4    876  487

There are more columns that I don't care about, which are not shown.

I know len(df.a.unique()) will return 2 to indicate there are two unique values of a, as will len(df.b.unique()). I want something similar, but that returns the number of unique combinations of a AND b that occur. So in this example, I would want it to return 3.

Any guidance on how I can go about doing this is appreciated.

Use drop_duplicates:

print (df.drop_duplicates(['a','b']))
     a    b
1  203  487
2  876  111
4  876  487

a = len(df.drop_duplicates(['a','b']).index)
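A self-contained sketch of the drop_duplicates approach, rebuilding the question's frame. The extra column "other" is a hypothetical stand-in for the columns the asker doesn't care about; it shows that passing the two names as subset compares only a and b, whereas calling drop_duplicates() with no subset compares every column.

```python
import pandas as pd

# Rebuild the question's frame; "other" is a hypothetical stand-in
# for the extra columns the asker doesn't care about.
df = pd.DataFrame(
    {
        "a": [203, 876, 203, 876],
        "b": [487, 111, 487, 487],
        "other": ["w", "x", "y", "z"],
    },
    index=[1, 2, 3, 4],
)

# Compare only columns a and b; "other" does not affect which rows survive.
unique_pairs = df.drop_duplicates(subset=["a", "b"])
pair_count = len(unique_pairs.index)

# Without subset, every column is compared, so all 4 rows look distinct here.
all_columns_count = len(df.drop_duplicates().index)

print(pair_count, all_columns_count)  # 3 4
```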

Or use duplicated with an inverted condition:

a = (~df.duplicated(['a','b'])).sum()

a = len(df.index) - df.duplicated(['a','b']).sum()
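To see why both duplicated-based lines give the same answer, it may help to look at the mask itself. With the default keep='first', duplicated flags each repeat of an (a, b) pair after its first occurrence, so inverting the mask counts first occurrences, and subtracting the flagged repeats from the row count gives the same number. A minimal sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [203, 876, 203, 876], "b": [487, 111, 487, 487]},
    index=[1, 2, 3, 4],
)

# True marks a row whose (a, b) pair already appeared earlier (keep='first').
mask = df.duplicated(["a", "b"])
print(mask.tolist())  # [False, False, True, False]

# Count first occurrences directly...
via_invert = int((~mask).sum())
# ...or subtract the repeats from the total row count.
via_subtract = len(df.index) - int(mask.sum())

print(via_invert, via_subtract)  # 3 3
```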

Or convert the columns to strings, join them together, then get nunique:

a = (df.a.astype(str) + '_' + df.b.astype(str)).nunique()

print (a)
3
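The '_' separator in the string approach matters: without it, different pairs can concatenate to the same string. This small sketch uses made-up values (not from the question) where (1, 23) and (12, 3) both become "123" without a separator but stay distinct with one.

```python
import pandas as pd

# Hypothetical values chosen to show the collision; not the question's data.
df = pd.DataFrame({"a": [1, 12], "b": [23, 3]})

# Without a separator both rows become "123", so they are wrongly merged.
no_sep = (df.a.astype(str) + df.b.astype(str)).nunique()

# With "_" the keys are "1_23" and "12_3", which stay distinct.
with_sep = (df.a.astype(str) + "_" + df.b.astype(str)).nunique()

print(no_sep, with_sep)  # 1 2
```

This holds for integer columns; for string columns, the separator should be a character that cannot occur in the data.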

Do you count cases like the ones below as two different combinations or one?

1) 'a' is 203 and 'b' is 487
2) 'a' is 487 and 'b' is 203

If you want them counted as two, just use drop_duplicates as jezrael said. If you want them to count as one unique combination, I would create a new column whose value is always the smaller number, an underscore, then the bigger number (for example 203_487), and call drop_duplicates on this column.

import numpy as np

# Build an order-independent key: always "smaller_bigger"
df['c'] = np.where(df['a'] < df['b'],
                   df['a'].astype(str) + '_' + df['b'].astype(str),
                   df['b'].astype(str) + '_' + df['a'].astype(str))

print(len(df.drop_duplicates('c')))
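An alternative sketch for the order-independent case, swapping in a different technique from the answer above: sort the two values within each row with numpy.sort(..., axis=1), then count distinct rows. This keeps the values numeric instead of building strings. The data is hypothetical: the question's frame plus a fifth row with 203 and 487 reversed, to show that it collapses into the existing pair.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 5 is row 1 with a and b swapped.
df = pd.DataFrame(
    {"a": [203, 876, 203, 876, 487], "b": [487, 111, 487, 487, 203]},
    index=[1, 2, 3, 4, 5],
)

# Sort each row so the smaller value is always first: (487, 203) -> (203, 487).
pairs = np.sort(df[["a", "b"]].to_numpy(), axis=1)

# Count distinct sorted rows; order within a pair no longer matters.
unordered = len(pd.DataFrame(pairs).drop_duplicates())

# For comparison, the order-sensitive count treats (487, 203) as a new pair.
ordered = len(df.drop_duplicates(["a", "b"]))

print(unordered, ordered)  # 3 4
```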

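Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.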

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM