
How to get the number of unique combinations of two columns that occur in a python pandas dataframe

Let's say I have this dataframe in pandas

     a    b
1    203  487
2    876  111
3    203  487
4    876  487

The dataframe has more columns that I don't care about; they are not shown here.

I know len(df.a.unique()) will return 2 to indicate there are two unique values of a, as will len(df.b.unique()) for b. I want something similar, but one that returns the number of unique combinations of a AND b that occur. In this example, I would want it to return 3.

Any guidance on how I can go about doing this is appreciated.

Use drop_duplicates:

print (df.drop_duplicates(['a','b']))
     a    b
1  203  487
2  876  111
4  876  487

a = len(df.drop_duplicates(['a','b']).index)

Or use duplicated and invert the condition:

a = (~df.duplicated(['a','b'])).sum()

a = len(df.index) - df.duplicated(['a','b']).sum()

Or convert the columns to strings, join them together, then get nunique:

a = (df.a.astype(str) + '_' + df.b.astype(str)).nunique()

print (a)
3
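As a quick check, the approaches above can be run side by side on the question's sample data. This is a minimal sketch: the column `other` is a hypothetical stand-in for the columns the asker does not care about, and the `groupby(...).ngroups` line is an additional option not mentioned in the answer.

```python
import pandas as pd

# Sample data from the question. "other" is a hypothetical extra column
# standing in for the columns the asker does not care about.
df = pd.DataFrame(
    {
        "a": [203, 876, 203, 876],
        "b": [487, 111, 487, 487],
        "other": ["w", "x", "y", "z"],
    },
    index=[1, 2, 3, 4],
)

# 1) Drop repeated (a, b) pairs and count the remaining rows.
n_drop = len(df.drop_duplicates(["a", "b"]))

# 2) Count rows that are not flagged as a repeat of an earlier (a, b) pair.
n_dup = int((~df.duplicated(["a", "b"])).sum())

# 3) Join the two columns into one string key and count distinct keys.
n_str = (df["a"].astype(str) + "_" + df["b"].astype(str)).nunique()

# 4) Not part of the answer above: count the groups formed by (a, b).
n_grp = df.groupby(["a", "b"]).ngroups

print(n_drop, n_dup, n_str, n_grp)  # 3 3 3 3
```

One caveat on the string-join variant: if the columns hold strings that can themselves contain the separator, two different pairs could produce the same key, so the column-based variants are the safer choice in that case.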

Do you count cases like below as two different combinations or one?

1) 'a' is 203 and 'b' is 487
2) 'a' is 487 and 'b' is 203

If you want them counted as two, just use drop_duplicates as jezrael said. If you want them to count as one unique combination, I would create a new column that always holds the smaller number, an underscore, and then the bigger number, and run drop_duplicates on that column.

import numpy as np

# Build an order-independent key: smaller value first, then the larger one
df['c'] = np.where(df['a'] < df['b'],
                   df['a'].astype(str) + "_" + df['b'].astype(str),
                   df['b'].astype(str) + "_" + df['a'].astype(str))

print(len(df.drop_duplicates('c')))
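The same order-insensitive count can also be sketched without building string keys, by putting the element-wise minimum and maximum into two helper columns. The sample data here is hypothetical (the question's data has no swapped pair), and the names `lo`, `hi`, and `pairs` are introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows 0 and 2 hold the same two values in swapped order.
df = pd.DataFrame({"a": [203, 876, 487, 876], "b": [487, 111, 203, 487]})

# Normalize each pair so the smaller value always comes first.
pairs = pd.DataFrame(
    {
        "lo": np.minimum(df["a"], df["b"]),
        "hi": np.maximum(df["a"], df["b"]),
    }
)

# Order-insensitive count: (203, 487) and (487, 203) collapse into one pair.
n_unordered = len(pairs.drop_duplicates())

# Order-sensitive count for comparison, as in the drop_duplicates answer.
n_ordered = len(df.drop_duplicates(["a", "b"]))

print(n_unordered, n_ordered)  # 3 4
```

Keeping the values numeric avoids the string conversion and any ambiguity from the separator, at the cost of two temporary columns.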
