How to get the number of unique combinations of two columns that occur in a python pandas dataframe

Question

Let's say I have this dataframe in pandas

     a    b
1    203  487
2    876  111
3    203  487
4    876  487

There are more columns that I don't care about, which are not shown.

I know len(df.a.unique()) will return 2 to indicate there are two unique values of a, as will len(df.b.unique()). I want something similar, but that returns the number of unique combinations of a AND b that occur. So in this example, I would want it to return 3.

Any guidance on how I can go about doing this is appreciated.

Answer 1

Use drop_duplicates:

print (df.drop_duplicates(['a','b']))
     a    b
1  203  487
2  876  111
4  876  487

a = len(df.drop_duplicates(['a','b']).index)

Or use duplicated and invert the condition, or subtract the number of duplicates from the row count:

a = (~df.duplicated(['a','b'])).sum()

a = len(df.index) - df.duplicated(['a','b']).sum()

Or convert the columns to strings, join them together, and then get nunique:

a = (df.a.astype(str) + '_' + df.b.astype(str)).nunique()

print (a)
3
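
To check that the approaches above agree, here is a minimal self-contained sketch. The frame is rebuilt from the question's data; the extra column named other is a made-up stand-in for the columns the question does not show, and the groupby(...).ngroups line is an additional option that is not part of the original answer.

```python
import pandas as pd

# Sample frame from the question; "other" is a hypothetical extra column
df = pd.DataFrame(
    {"a": [203, 876, 203, 876], "b": [487, 111, 487, 487], "other": list("wxyz")},
    index=[1, 2, 3, 4],
)

# Count rows that remain after dropping duplicate (a, b) pairs
n_drop = len(df.drop_duplicates(["a", "b"]))

# Count rows that are NOT flagged as a repeat of an earlier (a, b) pair
n_dup = int((~df.duplicated(["a", "b"])).sum())

# Join both columns into one string key and count distinct keys
n_str = (df["a"].astype(str) + "_" + df["b"].astype(str)).nunique()

# Extra option (not in the answer above): number of groups formed by (a, b)
n_groups = df.groupby(["a", "b"]).ngroups

print(n_drop, n_dup, n_str, n_groups)  # 3 3 3 3
```

One thing to keep in mind with the string-join variant: it relies on the separator not appearing in the values themselves, which holds for integer columns like these but may not for free-text columns.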

Answer 2

Do you count cases like below as two different combinations or one?

1) 'a' is 203 and 'b' is 487
2) 'a' is 487 and 'b' is 203

If you want them counted as two, just use drop_duplicates as jezrael said. If you want them to count as one unique combination, I would create a new column that always has the form smaller number_bigger number, and run drop_duplicates on that column.

import numpy as np

# Build a key with the smaller value first, so (203, 487) and (487, 203) match
df['c'] = np.where(df['a'] < df['b'],
                   df['a'].astype(str) + "_" + df['b'].astype(str),
                   df['b'].astype(str) + "_" + df['a'].astype(str))

print(len(df.drop_duplicates('c')))
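
If you would rather avoid the string conversion, the same order-insensitive count can be sketched by sorting each row's pair with NumPy. This is an alternative I am suggesting, not something from the question, and the small frame below is hypothetical data chosen so that one pair appears in swapped order.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first two rows hold the same pair in swapped order
df = pd.DataFrame({"a": [203, 487, 876], "b": [487, 203, 111]})

# Sort within each row so the smaller value always comes first
sorted_pairs = np.sort(df[["a", "b"]].to_numpy(), axis=1)

# Order-insensitive count: (203, 487) and (487, 203) collapse into one pair
n_unordered = len(pd.DataFrame(sorted_pairs).drop_duplicates())

# Order-sensitive count for comparison
n_ordered = len(df.drop_duplicates(["a", "b"]))

print(n_unordered, n_ordered)  # 2 3
```

This assumes both columns hold values of a comparable type (for example, both integers), since np.sort has to compare them within each row.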
