A Pythonic way to get unique values in a Pandas data frame column based on multiple filtering criteria

Question

I'm trying to get the unique values in a particular column in a Pandas data frame based on multiple filtering criteria. Here is some toy code:

import pandas as pd

df = pd.DataFrame({'Manufacturer': ['<null>', 'Mercedes', 'BMW', 'Audi', 'Audi', 'Audi', 'Audi', 'Audi', 'Mercedes', 'BMW'],
                   'Color': ['Purple', '<null>', '<null>', 'Blue', 'Green', 'Green', 'Black', 'White', 'Gold', 'Tan']})

I want the unique values of the Color column, restricted to rows that have:

a) a non-null value in the Color column, and

b) the value 'Audi' in the Manufacturer column

Is there a Pythonic way to do this that doesn't require me to 'pre-process' the data by first taking a subset of the data frame, like this:

df_1 = df[(df['Color'] != '<null>') & (df['Manufacturer'] == 'Audi')]
df_1['Color'].unique()

array(['Blue', 'Green', 'Black', 'White'], dtype=object)

Thanks in advance!

Answer 1

You have to subset the dataframe using the required conditions; there's no way around that.

You can, however, write it in one line:

df.loc[(df['Color'] != '<null>') & df['Manufacturer'].eq('Audi'), 'Color'].unique()
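A minimal, self-contained sketch of the same filter, assuming the question's data with the '<null>' placeholder strings. It also shows DataFrame.query as an alternative spelling of the same conditions; query still builds the subset internally, so it does not avoid subsetting, it only avoids writing the mask by hand.

```python
import pandas as pd

df = pd.DataFrame({
    'Manufacturer': ['<null>', 'Mercedes', 'BMW', 'Audi', 'Audi',
                     'Audi', 'Audi', 'Audi', 'Mercedes', 'BMW'],
    'Color': ['Purple', '<null>', '<null>', 'Blue', 'Green',
              'Green', 'Black', 'White', 'Gold', 'Tan'],
})

# Boolean mask: a row is kept only if both conditions are True.
mask = df['Color'].ne('<null>') & df['Manufacturer'].eq('Audi')

# .loc applies the row mask and selects the column in a single step.
colors = df.loc[mask, 'Color'].unique()

# DataFrame.query: the same two conditions written as a string expression.
colors_q = df.query("Color != '<null>' and Manufacturer == 'Audi'")['Color'].unique()

print(colors.tolist())    # ['Blue', 'Green', 'Black', 'White']
print(colors_q.tolist())  # ['Blue', 'Green', 'Black', 'White']
```

Series.unique() returns values in order of first appearance, so the duplicated 'Green' shows up once, in its original position.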

Also, it's better to represent missing values in a dataframe with numpy.nan rather than a placeholder string such as '<null>'. Your df would then be:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Manufacturer': [np.nan, 'Mercedes', 'BMW', 'Audi', 'Audi', 'Audi', 'Audi', 'Audi', 'Mercedes', 'BMW'],
                   'Color': ['Purple', np.nan, np.nan, 'Blue', 'Green', 'Green', 'Black', 'White', 'Gold', 'Tan']})
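If the data already contains '<null>' placeholder strings, as in the question, one way to convert them to real missing values is DataFrame.replace with np.nan. This is a sketch assuming '<null>' is the only placeholder used; adjust the string if your data uses something else.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Manufacturer': ['<null>', 'Mercedes', 'BMW', 'Audi', 'Audi',
                     'Audi', 'Audi', 'Audi', 'Mercedes', 'BMW'],
    'Color': ['Purple', '<null>', '<null>', 'Blue', 'Green',
              'Green', 'Black', 'White', 'Gold', 'Tan'],
})

# Replace every '<null>' placeholder, in any column, with NaN.
df = df.replace('<null>', np.nan)

print(df.isna().sum().to_dict())  # {'Manufacturer': 1, 'Color': 2}
```

When the data is read from a CSV file, pd.read_csv(path, na_values=['<null>']) performs the same conversion at load time.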

Then you can use the Series methods notna() and eq(), which are a bit more Pythonic:

df.loc[df.Color.notna() & df.Manufacturer.eq('Audi'), 'Color'].unique()
# array(['Blue', 'Green', 'Black', 'White'], dtype=object)
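A runnable version of the NaN-based filter, assuming the np.nan data shown above. It also shows an equivalent variant that filters on Manufacturer only and then calls Series.dropna(); the dropna step matters because Series.unique() keeps NaN as a value of its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Manufacturer': [np.nan, 'Mercedes', 'BMW', 'Audi', 'Audi',
                     'Audi', 'Audi', 'Audi', 'Mercedes', 'BMW'],
    'Color': ['Purple', np.nan, np.nan, 'Blue', 'Green',
              'Green', 'Black', 'White', 'Gold', 'Tan'],
})

# Both conditions combined into one mask, applied with .loc.
via_mask = df.loc[df['Color'].notna() & df['Manufacturer'].eq('Audi'), 'Color'].unique()

# Variant: filter on Manufacturer only, then drop missing colors before unique().
via_dropna = df.loc[df['Manufacturer'].eq('Audi'), 'Color'].dropna().unique()

print(via_mask.tolist())    # ['Blue', 'Green', 'Black', 'White']
print(via_dropna.tolist())  # ['Blue', 'Green', 'Black', 'White']
```

Note that eq('Audi') evaluates to False for a NaN manufacturer, so the first row is excluded without any extra check.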

Edit, after the OP's comment:

You can match multiple values at once using isin:

df.loc[(df['Color'] != '<null>') & df['Manufacturer'].isin(['Audi', 'Mercedes']), 'Color'].unique()
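With more than a couple of criteria, the mask can be built in a loop. The helper below, unique_where, is hypothetical (it is not part of pandas): each keyword argument maps a column name to a list of accepted values, and exclude lists values of the target column to drop, such as the '<null>' placeholder. It is a sketch assuming the question's string-placeholder data.

```python
import pandas as pd

def unique_where(frame, target, exclude=(), **allowed):
    """Hypothetical helper: unique values of `target` among matching rows.

    A row is kept when its `target` value is not in `exclude` and, for
    every keyword argument, its value in that column is in the given list.
    """
    mask = ~frame[target].isin(exclude)
    for column, values in allowed.items():
        mask &= frame[column].isin(values)
    return frame.loc[mask, target].unique()

df = pd.DataFrame({
    'Manufacturer': ['<null>', 'Mercedes', 'BMW', 'Audi', 'Audi',
                     'Audi', 'Audi', 'Audi', 'Mercedes', 'BMW'],
    'Color': ['Purple', '<null>', '<null>', 'Blue', 'Green',
              'Green', 'Black', 'White', 'Gold', 'Tan'],
})

result = unique_where(df, 'Color', exclude=['<null>'],
                      Manufacturer=['Audi', 'Mercedes'])
print(result.tolist())  # ['Blue', 'Green', 'Black', 'White', 'Gold']
```

Because columns are passed as keyword arguments, this only works for column names that are valid Python identifiers; taking a dict argument instead would lift that restriction.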

(Accepted answer, 2020-05-27 21:57:57)
