I'm trying to get the unique values in a particular column in a Pandas data frame based on multiple filtering criteria. Here is some toy code:
import pandas as pd
df = pd.DataFrame({'Manufacturer':['<null>', 'Mercedes', 'BMW', 'Audi', 'Audi', 'Audi', 'Audi', 'Audi', 'Mercedes', 'BMW'],
'Color':['Purple', '<null>', '<null>', 'Blue', 'Green', 'Green', 'Black', 'White', 'Gold', 'Tan']})
I'm trying to get a list of the unique values of the Color column, assuming:
a) a non-null value in the Color column, and
b) a value of 'Audi' in the Manufacturer column
Is there a Pythonic way that doesn't require me to 'pre-process' the data by taking a subset of the data frame, as such:
df_1 = df[(df['Color'] != '<null>') & (df['Manufacturer'] == 'Audi')]
df_1['Color'].unique()
array(['Blue', 'Green', 'Black', 'White'], dtype=object)
Thanks in advance!
You have to subset the dataframe with the required conditions. There's no escaping that.
You can always write your code on one line, like this:
df[(df['Color'] != '<null>') & (df['Manufacturer'].eq('Audi'))]['Color'].unique()
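A closely related form, shown here only as a sketch, passes the boolean mask and the column label to .loc in a single indexing step, so no intermediate data frame has to be named. It reuses the toy data from the question; the variable names mask and colors are illustrative and not part of the original code.

```python
import pandas as pd

# Toy data from the question, with '<null>' as the placeholder for missing values.
df = pd.DataFrame({
    'Manufacturer': ['<null>', 'Mercedes', 'BMW', 'Audi', 'Audi', 'Audi', 'Audi', 'Audi', 'Mercedes', 'BMW'],
    'Color': ['Purple', '<null>', '<null>', 'Blue', 'Green', 'Green', 'Black', 'White', 'Gold', 'Tan'],
})

# Build the row filter once, then select rows and the column in one .loc call.
mask = (df['Color'] != '<null>') & (df['Manufacturer'] == 'Audi')
colors = df.loc[mask, 'Color'].unique().tolist()
print(colors)
```

This still filters the rows; it just selects the rows and the Color column together rather than chaining two [] lookups.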
Also, it's nice to represent a null value in a dataframe with numpy.nan. Your df would be this:
In [86]: import numpy as np
In [81]: df = pd.DataFrame({'Manufacturer':[np.nan, 'Mercedes', 'BMW', 'Audi', 'Audi', 'Audi', 'Audi', 'Audi', 'Mercedes', 'BMW'],
...: 'Color':['Purple', np.nan, np.nan, 'Blue', 'Green', 'Green', 'Black', 'White', 'Gold', 'Tan']})
Then you can use df.notna() and df.eq, which are a bit more Pythonic:
In [85]: df[df.Color.notna() & df.Manufacturer.eq('Audi')]['Color'].unique()
Out[85]: array(['Blue', 'Green', 'Black', 'White'], dtype=object)
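If the data already contains the '<null>' placeholder strings, one possible way to get to the NaN representation is to swap them out with DataFrame.replace first. This is a sketch under the assumption that '<null>' is the only placeholder in use; the names clean and colors are illustrative.

```python
import numpy as np
import pandas as pd

# Toy data from the question, still using the '<null>' placeholder strings.
df = pd.DataFrame({
    'Manufacturer': ['<null>', 'Mercedes', 'BMW', 'Audi', 'Audi', 'Audi', 'Audi', 'Audi', 'Mercedes', 'BMW'],
    'Color': ['Purple', '<null>', '<null>', 'Blue', 'Green', 'Green', 'Black', 'White', 'Gold', 'Tan'],
})

# Turn every '<null>' placeholder into a real missing value.
clean = df.replace('<null>', np.nan)

# notna() now identifies the rows that actually have a color.
audi_with_color = clean['Color'].notna() & clean['Manufacturer'].eq('Audi')
colors = clean.loc[audi_with_color, 'Color'].unique().tolist()
print(colors)
```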
You can specify multiple values using isin:
df[(df['Color'] != '<null>') & (df['Manufacturer'].isin(['Audi', 'Mercedes']))]['Color'].unique()
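If the same lookup is needed for different sets of manufacturers, the isin filter could be wrapped in a small helper. The function unique_colors and its null_marker parameter are hypothetical names introduced for this sketch; they are not part of pandas or of the original answer.

```python
import pandas as pd

def unique_colors(df, manufacturers, null_marker='<null>'):
    """Hypothetical helper: unique non-null colors for the given manufacturers."""
    mask = (df['Color'] != null_marker) & df['Manufacturer'].isin(manufacturers)
    return df.loc[mask, 'Color'].unique().tolist()

# Toy data from the question.
df = pd.DataFrame({
    'Manufacturer': ['<null>', 'Mercedes', 'BMW', 'Audi', 'Audi', 'Audi', 'Audi', 'Audi', 'Mercedes', 'BMW'],
    'Color': ['Purple', '<null>', '<null>', 'Blue', 'Green', 'Green', 'Black', 'White', 'Gold', 'Tan'],
})

both = unique_colors(df, ['Audi', 'Mercedes'])
audi_only = unique_colors(df, ['Audi'])
print(both)
print(audi_only)
```

The values come back in order of first appearance, because unique() does not sort.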