
How to select rows based on categories in a Pandas dataframe

This is really trivial, but I can't believe I have wandered around for an hour and still can't find the answer, so here it is:

    import pandas as pd

    df = pd.DataFrame({"cats": ["a", "b"], "vals": [1, 2]})
    df.cats = df.cats.astype("category")
    df

df looks like this:

      cats  vals
    0    a     1
    1    b     2

My problem is how to select the rows whose "cats" category is "a". I know that df.loc[df.cats == "a"] will work, but it's based on element-wise equality. Is there a way to select based on the levels of the category?

This works:

df.cats[df.cats == 'a']  # returns only the cats column; df[df.cats == 'a'] returns whole rows
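A quick sketch contrasting the two ways of applying the same mask, assuming the two-row df from the question: indexing the column returns only the matching cats values as a Series, while indexing the DataFrame returns the full rows.

```python
import pandas as pd

# Same frame as in the question
df = pd.DataFrame({"cats": ["a", "b"], "vals": [1, 2]})
df["cats"] = df["cats"].astype("category")

mask = df["cats"] == "a"  # element-wise comparison, one boolean per row

only_cats = df["cats"][mask]  # Series: just the matching "cats" values
whole_rows = df[mask]         # DataFrame: every column of the matching rows

print(type(only_cats).__name__)  # Series
print(whole_rows)
```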

UPDATE

The question was updated. New solution:

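# cat.codes holds one integer per row; get_loc gives the position of 'a' in cat.categories
df[df.cats.cat.codes == df.cats.cat.categories.get_loc('a')]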
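Under the hood a categorical column stores one integer code per row, and each code is a position in cat.categories (missing values get -1). A small sketch of that mapping, using a hypothetical four-row frame so that the number of rows differs from the number of categories; the selection compares the row codes against the position of the wanted level.

```python
import pandas as pd

# Hypothetical data: 4 rows but only 2 distinct levels
df = pd.DataFrame({"cats": ["b", "a", "b", "a"], "vals": [1, 2, 3, 4]})
df["cats"] = df["cats"].astype("category")

categories = df["cats"].cat.categories  # sorted unique values: 'a', 'b'
codes = df["cats"].cat.codes            # one integer per row, indexing into categories

print(list(categories))  # ['a', 'b']
print(codes.tolist())    # [1, 0, 1, 0]

# Select rows by category level: look up the level's position, then compare codes
level = categories.get_loc("a")
rows_a = df[codes == level]
print(rows_a["vals"].tolist())  # [2, 4]
```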

You can list the categories with df.cats.cat.categories, which prints:

Index(['a', 'b'], dtype='object')

In this case, to select the rows whose category is 'a', which is df.cats.cat.categories[0], use:

df[df.cats == df.cats.cat.categories[0]]
df[df.cats.cat.codes == 0]  # code 0 is categories[0]; the mask must have one entry per row
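Since df.cats.cat.categories[i] is just the i-th level, the same comparison can be repeated for every level. A sketch that splits a hypothetical frame into one subset per category; the extra level "c" is an assumption added only to show that a declared but unused category simply yields an empty frame.

```python
import pandas as pd

df = pd.DataFrame({"cats": ["a", "b", "a"], "vals": [1, 2, 3]})
# Declare an extra, unused level "c": categories need not all appear in the data
df["cats"] = pd.Categorical(df["cats"], categories=["a", "b", "c"])

# One sub-frame per category level, in the order given by cat.categories
subsets = {level: df[df["cats"] == level] for level in df["cats"].cat.categories}

for level, rows in subsets.items():
    print(level, rows["vals"].tolist())
# a [1, 3]
# b [2]
# c []
```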

For those trying to filter rows on a categorical column of numeric intervals:

df[df['col'] == pd.Interval(46, 53, closed='right')]

This keeps the rows where the col column has the category (46, 53].

This kind of categorical column is common when you discretize numerical columns with pd.cut() or pd.qcut().
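A minimal sketch of where such a column comes from, assuming hypothetical scores and bin edges: pd.cut with its default right=True produces right-closed intervals, and the pd.Interval comparison from the answer then picks out one bin.

```python
import pandas as pd

df = pd.DataFrame({"score": [40, 47, 50, 55, 60]})
# Hypothetical bin edges; default right=True gives intervals like (46, 53]
df["col"] = pd.cut(df["score"], bins=[39, 46, 53, 60])

print(df["col"].cat.categories)

# Keep only rows that fall in the (46, 53] bin
selected = df[df["col"] == pd.Interval(46, 53, closed="right")]
print(selected["score"].tolist())  # [47, 50]
```

With pd.qcut the edges are computed from the data and are often floats, so typing the Interval by hand is fragile; taking it from df['col'].cat.categories (for example categories[1]) is likely the safer choice.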

Using the isin method to build a boolean mask is an approach that extends to multiple categories, similar to R's %in% operator.

# will return desired subset
df[df.cats.isin(['a'])]

# can be extended to multiple categories
df[df.cats.isin(['a', 'b'])]
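A short sketch of isin with several levels on a hypothetical four-row frame, including the ~ operator to invert the mask and keep every row whose category is not in the list.

```python
import pandas as pd

df = pd.DataFrame({"cats": ["a", "b", "c", "a"], "vals": [1, 2, 3, 4]})
df["cats"] = df["cats"].astype("category")

wanted = ["a", "c"]  # hypothetical list of levels to keep

kept = df[df["cats"].isin(wanted)]      # rows whose category is in the list (like R's %in%)
dropped = df[~df["cats"].isin(wanted)]  # ~ negates the mask: rows NOT in the list

print(kept["vals"].tolist())     # [1, 3, 4]
print(dropped["vals"].tolist())  # [2]
```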

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM