This is really trivial, but I have wandered around for an hour and still can't find the answer, so here it is:
import pandas as pd
df = pd.DataFrame({"cats":["a","b"], "vals":[1,2]})
df.cats = df.cats.astype("category")
df
My problem is how to select the rows whose "cats" column has the category "a". I know that df.loc[df.cats == "a"] will work, but it is based on element-wise equality. Is there a way to select based on the levels of the category?
This works:
df.cats[df.cats=='a']
UPDATE
The question was updated. New solution:
df[df.cats.cat.codes == df.cats.cat.categories.get_loc('a')]
You can query the list of categories using df.cats.cat.categories, which prints:
Index(['a', 'b'], dtype='object')
In this case, to select the rows with category 'a', which is df.cats.cat.categories[0], you just use:
df[df.cats == df.cats.cat.categories[0]]
df[df.cats.cat.codes == 0]
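Here is a minimal sketch of selecting by category level rather than by element value. It relies on the integer codes pandas stores for a categorical column. The three-row frame is a hypothetical variant of the question's example, and the names level, by_level and by_value are made up for illustration. Note that a mask built from cat.categories has one entry per category, while cat.codes has one entry per row, so only the latter is safe to index the frame with.

```python
import pandas as pd

# Hypothetical variant of the question's frame with a repeated category
df = pd.DataFrame({"cats": ["a", "b", "a"], "vals": [1, 2, 3]})
df["cats"] = df["cats"].astype("category")

# Position of the level 'a' within the categories index
level = df["cats"].cat.categories.get_loc("a")

# cat.codes holds one integer per row, so this mask lines up with the rows
by_level = df[df["cats"].cat.codes == level]

# Element-wise comparison selects the same rows
by_value = df[df["cats"] == "a"]
```

Both selections should return the same rows here; the code-based form is mostly useful when you already know the position of the level you want.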
For those who are trying to filter rows based on a numerical categorical column:
df[df['col'] == pd.Interval(46, 53, closed='right')]
This would keep the rows where the col column has the category (46, 53].
This kind of categorical column is common when you discretize numerical columns using pd.qcut().
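A small sketch of the interval case, in case it helps. It uses pd.cut with explicit bin edges instead of pd.qcut so that the interval boundaries are predictable; the age values and the bin edges are invented for illustration.

```python
import pandas as pd

# Hypothetical data; the bin edges are chosen so that (46, 53] is one of the categories
df = pd.DataFrame({"age": [40, 47, 50, 53, 58]})
df["col"] = pd.cut(df["age"], bins=[39, 46, 53, 60])  # right-closed intervals by default

# Compare the categorical column against the exact Interval object
target = pd.Interval(46, 53, closed="right")
selected = df[df["col"] == target]
```

The Interval has to match a category exactly (same endpoints, same closed side), so with pd.qcut it is probably easiest to look at df["col"].cat.categories first and copy the interval from there.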
Using the isin function to create a boolean index is an approach that extends to multiple categories, similar to R's %in% operator.
# will return desired subset
df[df.cats.isin(['a'])]
# can be extended to multiple categories
df[df.cats.isin(['a', 'b'])]
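As a rough sketch of how this scales, here is the same idea on a hypothetical frame with three categories, including negation with ~ and passing a slice of the categories index straight to isin. The frame and the variable names are assumptions made for the example.

```python
import pandas as pd

# Hypothetical frame with three categories
df = pd.DataFrame({"cats": ["a", "b", "c", "a"], "vals": [1, 2, 3, 4]})
df["cats"] = df["cats"].astype("category")

# Keep rows whose category is in the list
keep = df[df["cats"].isin(["a", "c"])]

# Invert the mask to drop those categories instead
drop = df[~df["cats"].isin(["a", "c"])]

# A slice of the categories index also works as the list of values
first_two = df[df["cats"].isin(df["cats"].cat.categories[:2])]
```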