
Why does pandas value_counts() show a count of zero for some values?

I have a dataframe where one column is a categorical variable with the following labels: ['Short', 'Medium', 'Long', 'Very Long', 'Extremely Long']. I am trying to create a new dataframe that drops all the rows labeled 'Extremely Long'.

I have tried doing this in the following ways:

df2 = df.query('ride_type != "Extremely Long"')
df2 = df[df['ride_type'] != 'Extremely Long']

However, when I run value_counts() I get the following:

df2.ride_type.value_counts()

Short             130474
Long              129701
Medium            129607
Very Long         110988
Extremely Long         0
Name: ride_type, dtype: int64

In other words, 'Extremely Long' still appears (with a count of 0), so I can't plot charts with just the four categories I want.

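This is expected behavior for categorical data: the column's dtype stores the full list of categories, and filtering out rows does not remove a category from that list. You may have something that looks like this: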

import pandas as pd

df = pd.DataFrame({'ride_type': pd.Categorical(
    ['Long', 'Long'], categories=['Long', 'Short'])})

df
  ride_type
0      Long
1      Long

Calling value_counts on a categorical column will record counts for all categories, not just the ones present.

df['ride_type'].value_counts()    

Long     2
Short    0
Name: ride_type, dtype: int64
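The same thing happens after filtering. The sketch below uses the five labels from the question with a few hypothetical rows (not the asker's data): the boolean-mask filter removes the rows, but 'Extremely Long' stays in the column's categories. It also shows that the categorical value_counts result gives the same counts as counting the plain strings and then reindexing over the full category list with a fill value of 0.

```python
import pandas as pd

labels = ['Short', 'Medium', 'Long', 'Very Long', 'Extremely Long']

# Hypothetical sample rows; only the labels come from the question.
df = pd.DataFrame({'ride_type': pd.Categorical(
    ['Short', 'Long', 'Extremely Long', 'Medium', 'Very Long', 'Long'],
    categories=labels)})

# Same boolean-mask filter as in the question.
df2 = df[df['ride_type'] != 'Extremely Long']

# The filtered rows are gone, but the category is still in the dtype.
print(list(df2['ride_type'].cat.categories))

counts = df2['ride_type'].value_counts()
print(counts)

# Equivalent view: count the plain strings, then list every category,
# filling the ones that have no rows with 0.
manual = (df2['ride_type'].astype(str).value_counts()
          .reindex(labels, fill_value=0))
print(manual)
```

The row order of the two results may differ, but the per-label counts are the same.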

The solution is to either remove unused categories, or convert to string:

df['ride_type'].cat.remove_unused_categories().value_counts() 

Long    2
Name: ride_type, dtype: int64

# or,
df['ride_type'].astype(str).value_counts() 

Long    2
Name: ride_type, dtype: int64
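For the original goal of plotting only four categories, the unused category can be removed from the column itself rather than from a single value_counts call. A minimal sketch, again with hypothetical sample rows; assign is used so the cleaned column is set on the filtered frame without a chained-assignment warning, and value_counts(sort=False) keeps the categories in their defined order, which is usually what a bar chart's x-axis should follow.

```python
import pandas as pd

labels = ['Short', 'Medium', 'Long', 'Very Long', 'Extremely Long']

# Hypothetical sample rows using the labels from the question.
df = pd.DataFrame({'ride_type': pd.Categorical(
    ['Short', 'Short', 'Medium', 'Long', 'Very Long', 'Extremely Long'],
    categories=labels)})

# Filter the rows, then drop the now-empty category from the dtype.
df2 = df[df['ride_type'] != 'Extremely Long'].assign(
    ride_type=lambda d: d['ride_type'].cat.remove_unused_categories())

# sort=False returns counts in category order instead of by frequency.
counts = df2['ride_type'].value_counts(sort=False)
print(counts)
# counts.plot.bar() (requires matplotlib) would now show four bars.
```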

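Alternatively, you could drop rows by their index labels. In this example 'A' is a plain (non-categorical) string column, which is why no zero count appears; on a categorical column the dropped value would still be listed with a count of 0: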

df = df.drop(df.index[df['A'] == 'cat'])
print(df['A'].value_counts())

dog       2
rabbit    2
Name: A, dtype: int64
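Because drop only removes rows, the zero count comes back if column 'A' is categorical. A sketch with hypothetical data (the animal values simply mirror the output above), showing the leftover 'cat' entry and how remove_unused_categories clears it:

```python
import pandas as pd

# Hypothetical data; column 'A' is categorical this time.
df = pd.DataFrame({'A': pd.Categorical(
    ['cat', 'dog', 'rabbit', 'cat', 'dog', 'rabbit'])})

# Drop every row whose label is 'cat', as in the snippet above.
df = df.drop(df.index[df['A'] == 'cat'])

before = df['A'].value_counts()
print(before)  # 'cat' is still listed, with a count of 0

# Removing unused categories makes the zero-count entry disappear.
after = df['A'].cat.remove_unused_categories().value_counts()
print(after)
```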


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM