Pandas groupby / value_counts with no values

Question

I have a data set that shows the count of loads for each category. The data is given below.

Name,Count1,Count2,PercentDiff,Category
Store A,10,4,0.4,Less than 1%
Store B,20,26,1.3,Less than 5%
Store C,12,48,4,Less than 5%
Store D,30,180,6,Less than 10%

I would like to get the count for each of the categories below:

1. Less than 0
2. Less than 1%
3. Less than 5%
4. Less than 10%
5. More than 10%

I have used the rules below to categorise each of these entries:

new.loc[new['PercentDiff'] < 0, 'Category'] = 'Less than 0%'
new.loc[new['PercentDiff'] == 0, 'Category'] = 'Exact match'
new.loc[new['PercentDiff'] < 0.01, 'Category'] = 'Less than 1%'
new.loc[new['PercentDiff'] < 0.05, 'Category'] = 'Less than 5%'
new.loc[new['PercentDiff'] < 0.1, 'Category'] = 'Less than 10%'
new.loc[new['PercentDiff'] == 0, 'Category'] = 'Exact match'
new.loc[new['PercentDiff'] > 0.1, 'Category'] = 'Greater than 10%'
new['PercentDiff1'] = new['PercentDiff'].astype(int)

Output1 = new.groupby(['Category']).agg(lambda x: x.mad())
Output1 = Output1.replace(np.nan, '', regex=True)
SumMail = pd.value_counts(Output1['Category'].values)

However, if the data set has no rows for one of the categories, I get an error saying that no values were found for that category.

TypeError: 'str' object cannot be interpreted as an integer

KeyError: 'More than 10%'

Could anyone help me modify this code so that it returns 0 for categories that have no records?

Thanks in advance.

Answer 1

You need to convert your 'Category' column to a categorical dtype and list every expected category explicitly:

df['Category'] = df['Category'].astype('category')
df['Category'] = df['Category'].cat.set_categories(['Less than 0',
                                                    'Less than 1%',
                                                    'Less than 5%',
                                                    'Less than 10%',
                                                    'More than 10%'],
                                                    ordered=True)

df['Category'].value_counts(sort=False)

Output:

Less than 0      0
Less than 1%     1
Less than 5%     2
Less than 10%    1
More than 10%    0
Name: Category, dtype: int64
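
For anyone who wants to reproduce this end to end, here is a minimal sketch that rebuilds the sample rows from the question and applies the same idea. The DataFrame construction and the variable names `labels`, `counts` and `sizes` are assumptions made for illustration; the `groupby` line is an addition that shows the same zero-filling behaviour when grouping, provided `observed=False` is passed.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Name": ["Store A", "Store B", "Store C", "Store D"],
    "Count1": [10, 20, 12, 30],
    "Count2": [4, 26, 48, 180],
    "PercentDiff": [0.4, 1.3, 4.0, 6.0],
    "Category": ["Less than 1%", "Less than 5%", "Less than 5%", "Less than 10%"],
})

# Every label we want reported, including those with no rows.
labels = ["Less than 0", "Less than 1%", "Less than 5%",
          "Less than 10%", "More than 10%"]

# A categorical column remembers all of its categories, observed or not.
df["Category"] = pd.Categorical(df["Category"], categories=labels, ordered=True)

# value_counts reports unobserved categories with a count of 0.
counts = df["Category"].value_counts(sort=False)

# groupby does the same when observed=False is passed explicitly.
sizes = df.groupby("Category", observed=False).size()

print(counts)
print(sizes)
```

Passing `observed=False` explicitly avoids depending on the default, which differs between pandas versions.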

Answer 2

Check whether your dataframe is empty before you do your labeling.

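# Note: `return` is only valid inside a function, so this snippet
# is assumed to be the body of one.
if new['PercentDiff'].empty:
    return 0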
else:
    new.loc[new['PercentDiff'] < 0, 'Category'] = 'Less than 0%'
    new.loc[new['PercentDiff'] == 0, 'Category'] = 'Exact match'
    new.loc[new['PercentDiff'] < 0.01, 'Category'] = 'Less than 1%'
    new.loc[new['PercentDiff'] < 0.05, 'Category'] = 'Less than 5%'
    new.loc[new['PercentDiff'] < 0.1, 'Category'] = 'Less than 10%'
    new.loc[new['PercentDiff'] == 0, 'Category'] = 'Exact match'
    new.loc[new['PercentDiff'] > 0.1, 'Category'] = 'Greater than 10%'
    new['PercentDiff1'] = new['PercentDiff'].astype(int)

    Output1 = new.groupby(['Category']).agg(lambda x: x.mad())
    Output1 = Output1.replace(np.nan, '', regex=True)
    SumMail = pd.value_counts(Output1['Category'].values)
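
An emptiness check covers the case where there are no rows at all, but not the case where only some categories are missing. One possible alternative, sketched here rather than taken from either answer, is to bin `PercentDiff` with `pd.cut` using a fixed label list, which yields a categorical result and therefore zero counts for empty bins. The helper name `category_counts`, the constants `BINS` and `LABELS`, and the bin edges themselves are assumptions: the edges are read off the sample data (where 0.4 is labelled "Less than 1%"), the "Exact match" label from the question is left out, and a value of exactly 10 falls into "More than 10%".

```python
import numpy as np
import pandas as pd

# Labels from the question's list of desired categories.
LABELS = ["Less than 0", "Less than 1%", "Less than 5%",
          "Less than 10%", "More than 10%"]

# Assumed edges, in the same units as PercentDiff (0.4 means 0.4%).
# With right=False each bin is [low, high), so 0 <= x < 1 is "Less than 1%".
BINS = [-np.inf, 0, 1, 5, 10, np.inf]


def category_counts(percent_diff):
    """Count values per category, reporting 0 for empty categories."""
    binned = pd.cut(percent_diff, bins=BINS, labels=LABELS, right=False)
    return binned.value_counts(sort=False)


sample = pd.Series([0.4, 1.3, 4.0, 6.0], name="PercentDiff")
print(category_counts(sample))

# An empty input needs no special case: every category is simply 0.
empty = pd.Series([], dtype="float64", name="PercentDiff")
print(category_counts(empty))
```

Because each value lands in exactly one bin, this also sidesteps the ordering problem in a chain of `.loc` assignments, where a later, broader condition such as `< 0.1` overwrites labels set by earlier, narrower ones.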

Answer 1: 0, 2018-04-18 19:24:48

Answer 2: 0, 2018-04-18 19:29:49
