
Why am I not able to get a distribution through Pandas .count()?

I'm not sure why the simple .count() is not working as expected for me.

import pandas as pd
import random
cols = ['die_sum']
df = []

for a in range(100000):
    die_1 = random.randint(1, 6)
    die_2 = random.randint(1, 6)
    die_sum = die_1 + die_2
    df.append(die_sum)

df = pd.DataFrame(df, columns=cols)
df['die_sum'] = df.die_sum.astype(str)
df.groupby('die_sum').count()

output: (screenshot not preserved)

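After groupby('die_sum'), the die_sum column becomes the index of the grouped result, so there are no columns left for count() to count and you get an empty DataFrame (only the index). Select the column in the grouped DataFrame on which you want to apply count().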
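To see this directly, here is a minimal sketch on a tiny hand-made DataFrame (the five values are made up for illustration, not taken from the question's simulation). Grouping by the only column leaves zero columns, while selecting the column after grouping yields one count per group:

```python
import pandas as pd

# Hypothetical sample: one column of die sums stored as strings, as in the question
df = pd.DataFrame({"die_sum": ["7", "7", "2", "12", "7"]})

# The grouping key moves into the index, leaving no columns to count
empty = df.groupby("die_sum").count()
print(empty.shape)  # (3, 0)

# Selecting the column after grouping gives one count per group
fixed = df.groupby("die_sum")["die_sum"].count()
print(fixed.index.tolist())  # ['12', '2', '7']
print(fixed.tolist())  # [1, 1, 3]
```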

Changing the last line to the following should solve your issue:

>>> df.groupby('die_sum')['die_sum'].count()

die_sum
10     8552
11     5568
12     2780
2      2757
3      5612
4      8444
5     10982
6     13985
7     16570
8     13887
9     10863
Name: die_sum, dtype: int64
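The index above runs 10, 11, 12, 2, 3, ... because the question converts die_sum to strings with astype(str), and groupby sorts string keys lexicographically. A small sketch (again on made-up sample values) showing that keeping the column as integers gives numeric order:

```python
import pandas as pd

# Hypothetical sample of die sums kept as integers
sums = pd.DataFrame({"die_sum": [7, 2, 12, 7, 10]})

# As strings, keys sort character by character: '10' < '12' < '2' < '7'
as_str = sums["die_sum"].astype(str)
str_counts = as_str.groupby(as_str).count()
print(str_counts.index.tolist())  # ['10', '12', '2', '7']

# As integers, keys sort numerically
int_counts = sums.groupby("die_sum")["die_sum"].count()
print(int_counts.index.tolist())  # [2, 7, 10, 12]
print(int_counts.tolist())  # [1, 2, 1, 1]
```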

You can do:

result = df.groupby('die_sum')['die_sum'].count()

Or use .size():

df = pd.DataFrame(df, columns=cols)
result = df.groupby('die_sum').size()


die_sum
10     8413
11     5672
12     2830
2      2702
3      5640
4      8336
5     11133
6     13943
7     16684
8     13784
9     10863
dtype: int64
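size() and count() give the same numbers here only because die_sum has no missing values: size() counts every row in each group, while count() counts only the non-missing values of the selected column. A sketch with a hypothetical extra column, note, that contains one missing entry:

```python
import pandas as pd

# Hypothetical data: 'note' is an illustrative column with one missing value
df = pd.DataFrame({"die_sum": [7, 7, 2], "note": ["a", None, "b"]})

# size(): number of rows per group, missing values included
sizes = df.groupby("die_sum").size()
print(sizes.tolist())  # [1, 2]

# count(): number of non-missing 'note' values per group
note_counts = df.groupby("die_sum")["note"].count()
print(note_counts.tolist())  # [1, 1]
```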

value_counts() is also an option:

import random

import pandas as pd

cols = ['die_sum']
df = []
random.seed(3)  # Set seed for reproducibility
for a in range(100000):
    die_1 = random.randint(1, 6)
    die_2 = random.randint(1, 6)
    die_sum = die_1 + die_2
    df.append(die_sum)

df = pd.DataFrame(df, columns=cols)

counts = df['die_sum'].value_counts()

print(counts)

counts:

7     16503
6     13809
8     13733
5     11176
9     11118
4      8485
10     8351
11     5631
3      5584
12     2813
2      2797
Name: die_sum, dtype: int64
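Note that value_counts() orders its result by frequency, not by die sum. Since the goal is a distribution, here is a sketch that sorts by key with sort_index() and passes normalize=True to get relative frequencies, then compares them with the exact probabilities for two fair dice, (6 - |s - 7|) / 36. The seed and sample size are arbitrary choices, and the comparison table is an illustrative addition rather than part of the original answer:

```python
import random

import pandas as pd

random.seed(3)  # arbitrary seed for reproducibility
rolls = [random.randint(1, 6) + random.randint(1, 6) for _ in range(100_000)]
sums = pd.Series(rolls, name="die_sum")

# Relative frequencies, ordered 2..12 instead of most-frequent-first
observed = sums.value_counts(normalize=True).sort_index()

# Exact probability of each sum of two fair dice: (6 - |s - 7|) / 36
expected = pd.Series({s: (6 - abs(s - 7)) / 36 for s in range(2, 13)})

# Both Series share the integer index 2..12, so the columns line up
comparison = pd.DataFrame({"observed": observed, "expected": expected})
print(comparison.round(4))
```

With 100,000 rolls the observed column should sit close to the expected one; 7 is the most likely sum at 6/36.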

This content is licensed under CC BY-SA 4.0.