
pandas value_counts (show values and ratio)

I'm new to pandas and want to get the count of each value in a specific column, plus that count as a percentage, in a single frame. I can get one or the other, but can't figure out how to add or merge them into a single frame. Thoughts?

The frame/table should be like this:

some_value, count, count(as %)

Here is what I have...

import numpy as np
import pandas as pd 

np.random.seed(1)
values = np.random.randint(30, 35, 20)

df1 = pd.DataFrame(values, columns=['some_value'])
df1.sort_values(by=['some_value'], inplace = True)
df2 = df1.value_counts()
df3 = df1.value_counts(normalize=True)

print(df2)
print("------")
print(df3) 

Just use

pd.DataFrame({"count":df2,"%":df3*100})

to put the series into one df.

Output:

            count     %
some_value             
34              7  35.0
32              4  20.0
33              3  15.0
31              3  15.0
30              3  15.0
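
To get exactly the three columns asked for in the question (some_value, count, count(as %)), the index can be turned back into a regular column. The following is a minimal sketch, assuming the counts are taken from the column as a Series rather than from the whole frame; the names s and summary are only illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
values = np.random.randint(30, 35, 20)

# Work on the column as a Series (assumption: one column of interest)
s = pd.Series(values, name="some_value")

counts = s.value_counts()                        # absolute counts
percent = s.value_counts(normalize=True) * 100   # fractions scaled to percent

# Both Series share the same index, so they align row by row
summary = pd.DataFrame({"count": counts, "count(as %)": percent})

# Name the index and move it into an ordinary column
summary = summary.rename_axis("some_value").reset_index()
print(summary)
```

The rename_axis call is there because the name of the index returned by value_counts differs between pandas versions; setting it explicitly keeps the column label predictable.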

I guess calling value_counts once and then normalizing the result with a lambda function could be more efficient, but you can get the result you are seeking by doing:

df1_counts = df1.value_counts().to_frame(name="count").merge(
    df1.value_counts(normalize=True).to_frame(name="count(as %)"),
    left_index=True,
    right_index=True,
)

Resulting in (note that count(as %) holds fractions here; multiply by 100 to get percentages):

| some_value | count | count(as %) |
|------------|-------|-------------|
| 34         | 7     | 0.35        |
| 32         | 4     | 0.20        |
| 33         | 3     | 0.15        |
| 31         | 3     | 0.15        |
| 30         | 3     | 0.15        |

Best !
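
The "count once, then normalize" idea mentioned at the start of this answer could look roughly like the sketch below. It uses a plain division instead of a lambda, and the small hand-written Series is only illustrative data, not the data from the question.

```python
import pandas as pd

# Illustrative data (assumption): 34 appears 3 times, 32 twice, 30 and 31 once
s = pd.Series([34, 34, 32, 30, 34, 32, 31], name="some_value")

# Count each value a single time
counts = s.value_counts()

# Start the result frame from the counts
summary = counts.to_frame(name="count")

# Derive the percentage from the counts instead of calling value_counts again
summary["count(as %)"] = counts / counts.sum() * 100
print(summary)
```

This avoids the second value_counts call and the merge, since the new column is aligned on the same index as the counts.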

Compute, rename, and join. Let's try:

df1.some_value.value_counts().to_frame('count').join(df1.some_value.value_counts(normalize=True).to_frame('%'))



    count     %
34      7  0.35
32      4  0.20
33      3  0.15
31      3  0.15
30      3  0.15

Try this using partial from functools with pd.Series.agg, passing a list of functions:

from functools import partial
vc_norm = partial(pd.Series.value_counts, normalize=True)
df1['some_value'].agg([pd.Series.value_counts, vc_norm])

Output:

    value_counts  value_counts
34             7          0.35
32             4          0.20
31             3          0.15
30             3          0.15
33             3          0.15

Or you can use a lambda function like this:

df1['some_value'].agg([pd.Series.value_counts, lambda x: x.value_counts(normalize=True)])
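
In the agg output shown above, both columns carry the same label, value_counts, which makes them awkward to select afterwards. One way to get distinct labels is sketched below; note that it swaps agg for pd.concat with the keys argument, so it is a different technique from the one in this answer, and the sample data and the labels count and fraction are only illustrative.

```python
import pandas as pd

# Illustrative data (assumption), not the data from the question
s = pd.Series([34, 34, 32, 30, 34, 32, 31], name="some_value")

# Place the two Series side by side; keys supplies the column labels
result = pd.concat(
    [s.value_counts(), s.value_counts(normalize=True)],
    axis=1,
    keys=["count", "fraction"],
)
print(result)
```

With distinct labels, a single column can be picked out with result["count"] or result["fraction"].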
