
apply pandas qcut function to subgroups

Assume we create a DataFrame df with the code below. I have computed a per-bin frequency count from the 'value' column of df. How do I get the frequency count of only the label = 1 samples, using the bins created earlier? I should not call qcut on the label = 1 samples alone, because the resulting bin edges would differ from the original ones.

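import numpy as np
import pandas as pd

mu, sigma = 0, 0.1  # mean and standard deviation of 'value'
theta = 0.3         # probability that label == 1
s = np.random.normal(mu, sigma, 100)
group = np.random.binomial(1, theta, 100)
df = pd.DataFrame(np.vstack([s, group]).transpose())
df.columns = ['value', 'label']
# Split 'value' into 5 quantile-based bins
factor = pd.qcut(df['value'], 5)
# Number of samples in each bin (pd.value_counts is deprecated in recent pandas)
factor_bin_count = factor.value_counts()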

Update: I used Jeff's solution:

df.groupby(['label',factor]).value.count()

If I understand your question correctly, you want to take one grouping factor (e.g., the one you created with qcut to bin the continuous values) and another grouper (e.g., 'label'), then perform an operation on each group: count, in this case.

In [36]: df.groupby(['label',factor]).value.count()
Out[36]: 
label  value             
0      [-0.248, -0.0864]     14
       (-0.0864, -0.0227]    15
       (-0.0227, 0.0208]     15
       (0.0208, 0.0718]      17
       (0.0718, 0.24]        13
1      [-0.248, -0.0864]      6
       (-0.0864, -0.0227]     5
       (-0.0227, 0.0208]      5
       (0.0208, 0.0718]       3
       (0.0718, 0.24]         7
Name: value, dtype: int64
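As a complement to the groupby approach above, here is a minimal sketch of another way to reuse the original bins: ask qcut for its bin edges with retbins=True, then pass those edges to pd.cut for the label = 1 rows only. The fixed random seed, the dict-based DataFrame construction, and the names edges, subset_counts, and grouped are assumptions made for illustration; they are not part of the original question or answer.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed so the sketch is reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "value": rng.normal(0, 0.1, 100),
    "label": rng.binomial(1, 0.3, 100),
})

# Compute the quintile bins once on the full column and keep the edges.
factor, edges = pd.qcut(df["value"], 5, retbins=True)

# Reuse the same edges for the label == 1 rows only.
# include_lowest=True keeps the overall minimum inside the first bin.
subset = df.loc[df["label"] == 1, "value"]
subset_counts = pd.cut(subset, bins=edges, include_lowest=True).value_counts(sort=False)

# Cross-check against grouping by label and the original factor.
grouped = df.groupby(["label", factor.rename("bin")], observed=False).size()

print(subset_counts)
print(grouped.loc[1])
```

Both printouts should list the same five counts in bin order. The groupby form is usually more convenient when counts for every label are wanted at once; the retbins form may be handier when the same edges have to be applied later to a different DataFrame.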

