
Counting the corresponding values for each distinct element iteratively

Suppose I have the following dataframe:

import pandas as pd

d = {'col1':['a','b','c','a','a','b','c','c','c','c'],'col2':[0,1,1,0,1,1,1,1,0,1]}
df = pd.DataFrame(d)

For each distinct value in col1, I want to calculate the percentage and count of each col2 value, as well as the number of rows. To do this, I filter the rows into a new dataframe and run the operations on it:

df1 = df[df['col1'].isin(["c"])]

Find the percentage of 0/1 for c:

df1['col2'].value_counts(normalize=True)*100

Find the count of 0/1 for c:

df1['col2'].value_counts()

And the number of rows for c:

len(df1)

How can I do this for all distinct values a, b and c at once and collect the results in a new dataframe, instead of creating a filtered dataframe each time as I did with df1? I know my current approach is not the best way to do this.

Do a groupby:

grouped = df.groupby(['col1'])['col2']

# percentage (normalize=True returns fractions, so scale by 100)
grouped.value_counts(normalize=True) * 100

# counts
grouped.value_counts()

# total count
grouped.size()
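
Since the question asks for a single dataframe holding all the results, the three groupby outputs above could be combined roughly as follows. This is only a sketch: the names counts, percent, summary and group_size are made up for illustration and are not part of the original answer.

```python
import pandas as pd

d = {'col1': ['a', 'b', 'c', 'a', 'a', 'b', 'c', 'c', 'c', 'c'],
     'col2': [0, 1, 1, 0, 1, 1, 1, 1, 0, 1]}
df = pd.DataFrame(d)

grouped = df.groupby('col1')['col2']

# Count of each col2 value within each col1 group (indexed by col1, col2)
counts = grouped.value_counts().rename('count')

# Share of each col2 value within its group, scaled to a percentage
percent = (grouped.value_counts(normalize=True) * 100).rename('percent')

# Put both series side by side, then turn the index levels into columns
summary = pd.concat([counts, percent], axis=1).reset_index()

# Attach the total number of rows of each col1 group (hypothetical column name)
summary['group_size'] = summary['col1'].map(grouped.size())

print(summary)
```

Each row of summary then describes one (col1, col2) pair, so no per-value filtering such as df1 is needed.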

Try with crosstab:

out = pd.crosstab(df['col1'], df['col2'], normalize='index')*100
print(out)
col2          0           1
col1                       
a     66.666667   33.333333
b      0.000000  100.000000
c     20.000000   80.000000
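
The crosstab above only gives percentages. If the counts and the row totals are wanted in the same table, one possible extension is sketched below; the names counts, percent, wide and total, as well as the count_ and percent_ column prefixes, are assumptions chosen for illustration.

```python
import pandas as pd

d = {'col1': ['a', 'b', 'c', 'a', 'a', 'b', 'c', 'c', 'c', 'c'],
     'col2': [0, 1, 1, 0, 1, 1, 1, 1, 0, 1]}
df = pd.DataFrame(d)

# Raw counts of each col2 value per col1 value
counts = pd.crosstab(df['col1'], df['col2'])

# Row-wise percentages, as in the answer above
percent = pd.crosstab(df['col1'], df['col2'], normalize='index') * 100

# Prefix the column labels so both tables fit in one wide frame
wide = counts.add_prefix('count_').join(percent.add_prefix('percent_'))

# Number of rows per col1 value
wide['total'] = counts.sum(axis=1)

print(wide)
```

This yields one row per distinct col1 value, with the counts, percentages and group size as columns.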


 