Suppose I have the following dataframe:
import pandas as pd

d = {'col1':['a','b','c','a','a','b','c','c','c','c'],'col2':[0,1,1,0,1,1,1,1,0,1]}
df = pd.DataFrame(d)
For each distinct value in col1, I want to calculate the percentage and count of 0/1 values in col2, as well as the number of rows. To do this, I made a new dataframe for a single value and ran the operations on it:
df1 = df[df['col1'].isin(["c"])]
Find the percentage of 0/1 for c:
df1['col2'].value_counts(normalize=True)*100
Find the count of 0/1 for c:
df1['col2'].value_counts()
and the length for c:
len(df1)
How can I do this for all distinct values a, b and c at once, and collect the results in a single new dataframe, instead of creating a new dataframe each time as I did with df1? I know my current approach is not the best way to do this.
Do a groupby:
grouped = df.groupby(['col1'])['col2']
# percentage (normalize=True returns fractions, so scale by 100)
grouped.value_counts(normalize=True) * 100
# counts
grouped.value_counts()
# total count
grouped.size()
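Since the question asks for a single dataframe with all the results, the three groupby outputs above could be combined along these lines. This is only a sketch: the `count_`/`pct_` column prefixes and the `summary` and `length` names are my own choices, not anything pandas requires.

```python
import pandas as pd

d = {'col1': ['a', 'b', 'c', 'a', 'a', 'b', 'c', 'c', 'c', 'c'],
     'col2': [0, 1, 1, 0, 1, 1, 1, 1, 0, 1]}
df = pd.DataFrame(d)

grouped = df.groupby('col1')['col2']

# Counts of 0/1 per group, one column per col2 value.
# fill_value=0 covers combinations that never occur (e.g. b has no 0).
counts = grouped.value_counts().unstack(fill_value=0).add_prefix('count_')

# Same shape, but as percentages within each group.
percent = (grouped.value_counts(normalize=True)
                  .unstack(fill_value=0) * 100).add_prefix('pct_')

# Put everything side by side and add the number of rows per group.
summary = pd.concat([counts, percent], axis=1)
summary['length'] = grouped.size()

print(summary)
```

Each row of `summary` is one distinct value of col1, so nothing has to be filtered or looped over by hand.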
Try with crosstab
out = pd.crosstab(df['col1'], df['col2'], normalize='index')*100
Out[89]:
col2          0           1
col1
a     66.666667   33.333333
b      0.000000  100.000000
c     20.000000   80.000000
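The crosstab above only covers the percentages. For the counts and the group length, one option (a sketch, with `length` as an arbitrary label I picked for the margin) is to call crosstab without normalize and with margins=True, which appends row totals:

```python
import pandas as pd

d = {'col1': ['a', 'b', 'c', 'a', 'a', 'b', 'c', 'c', 'c', 'c'],
     'col2': [0, 1, 1, 0, 1, 1, 1, 1, 0, 1]}
df = pd.DataFrame(d)

# Raw counts of 0/1 per col1 value; margins=True adds a totals row and column.
counts = pd.crosstab(df['col1'], df['col2'],
                     margins=True, margins_name='length')

# Keep the per-row totals (the group lengths), drop the grand-total row.
counts = counts.drop(index='length')

print(counts)
```

The `length` column then holds the same numbers as len(df1) would give for each value of col1.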