简体   繁体   中英

how to calculate value counts when we have more than one value in a colum in pandas dataframe

df,

Name
Sri
Sri,Ram
Sri,Ram,kumar
Ram

I am trying to calculate the value counts for each value. I am not getting my output when using

 df["Name"].values_count()

my desired output is,

 Sri     3
 Ram     3
 Kumar   1

split the column, stack to long format, then count :

df.Name.str.split(',', expand=True).stack().value_counts()

#Sri      3
#Ram      3
#kumar    1
#dtype: int64

Or maybe:

df.Name.str.get_dummies(',').sum()

#Ram      3
#Sri      3
#kumar    1
#dtype: int64

Or concatenate before value_counts :

pd.value_counts(pd.np.concatenate(df.Name.str.split(',')))

#Sri      3
#Ram      3
#kumar    1
#dtype: int64

Timing :

%timeit df.Name.str.split(',', expand=True).stack().value_counts()
#1000 loops, best of 3: 1.02 ms per loop

%timeit df.Name.str.get_dummies(',').sum()
#1000 loops, best of 3: 1.18 ms per loop

%timeit pd.value_counts(pd.np.concatenate(df.Name.str.split(',')))
#1000 loops, best of 3: 573 µs per loop

# option from @Bharathshetty 
from collections import Counter
%timeit pd.Series(Counter((df['Name'].str.strip() + ',').sum().rstrip(',').split(',')))
# 1000 loops, best of 3: 498 µs per loop

# option inspired by @Bharathshetty 
%timeit pd.value_counts(df.Name.str.cat(sep=',').split(','))
# 1000 loops, best of 3: 483 µs per loop

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM