Count words in a comma-separated column

Suppose my dataframe is

  Name  Value
0   K   apple,banana
1   Y   banana
2   B   orange,banana
3   Q   grape,apple
4   C   apple,grape

I want to count the words in the 'Value' column. I tried each of the following:

pd.Series(np.concatenate([x.split() for x in df.Value])).value_counts()

pd.Series(' '.join(df.Value).split()).value_counts()

Both give this output:

apple,banana : 1
banana : 1
orange,banana : 1
grape,apple : 1
apple,grape : 1

but the output I want is:

apple : 3
banana : 3
orange : 1
grape : 2 

How can I do this?

Thank you for reading.

Try this instead:

df['Value'].str.split(',', expand=True).stack().value_counts()

Output:

apple     3
banana    3
grape     2
orange    1
dtype: int64

This uses the pandas str accessor to split each value on ',', stacks the resulting columns into the row index, and then calls value_counts on the stacked Series.
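To make the chain above easier to follow, here is a minimal sketch that runs it one step at a time on the dataframe from the question. The variable names (wide, stacked, counts, counts_explode) are made up for illustration, and the explode variant at the end is an alternative I am adding, not part of the original answer; it assumes pandas 0.25 or later.

```python
import pandas as pd

# Rebuild the dataframe from the question
df = pd.DataFrame({
    "Name": ["K", "Y", "B", "Q", "C"],
    "Value": ["apple,banana", "banana", "orange,banana",
              "grape,apple", "apple,grape"],
})

# Step 1: split on ',' into separate columns; rows with fewer
# words get a missing value in the unused column
wide = df["Value"].str.split(",", expand=True)

# Step 2: move the columns into an inner index level, giving one
# long Series with one word per row
stacked = wide.stack()

# Step 3: count each distinct word (missing values are ignored
# by value_counts by default)
counts = stacked.value_counts()
print(counts)

# Alternative (assumption: pandas >= 0.25): split into lists and
# explode each list element into its own row
counts_explode = df["Value"].str.split(",").explode().value_counts()
```

Both variants should give the same counts; explode avoids building the intermediate wide frame.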

You could do this, assuming file contains your input:

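import pandas as pd

# Read the whitespace-separated file (raw string avoids an invalid-escape warning)
df = pd.read_csv('file', sep=r'\s+')
# Split each cell on commas, then flatten the list of lists into one list of words
itemslist = [i.split(',') for i in df['Value'].tolist()]
allitems = [item for sublist in itemslist for item in sublist]

# Keep each word once (at its last occurrence) and print its total count
for fruit in [ele for ind, ele in enumerate(allitems, 1) if ele not in allitems[ind:]]:
    print("{} {}".format(fruit, allitems.count(fruit)))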

You can approach this in three ways:

  • Isolate the column as a list with df['col'].tolist(), then split each item in the list. This gives you a list of lists, which you need to flatten before passing it to collections.Counter.
  • The pandas approach is to isolate the column and expand it, using something like this: https://cmdlinetips.com/2018/11/how-to-split-a-text-column-in-pandas/ . This can give you a sparse dataframe containing all the words. You can then iterate through the columns, call value_counts on each one, and merge the resulting counts. (Scott Boston's answer)
  • A third, more Pythonic way is to define a function that returns a Counter dict for each row and assign the result to a new column. Once you have a column containing all the per-row counts, use another function to merge those dictionaries and sum the counts.
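The first and third approaches can be sketched with collections.Counter roughly as follows. This is only an illustration: the list values stands in for df['Value'].tolist(), and the names flat_counts, row_counters and merged are hypothetical. In pandas, the per-row step would be something like df['Value'].apply(lambda v: Counter(v.split(','))).

```python
from collections import Counter
from itertools import chain

# Stand-in for df['Value'].tolist() from the question
values = ["apple,banana", "banana", "orange,banana",
          "grape,apple", "apple,grape"]

# Approach 1: split every item, flatten the list of lists,
# and count the flattened words in one pass
split_items = [v.split(",") for v in values]
flat_counts = Counter(chain.from_iterable(split_items))

# Approach 3: build one Counter per row, then merge them;
# adding Counters sums the counts key by key
row_counters = [Counter(v.split(",")) for v in values]
merged = sum(row_counters, Counter())

for word, n in flat_counts.most_common():
    print(word, n)
```

Both routes end with the same totals; the per-row version is mainly useful if you also want to keep the counts for each individual row.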
