
Pandas DataFrame: Find unique words in string column, count their occurrence and sum values in another column on condition

I have the following dataframe:

import pandas as pd

data = {'String': ['foo bar hello world this day', 'foo bar', 'hello bar world'],
        'Value' : [                            10,         2,                 5]}
df = pd.DataFrame(data, columns = ['String', 'Value'])

I want to find the unique words, how often each one occurs, and the sum of 'Value' over the rows whose 'String' contains that word. The desired output is:

Unique word    Occurrence    Value sum
        bar             3           17
      world             2           15
        foo             2           12
      hello             2           15
        day             1           10
       this             1           10

I am able to get the unique words and their occurrence via:

pd.Series(' '.join(df.String).split()).value_counts()

How should I add the value sum?

My pandas version is 0.24.2.

Note: the accepted answer relies on DataFrame.explode, which was added in pandas 0.25.0, so pandas must be upgraded to at least that version for it to work.
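If upgrading is not an option, the same totals can be accumulated in plain Python without explode. This is only a minimal sketch, not part of the accepted answer: the names totals and res_legacy are made up for illustration, and the output column names are copied from the desired output in the question.

```python
from collections import defaultdict

import pandas as pd

data = {'String': ['foo bar hello world this day', 'foo bar', 'hello bar world'],
        'Value': [10, 2, 5]}
df = pd.DataFrame(data, columns=['String', 'Value'])

# Hypothetical helper structure: word -> [occurrence, value sum]
totals = defaultdict(lambda: [0, 0])
for text, value in zip(df['String'], df['Value']):
    for word in text.split():
        totals[word][0] += 1      # one more occurrence of this word
        totals[word][1] += value  # add the row's Value to the word's sum

# Turn the accumulated totals into a DataFrame
res_legacy = pd.DataFrame(
    [(word, count, total) for word, (count, total) in totals.items()],
    columns=['Unique word', 'Occurrence', 'Value sum'],
)
print(res_legacy)
```

The loop visits every word once, so it should scale roughly with the total number of words; for large frames the explode-based approach is likely the more convenient choice.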

You could do:

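# Split each string into a list of words
df['Unique Word'] = df['String'].str.split()
# One row per word, then count the rows and sum 'Value' for each word
res = df.drop(columns='String').explode('Unique Word').groupby(['Unique Word'])['Value'].agg(['count', 'sum']).reset_index()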
print(res)

Output

  Unique Word  count  sum
0         bar      3   17
1         day      1   10
2         foo      2   12
3       hello      2   15
4        this      1   10
5       world      2   15
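The result above uses the column names count and sum and is ordered alphabetically. A possible way to get closer to the layout asked for in the question is sketched below, assuming pandas 0.25.0 or later. It uses assign so that the original df is left unmodified, and the rename and sort_values steps are additions for illustration, not part of the original answer.

```python
import pandas as pd

data = {'String': ['foo bar hello world this day', 'foo bar', 'hello bar world'],
        'Value': [10, 2, 5]}
df = pd.DataFrame(data, columns=['String', 'Value'])

res = (
    df.assign(**{'Unique word': df['String'].str.split()})  # list of words per row
      .drop(columns='String')
      .explode('Unique word')                                # one row per word
      .groupby('Unique word')['Value']
      .agg(['count', 'sum'])
      .rename(columns={'count': 'Occurrence', 'sum': 'Value sum'})
      .sort_values('Occurrence', ascending=False)            # most frequent first
      .reset_index()
)
print(res)
```

Note that explode produces one row per list element, so a word repeated within a single string is counted once per repetition and that row's Value is added each time. If each row should count at most once per word, the word lists would need to be deduplicated before exploding.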


 