I have the following dataframe:
import pandas as pd
data = {'String': ['foo bar hello world this day', 'foo bar', 'hello bar world'],
        'Value': [10, 2, 5]}
df = pd.DataFrame(data, columns=['String', 'Value'])
I want to find the unique words, how often each one occurs, and the sum of 'Value' over the rows where the word occurs in 'String'. So, the desired output is:
Unique word  Occurrence  Value sum
bar          3           17
world        2           15
foo          2           12
hello        2           15
day          1           10
this         1           10
I am able to get the unique words and their occurrence via:
pd.Series(' '.join(df.String).split()).value_counts()
How should I add the value sum?
My pandas version is 0.24.2.
Note: the accepted answer requires pandas 0.25.0 or later, because it relies on DataFrame.explode, which was added in that release.
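If upgrading is not an option, the same result can probably be reached without explode. The following is only a sketch, not part of the accepted answer: it builds one (word, value) pair per word occurrence in plain Python and then groups as usual. The names pairs and long_df are made up for illustration.

```python
import pandas as pd

data = {'String': ['foo bar hello world this day', 'foo bar', 'hello bar world'],
        'Value': [10, 2, 5]}
df = pd.DataFrame(data, columns=['String', 'Value'])

# One (word, value) pair for every word occurrence in every row.
pairs = [(word, value)
         for text, value in zip(df['String'], df['Value'])
         for word in text.split()]

# Long-format frame: one row per word occurrence (hypothetical helper name).
long_df = pd.DataFrame(pairs, columns=['Unique Word', 'Value'])

# Count the occurrences and sum the values per word.
res = (long_df.groupby('Unique Word')['Value']
              .agg(['count', 'sum'])
              .reset_index())
print(res)
```

This avoids DataFrame.explode entirely, so it should not depend on pandas 0.25.0.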
You could do:
df['Unique Word'] = df['String'].str.split()
res = df.drop(columns='String').explode('Unique Word').groupby('Unique Word')['Value'].agg(['count', 'sum']).reset_index()
print(res)
Output:
  Unique Word  count  sum
0         bar      3   17
1         day      1   10
2         foo      2   12
3       hello      2   15
4        this      1   10
5       world      2   15
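The result above is sorted alphabetically by word and uses the default column names count and sum. To get closer to the layout in the question, one option (a sketch, assuming pandas 0.25.0 or later; the sort order is a guess at what the question intends) is to rename the aggregated columns and sort by occurrence:

```python
import pandas as pd

data = {'String': ['foo bar hello world this day', 'foo bar', 'hello bar world'],
        'Value': [10, 2, 5]}
df = pd.DataFrame(data, columns=['String', 'Value'])

res = (df.assign(**{'Unique word': df['String'].str.split()})  # list of words per row
         .drop(columns='String')
         .explode('Unique word')                                # one row per word
         .groupby('Unique word')['Value']
         .agg(['count', 'sum'])
         .rename(columns={'count': 'Occurrence', 'sum': 'Value sum'})
         .sort_values(['Occurrence', 'Value sum'], ascending=False)
         .reset_index())
print(res)
```

Words with the same occurrence and sum (for example hello and world) may come out in either order, so the rows will not necessarily match the question's listing line for line.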