
Create a new Dataframe that counts positive and negative tweets for each user

I have the following DataFrame:

[enter image description here]

It contains user IDs, tweets, locations, and the classification of each tweet as negative or positive.

I want to create a new DataFrame grouped by user ID, since each user has more than one tweet in the original DataFrame. The new DataFrame should contain the following columns:

  1. user_id
  2. count of negative tweets by that user_id
  3. count of positive tweets by that user_id
  4. location of the user

Required sample output:

user_id    positive_tweets    negative_tweets    Location
418        1                  0                  CA
521        1                  0                  CA
997        0                  1                  LA
1135       1                  0                  LA

This code was suggested by Mr. BlackFox for my previous question, which I didn't ask correctly:

df.groupby(['user_id','classification'])['user_id'].count()

However, it does not match the required output.

Thanks

I hope that's what you are looking for.

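import pandas as pd

# Count positive and negative tweets for each (user_id, Location) pair.
df.groupby(['user_id', 'Location']).apply(lambda x: pd.Series(dict(
    positive_tweets=(x.classification == 'positive').sum(),
    negative_tweets=(x.classification == 'negative').sum(),
))).reset_index()  # turn user_id and Location back into regular columns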
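
If you would rather avoid apply, the same counts can probably be obtained with size() and unstack(). This is only a minimal sketch: the sample rows below are made up for illustration, and it assumes the classification column holds exactly the strings 'positive' and 'negative' and that each user has a single Location.

```python
import pandas as pd

# Hypothetical sample data mirroring the columns described in the question.
df = pd.DataFrame({
    "user_id": [418, 521, 997, 1135, 418],
    "tweet": ["a", "b", "c", "d", "e"],
    "Location": ["CA", "CA", "LA", "LA", "CA"],
    "classification": ["positive", "positive", "negative", "positive", "negative"],
})

# Count rows per (user_id, Location, classification), then pivot the
# classification level into columns; missing combinations become 0.
counts = (
    df.groupby(["user_id", "Location", "classification"])
      .size()
      .unstack("classification", fill_value=0)
      .reindex(columns=["positive", "negative"], fill_value=0)
      .rename(columns={"positive": "positive_tweets", "negative": "negative_tweets"})
      .reset_index()
)
counts.columns.name = None  # drop the leftover "classification" axis label

# Reorder the columns to match the layout requested in the question.
result = counts[["user_id", "positive_tweets", "negative_tweets", "Location"]]
print(result)
```

The reindex step is there so that both count columns exist even when the data happens to contain only one of the two classes.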
