
How do I merge two datasets on BusinessID and get the final dataset?

I have two datasets: a business file and a review file. How do I group the multiple reviews by business_id so that all the reviews users gave each business end up in one text?

How do I merge the datasets on BusinessID to get the final dataset shown in the picture below?

How can I do this with the Pandas library?

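[Image not available: the business data (top left), the review data (top right), and the desired final dataset]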

You can merge df1 (the top-left table) with a grouped version of df2 (the top-right table), in which all review texts for each Business_id are collected into a list:

# Collect each business's review texts into a list, then left-join onto the business table
df3 = df1.merge(df2.groupby('Business_id')['Review_text'].apply(list).reset_index(),
               how='left', on='Business_id').rename({'Review_text': 'All_reviews'}, axis=1)
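To see what each step produces, here is a self-contained sketch of the same merge. The two input frames are hypothetical: they are reconstructed from the output table below (Text_1 to Text_5 assigned to the businesses they appear under), and the real business and review files will likely have more columns.

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the output table;
# the real files may contain additional columns.
df1 = pd.DataFrame({
    'Business_id': [1, 2, 3],
    'category': ['shopping', 'restaurant', 'Home services'],
    'star': [3.5, 5.0, 4.0],
    'Review_count': [3, 1, 6],
})
df2 = pd.DataFrame({
    'Business_id': [1, 1, 2, 1, 2],
    'Review_text': ['Text_1', 'Text_2', 'Text_3', 'Text_4', 'Text_5'],
})

# Step 1: one row per business, with all of its review texts in a list
# (groupby keeps the original row order within each group)
reviews = df2.groupby('Business_id')['Review_text'].apply(list).reset_index()

# Step 2: a left join keeps every business; ones without reviews get NaN
df3 = df1.merge(reviews, how='left', on='Business_id').rename(
    {'Review_text': 'All_reviews'}, axis=1)
print(df3)
```

The left join is what keeps business 3 in the result even though it has no rows in df2.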

Out[1]: 
   Business_id       category  star  Review_count               All_reviews
0            1       shopping   3.5             3  [Text_1, Text_2, Text_4]
1            2     restaurant   5.0             1          [Text_3, Text_5]
2            3  Home services   4.0             6                       NaN
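The question asks for all reviews combined into one text, while the answer collects them into a list. If a single string per business is preferred, one variant (not part of the original answer) is to aggregate with ' '.join and replace the NaN for businesses without reviews by an empty string. The sample frames are a hypothetical reconstruction from the output table above, and the space separator is an arbitrary choice.

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the output table
df1 = pd.DataFrame({
    'Business_id': [1, 2, 3],
    'category': ['shopping', 'restaurant', 'Home services'],
    'star': [3.5, 5.0, 4.0],
    'Review_count': [3, 1, 6],
})
df2 = pd.DataFrame({
    'Business_id': [1, 1, 2, 1, 2],
    'Review_text': ['Text_1', 'Text_2', 'Text_3', 'Text_4', 'Text_5'],
})

# Join each business's reviews into one string instead of a list.
# The ' ' separator is arbitrary; '\n' or ' | ' work the same way.
reviews = (df2.groupby('Business_id')['Review_text']
              .agg(' '.join)
              .rename('All_reviews')
              .reset_index())

df3 = df1.merge(reviews, how='left', on='Business_id')
# Businesses with no reviews get NaN from the left join; use '' instead
df3['All_reviews'] = df3['All_reviews'].fillna('')
print(df3)
```

With this sample data, business 1 gets "Text_1 Text_2 Text_4", business 2 gets "Text_3 Text_5", and business 3 gets an empty string.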

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please include the site URL or the original address. For questions, contact yoyou2525@163.com.
