
Python Pandas: checking whether values of one column appear in a column of another DataFrame

I have two DataFrames that look like the following:

df:

    Review Text                                         Noun                                                Thumbups  Rating
    I've been using this app for over a month. It ...   [app, month, job, track, ATV, replay, animatio...        2.0       4
    Would be nice to be able to import files from ...   [My, Tracks, app, phone, Google, Drive, import...        6.0       5
    When screen off it shows a straight line. Not ...   [screen, line, route]                                    1.0       3
    No Offline Maps! It used to have offline maps ...   [Offline, Maps, menu, option, video, exchange,...       20.0       1
    Great application. Designed with very well tho...   [application, application]                              20.0       5
    Great App. Nice and simple but accurate. Wish ...   [Great, App, Nice, Exported]                             0.0       5
    Does just what it says. Had a couple of questi...   [couple, service]                                        0.0       5
    Save For Offline - This does not work. The rou...   [Save, Offline, route, filesystem]                      12.0       1
    Since latest update app will not run. Subscrip...   [update, app, Subscription, March, application]          9.0       5
    Great app. Love it! And all the things it does...   [Great, app, Thank, work]                                1.0       5
    I have paid for subscription but keeps telling...   [subscription, trial, period]                            0.0       2
    Error: The route cannot be save for no locatio...   [Error, route, i, GPS]                                   0.0       2

df1:

Noun        Thumb_count
accuracy    1.0
almost      1.0
animation   2.0
antarctica  1.0
app         25.0
application 29.0
apps        1.0
atv         2.0
august      3.0
battery     1.0

I want to check whether each value in the 'Noun' column of df1 is present in the 'Noun' column of df. If it is, I want to create a new column named 'average' in df1 that holds the average of df's 'Rating' column over the rows in which that noun appears.
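A minimal sketch of that goal, assuming `Noun` holds real Python lists (the `TypeError` below suggests it does) and using only the rows of df whose `Noun` list is shown in full above. The names `exploded` and `means` are illustrative. Two further assumptions about what "average" should mean: a noun repeated within one review counts once for that review, and matching is exact and case-sensitive.

```python
import pandas as pd

# Rows of df whose Noun list is not truncated in the question.
df = pd.DataFrame({
    "Noun": [
        ["screen", "line", "route"],
        ["application", "application"],
        ["Great", "App", "Nice", "Exported"],
        ["couple", "service"],
        ["Save", "Offline", "route", "filesystem"],
        ["update", "app", "Subscription", "March", "application"],
        ["Great", "app", "Thank", "work"],
        ["subscription", "trial", "period"],
        ["Error", "route", "i", "GPS"],
    ],
    "Rating": [3, 5, 5, 5, 1, 5, 5, 2, 2],
})
df1 = pd.DataFrame({
    "Noun": ["accuracy", "almost", "animation", "antarctica", "app",
             "application", "apps", "atv", "august", "battery"],
    "Thumb_count": [1.0, 1.0, 2.0, 1.0, 25.0, 29.0, 1.0, 2.0, 3.0, 1.0],
})

# One row per (review, noun); reset_index keeps the original row label in "index".
exploded = df[["Noun", "Rating"]].explode("Noun").reset_index()
# Assumption: count a noun once per review even if the review repeats it.
exploded = exploded.drop_duplicates(subset=["index", "Noun"])
# Keep only nouns listed in df1 (exact, case-sensitive match).
exploded = exploded[exploded["Noun"].isin(df1["Noun"])]

# Mean Rating per noun, looked up for every row of df1.
means = exploded.groupby("Noun")["Rating"].mean()
df1["average"] = df1["Noun"].map(means)
print(df1)
```

Nouns from df1 that never occur in df end up as `NaN`, because `Series.map` finds no matching key for them.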

I started by comparing the two columns using the following code:

df['Noun'].isin(set(df1['Noun']))

However, I got a TypeError and a SystemError. These are the errors:

TypeError: unhashable type: 'list'
SystemError: <built-in method view of numpy.ndarray object at 0x7ff6313e3df0> returned a result with an error set

Could anyone help me see where I am making a mistake?
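The `TypeError` arises because `isin` looks each cell up in a hash-based structure, and a Python list is not hashable. A sketch of a row-level check that never hashes the lists, assuming the aim of the original `isin` call was one boolean per review; the column name `has_df1_noun` is hypothetical, and df1 is cut down to a few of its nouns:

```python
import pandas as pd

# Two complete rows from the question's df; Noun holds real Python lists.
df = pd.DataFrame({
    "Noun": [["screen", "line", "route"],
             ["update", "app", "Subscription", "March", "application"]],
    "Rating": [3, 5],
})
df1 = pd.DataFrame({"Noun": ["accuracy", "app", "application", "battery"]})

# The set holds hashable strings only; the lists are never hashed.
wanted = set(df1["Noun"])

# True if the review's noun list shares at least one noun with df1.
df["has_df1_noun"] = df["Noun"].apply(lambda nouns: not wanted.isdisjoint(nouns))
print(df)
```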

A sample output would have been very useful. In its absence, here is my attempt:

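# Only needed if Noun holds strings such as "[app, month]" (e.g. after reading a CSV);
# if it already holds lists, as the TypeError suggests, skip these two lines.
df.Noun = df.Noun.str.strip('[]')      # strip the square brackets
df.Noun = df.Noun.str.split(',')       # turn the string back into a list
df = df.explode('Noun')                # one row per noun
df.Noun = df.Noun.str.strip()          # drop spaces left over from the split
df = df[df.Noun.isin(df1.Noun)]        # keep only nouns listed in df1 (exact match)
df.groupby('Noun')['Rating'].mean()    # mean Rating per noun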
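The code above stops at the per-noun means, while the question also asks for them in df1 as 'average'. Note too that df1 stores lowercase nouns (`app`, `atv`) whereas df contains `App` and `ATV`, so an exact match skips those. A sketch that lowercases before matching and writes the result back with `Series.map`, assuming `Noun` holds lists and that case should be ignored; the frames are cut down from the question's data:

```python
import pandas as pd

# Complete rows from the question's df that mention "App"/"app".
df = pd.DataFrame({
    "Noun": [["Great", "App", "Nice", "Exported"],
             ["update", "app", "Subscription", "March", "application"],
             ["subscription", "trial", "period"]],
    "Rating": [5, 5, 2],
})
df1 = pd.DataFrame({"Noun": ["app", "application", "battery"]})

# One row per noun, with a fresh 0..n-1 index.
exploded = df.explode("Noun", ignore_index=True)
# Assumption: matching should ignore case, since df1 stores lowercase nouns.
exploded["Noun"] = exploded["Noun"].str.lower()
exploded = exploded[exploded["Noun"].isin(df1["Noun"])]

# Per-noun mean Rating written back to df1 (NaN for nouns not found in df).
df1["average"] = df1["Noun"].map(exploded.groupby("Noun")["Rating"].mean())
print(df1)
```

Without the lowercasing step, `App` in the first review would not count towards `app`.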

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any question please contact: yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM