Apply a function to a column of a pandas dataframe

I have a dataframe with user comments on movies and would like to extract the cases where a user describes a movie as "movie1" meets "movie2".

User id     Old id_New id   Score   Comments
947952018   3101_771355141  3.0 If you want to see a comedy and have a stupid ...
805407067   11903_18330     5.0 Argento's fever dream masterpiece. Fairy tale ...
901306244   16077_771225176 4.5 Evil Dead II meets Brothers Grimm and Hawkeye ...
901306244   NaN_381422014   1.0 Biggest disappointment! There's a host of ...
15169683    NaN_22471       3.0 You know in the original story of Pinocchio he...

I've written a function that takes a comment, finds the word "meets", takes the n words immediately before and after it, and returns (hopefully) the essence of the titles of movie1 and movie2. I plan to fuzzy match these later against titles in another dataframe.

def parse_movie(comment, num_words):
    # Split the comment into (text before, 'meets', text after)
    words = comment.partition('meets')
    # rsplit splits from the right, so the last num_words items are single words
    words_before = words[0].rsplit(maxsplit=num_words)[-num_words:]
    words_after = words[2].split(maxsplit=num_words)[:num_words]
    movie1 = ' '.join(words_before)
    movie2 = ' '.join(words_after)
    return movie1, movie2

How can I apply this function to the Comments column of the original pandas dataframe and put the returned movie1 and movie2 titles in separate columns? I tried

df['Comments'].apply(parse_movie)

but then I cannot specify the num_words I'd like to use. Calling the function directly on the column doesn't work either, and I'm not sure how to put the extracted titles into new columns.

parse_movie(sample['Comments'], 4)
AttributeError: 'Series' object has no attribute 'partition'

Suggestions would be appreciated!

Based on the answer to "how to split column of tuples in pandas dataframe?", this can be done using a lambda function and apply(pd.Series). The lambda passes num_words to parse_movie for each comment, and apply(pd.Series) expands each returned tuple into two columns, which are saved as the dataframe columns 'movie1' and 'movie2'.

import pandas as pd

num_words = 4
df[['movie1','movie2']] = df['Comments'].apply(lambda comment: parse_movie(comment, num_words)).apply(pd.Series)
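
As a variation on the same idea, here is a minimal self-contained sketch, not the answerer's exact method. It relies on Series.apply forwarding extra keyword arguments to the function, so no lambda is needed, and it builds both columns in one step from the list of returned tuples instead of calling apply(pd.Series). The sample comments are made up for illustration, and parse_movie is redefined here (using rsplit for the words before "meets") only so that the block runs on its own.

```python
import pandas as pd


def parse_movie(comment, num_words):
    """Return up to num_words words on each side of the first 'meets'."""
    before, _, after = comment.partition('meets')
    # rsplit splits from the right, keeping the words closest to 'meets'
    movie1 = ' '.join(before.rsplit(maxsplit=num_words)[-num_words:])
    movie2 = ' '.join(after.split(maxsplit=num_words)[:num_words])
    return movie1, movie2


# Hypothetical sample data modelled on the question's Comments column.
df = pd.DataFrame({
    'Comments': [
        'Evil Dead II meets Brothers Grimm and Hawkeye',
        'A fun ride where Alien meets Home Alone',
    ]
})

# Extra keyword arguments given to Series.apply are passed on to parse_movie.
pairs = df['Comments'].apply(parse_movie, num_words=2)

# Expand the Series of tuples into two named columns, aligned on df's index.
df[['movie1', 'movie2']] = pd.DataFrame(
    pairs.tolist(), index=df.index, columns=['movie1', 'movie2']
)

print(df[['movie1', 'movie2']])
```

Two caveats apply to this sketch. A comment that does not contain "meets" still returns a pair: str.partition puts the whole comment in the first part, so movie1 holds its last words and movie2 is empty. A missing (NaN) comment would raise an error, so such rows may need to be filtered out or filled first.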
