Looking for a string within a list of strings and creating a new column in pandas

I am new to Python and am trying to solve a performance issue. I have 2 data frames.

Data Frame 1:

col1        col2
holiday     party
party       party
bagel       snack
fruit       snack

Data Frame 2:

col1                            col2
bagel wednesday                 snack               
coffee for party                snack
holiday party                   party

Data Frame 1 has 2 columns. I need to look up each DataFrame1.col1 value in DataFrame2.col1 and fill a new column, DataFrame2.col2, with the corresponding DataFrame1.col2 value. Currently I am doing this with a loop, and it takes a very long time, so I am looking for an efficient way to do it. Also, if I get multiple matches, I should always go with the first match found in DataFrame1. For example, "coffee for party" has 2 matches from DF1, snack and party, in which case "snack" should be picked from DF1.col2.

Thanks, RL

I think you have to loop over the entries of df1, but not over all the rows of df2: df2.col1.str.contains does that inner loop for you in an optimized manner.

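# Walk df1 bottom-up so that earlier rows overwrite later ones (first match wins).
for keyword, label in zip(df1.col1[::-1], df1.col2[::-1]):
    # regex=False treats the keyword as a literal substring, not a pattern.
    df2.loc[df2.col1.str.contains(keyword, regex=False), 'col3'] = label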
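For reference, here is a self-contained sketch of the same idea that can be run as-is. It rebuilds the two sample frames from the question and goes through df1 top to bottom, only testing rows of df2 that have no label yet, so the first match in df1 order is kept. The helper name first_match_labels and the target column col3 are illustrative choices, not something from the question, and the sketch assumes df2 has a unique index. Note that with only the sample rows shown, "coffee for party" contains just the keyword "party", so it is labelled "party" here.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "col1": ["holiday", "party", "bagel", "fruit"],
    "col2": ["party", "party", "snack", "snack"],
})
df2 = pd.DataFrame({
    "col1": ["bagel wednesday", "coffee for party", "holiday party"],
})


def first_match_labels(texts, keywords, labels):
    """For each text, return the label of the first keyword it contains.

    Hypothetical helper: keywords are tried in the order given, and a row
    that already has a label is never tested again.
    """
    result = pd.Series(pd.NA, index=texts.index, dtype="object")
    for keyword, label in zip(keywords, labels):
        unmatched = result.isna()
        if not unmatched.any():
            break  # every row is labelled, nothing left to do
        # Literal substring test on the still-unlabelled rows only.
        hits = texts[unmatched].str.contains(keyword, regex=False, na=False)
        result.loc[hits[hits].index] = label
    return result


df2["col3"] = first_match_labels(df2["col1"], df1["col1"], df1["col2"])
print(df2)
```

This still loops once per df1 row, but each pass is a vectorized substring test over a shrinking set of df2 rows, which is usually much cheaper than iterating over df2 row by row.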
