Add the string from one dataframe in a new column of a second dataframe while comparing values
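I want to check whether the values in a column of one DataFrame occur in a column of a second DataFrame. If a value does occur, it should be added to a new column in the same row of the second DataFrame. All values are strings. The two DataFrames differ in size, and the second one has about 700,000 records. These are the DataFrames I have: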
DF1
THINGS
book+pen
CAR
chair
laptop
DF2
Description
I want a new book.
I will pen down this things
A quick ride in my new car.
Cars are awesome.
My laptop's memory is bad.
Maybe try sitting on that CHAIR.
The output I want adds an "Updated" column:
Description                         Updated
I want a new book.                  book
I will pen down this things         pen
A quick ride in my new car.         car
Cars are awesome.                   car
My laptop's memory is bad.          laptop
Maybe try sitting on that CHAIR.    chair
Search for that book in my laptop.  book+laptop
I have already tried a brute-force approach, but it takes too long to process. Thanks in advance!
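Please try this. Use str.split and explode to first get a tidy list of strings to match from df1, then str.findall to retrieve the strings that match between the two DataFrames, and str.join to combine the matches for each row with '+'. Both sides are lowercased so that the comparison is case-insensitive. Code:
import re

# Split 'book+pen' into one keyword per row, lowercased for case-insensitive matching
df1 = df1.assign(THINGS=df1['THINGS'].str.split('+')).explode('THINGS')
df1['THINGS2'] = df1['THINGS'].str.lower()
# Alternation pattern of all keywords; re.escape guards against regex metacharacters
pattern = '|'.join(map(re.escape, df1['THINGS2']))
# Collect every keyword found in each lowercased description, then join the hits with '+'
df2['Updated'] = df2['Description'].str.lower().str.findall(pattern).str.join('+')
print(df2)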
Output:
   Description                       Updated
0  I want a new book.                book
1  I will pen down this things       pen
2  A quick ride in my new car.       car
3  Cars are awesome.                 car
4  My laptops hangs a lot.           laptop
5  Maybe try sitting on that CHAIR.  chair
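For anyone who wants to run this end to end, here is a minimal self-contained sketch built on the sample data from the question, including the row that should produce book+laptop. The names keywords, pattern and match_keywords are my own for illustration and are not part of pandas. It assumes that plain substring matching is what is wanted (so "Cars" counts as a hit for "car"), and it removes repeated hits within a row while keeping first-seen order.

```python
import re
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({"THINGS": ["book+pen", "CAR", "chair", "laptop"]})
df2 = pd.DataFrame({
    "Description": [
        "I want a new book.",
        "I will pen down this things",
        "A quick ride in my new car.",
        "Cars are awesome.",
        "My laptop's memory is bad.",
        "Maybe try sitting on that CHAIR.",
        "Search for that book in my laptop.",
    ]
})

# One lowercase keyword per entry: 'book+pen' -> 'book', 'pen'
keywords = df1["THINGS"].str.split("+").explode().str.lower().unique()

# One compiled alternation; longer keywords come first so that a longer
# keyword wins over a shorter keyword that is its prefix.
pattern = re.compile(
    "|".join(re.escape(k) for k in sorted(keywords, key=len, reverse=True)),
    flags=re.IGNORECASE,
)

def match_keywords(text: str) -> str:
    """Return the distinct keywords found in text, joined with '+'."""
    hits = (m.lower() for m in pattern.findall(text))
    return "+".join(dict.fromkeys(hits))  # dict.fromkeys keeps first-seen order

df2["Updated"] = df2["Description"].map(match_keywords)
print(df2)
```

Because this is substring matching, a keyword such as "pen" would also be found inside a word like "open". Wrapping the alternation in \b word boundaries would prevent that, but it would also stop "Cars" from matching "car", which the desired output in the question requires.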