How to check whether a substring in a pandas DataFrame column exists in a substring of another column in the same DataFrame?

Question

I have a dataframe with columns like this:

  A                               B
0  - 5923FoxRd                    5923 Fox Rd
1 631 Newhaven Ave                Modesto
2 Saratoga Street, Suite 200      Saratoga Street, Suite 200

I want to create a list of the values from A that match values from B. The list should look like [- 5923FoxRd, Saratoga Street, Suite 200...]. What is the easiest way to do this?

Answer 1

A little normalization goes a long way here. Do the following:

Create a new series for each column and pass the regex pattern \W+ to str.replace() to strip non-word characters such as spaces and punctuation
Use str.lower() so the comparison ignores case
Create replace lists to normalize drive to dr, avenue to ave, etc.

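# Strip non-word characters (spaces, punctuation), then lowercase.
# regex=True is required in pandas 2.0+, where str.replace defaults to literal matching.
s1 = df['A'].str.replace(r'\W+', '', regex=True).str.lower()
s2 = df['B'].str.replace(r'\W+', '', regex=True).str.lower()
# Keep the values of A from rows where the normalized strings are equal
lst = [*df[s1==s2]['A']]
lst
Out[1]: ['- 5923FoxRd', 'Saratoga Street, Suite 200']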

This is what s1 and s2 look like:

print(s1,s2)

0                 5923foxrd
1            631newhavenave
2    saratogastreetsuite200
Name: A, dtype: object

0                 5923foxrd
1                   modesto
2    saratogastreetsuite200
Name: B, dtype: object

From there, you might want to create some replacement values to normalize your data even further, like:

to_replace = ['drive', 'avenue', 'street']
replaced = ['dr', 'ave', 'str']
# Series.replace with regex=True substitutes each pattern wherever it occurs inside a string
s1 = df['A'].str.replace(r'\W+', '', regex=True).str.lower().replace(to_replace, replaced, regex=True)
s2 = df['B'].str.replace(r'\W+', '', regex=True).str.lower().replace(to_replace, replaced, regex=True)
lst = [*df[s1==s2]['A']]
lst
print(s1,s2)
0              5923foxrd
1         631newhavenave
2    saratogastrsuite200
Name: A, dtype: object

0              5923foxrd
1                modesto
2    saratogastrsuite200
Name: B, dtype: object

Answer 1
1 accepted 2020-10-01 01:26:03
