
Mapping/Zipping between two Pandas data frames with a partial string match

Asked 2019-04-01 21:20:50. Tags: python, string, dataframe, matching, fuzzy

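I have two data frames, each with roughly 1,000,000 rows. Both share a common "Address" column, which I use to join the data frames. Using this join, I want to move information (which I'll call "details") from dataframe1 to dataframe2.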

df2["details"] = df2["Address"].map(dict(zip(df1["Address"], df1["details"])))

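However, the address columns are not fully consistent between the two data frames. I have cleaned them as much as I can, but I can still only move about 40% of the data. Is there a way to modify my code above to allow partial matches? I'm completely stumped on this.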

The data is very simple, just as described: two small data frames. Made-up sample data below:

df1 
Address                                    Details
Apt 15 A, Long Street, Fake town, US       A   


df2
Address                                    Details
15A, Long Street, Fake town, U.S.
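
For reference, the exact-match behaviour described above can be reproduced on the sample rows. This is a minimal sketch, assuming the column names "Address" and "Details" from the sample tables; the `lookup` name is only illustrative. Because the two address strings differ, the mapped value comes back as NaN.

```python
import pandas as pd

# Sample rows copied from the tables above
df1 = pd.DataFrame({
    "Address": ["Apt 15 A, Long Street, Fake town, US"],
    "Details": ["A"],
})
df2 = pd.DataFrame({
    "Address": ["15A, Long Street, Fake town, U.S."],
})

# Exact-match lookup: Address -> Details, built from df1
lookup = dict(zip(df1["Address"], df1["Details"]))

# Addresses that are not an exact key in the lookup map to NaN
df2["Details"] = df2["Address"].map(lookup)

print(df2)
```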

1 Answer

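First, I would suggest performing the join and identifying the rows in each data frame that have no perfect match. Once you have identified those rows, set the already-matched rows aside and apply the suggestions below only to what remains: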
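
One way to split matched from unmatched rows is an outer merge with pandas' `indicator=True` option. This is only a sketch: the extra "1 Main Road" row is invented so that one address matches exactly, and the names `matched`, `only_df1` and `only_df2` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Address": ["1 Main Road, Fake town, US",            # invented row
                "Apt 15 A, Long Street, Fake town, US"],
    "Details": ["B", "A"],
})
df2 = pd.DataFrame({
    "Address": ["1 Main Road, Fake town, US",            # invented row
                "15A, Long Street, Fake town, U.S."],
})

# indicator=True adds a "_merge" column: "both", "left_only" or "right_only"
merged = df1.merge(df2, on="Address", how="outer", indicator=True)

matched = merged[merged["_merge"] == "both"]
only_df1 = merged.loc[merged["_merge"] == "left_only", ["Address", "Details"]]
only_df2 = merged.loc[merged["_merge"] == "right_only", ["Address"]]

print(only_df1)
print(only_df2)
```

Only `only_df1` and `only_df2` then need the more expensive partial matching. Note that duplicate addresses in either frame would multiply rows in the merge, so deduplicating first may be worthwhile.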

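One approach is to parse the addresses and try to standardize them. You could try the usaddress module for this.
You could also try the approach recommended in the answer to this question, although it may need some adjustment for your case. Without more examples of the partial string matches, it's hard to say.
Another approach is to use the Google Maps API (or Bing or MapQuest) for address standardization, but with over a million rows in each data frame you will far exceed the free daily API calls and will need to pay for the service.
A final suggestion is to use the fuzzywuzzy module for fuzzy (approximate) string matching.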
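
To make the standardize-then-fuzzy-match idea concrete, here is a rough sketch that uses only the Python standard library: a few regular expressions stand in for a real address parser such as usaddress, and `difflib.SequenceMatcher` stands in for fuzzywuzzy. The cleaning rules, the 0.85 cutoff, the second test address and the helper names `normalize` and `best_match` are all assumptions for illustration and would need tuning against the real data.

```python
import re
from difflib import SequenceMatcher


def normalize(address):
    """Apply a few hypothetical cleaning rules; adapt them to the real data."""
    text = address.lower()
    text = re.sub(r"\bapt\b", " ", text)              # drop the word "apt"
    text = re.sub(r"[^a-z0-9]+", " ", text)           # punctuation -> spaces
    text = re.sub(r"\bu s\b", "us", text)             # "u.s." is now "u s"
    text = re.sub(r"(\d+) ([a-z])\b", r"\1\2", text)  # "15 a" -> "15a"
    return " ".join(text.split())                     # collapse whitespace


def best_match(query, choices, cutoff=0.85):
    """Return (choice, score) for the most similar choice, or None below cutoff."""
    best, best_score = None, 0.0
    for choice in choices:
        score = SequenceMatcher(None, query, choice).ratio()
        if score > best_score:
            best, best_score = choice, score
    return (best, best_score) if best_score >= cutoff else None


# Unmatched rows only (see the first step of this answer)
df1_details = {"Apt 15 A, Long Street, Fake town, US": "A"}
unmatched = ["15A, Long Street, Fake town, U.S.",
             "99 Other Road, Elsewhere, US"]   # invented non-matching address

# Normalized df1 address -> details
normalized_lookup = {normalize(a): d for a, d in df1_details.items()}

results = {}
for address in unmatched:
    hit = best_match(normalize(address), list(normalized_lookup))
    results[address] = normalized_lookup[hit[0]] if hit else None

print(results)
```

Comparing every unmatched address against every candidate is quadratic, so at this scale it is probably worth grouping addresses first (for example by town or postcode, if available) and only comparing within each group. With fuzzywuzzy installed, `process.extractOne` plays roughly the same role as `best_match` here.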

Related questions

python merge two pandas data frames based on partial string match

Quickest way to find partial string match between two pandas dataframes

Python Pandas — mapping the values in two data frames

Pandas Subtracting between two Data Frames

Match partial strings across two data frames and merge

Find the string matching between two data frames

Partial word match between two columns of different pandas dataframes

How Do I match two Data Frames in Pandas with multiple matches?

Create two new pandas columns based on partial string match

How to use FuzzyWuzzy in Python to name match between two data frames?


