Create a solution to automatically split addresses into their separate components using Python

I am trying to find a solution that automatically splits addresses into their separate components using Python. Below is some sample data:

| Full Address | Street Number | Street | City | State | Zip Code |
| --- | --- | --- | --- | --- | --- |
| 661 Camel Back Road Tulsa Oklahoma 74120 | 661 | Camel Back Road | Tulsa | Oklahoma | 74120 |
| 68 Gnatty Creek Road Roslyn New York 11576 | 68 | Gnatty Creek Road | Roslyn | New York | 11576 |
| 1 Raccoon Run Seattle Washington 98119 | 1 | Raccoon Run | Seattle | Washington | 98119 |
| 616 Friendship Lane Santa Clara California 95054 | 616 | Friendship Lane | Santa Clara | California | 95054 |
| 3878 Grand Avenue Maitland Florida 32751 | 3878 | Grand Avenue | Maitland | Florida | 32751 |

The data above represents what I am trying to achieve: the first column is my input address, and the remaining columns are the result after it has been split automatically. The problem, which this oversimplified example does not show, is that the input addresses don't always come in the same order and will include extra components such as building names.

My options so far are the following:

  1. REGEX
  2. MACHINE LEARNING MODEL

The REGEX option is familiar, but it will still be largely inaccurate. I need this solution to be as accurate as possible.
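To make the accuracy concern concrete, here is a minimal regex sketch that handles only the fixed "number street city state zip" order from the sample table. It assumes you can enumerate street suffixes and state names; `STREET_SUFFIXES`, `STATES` and `split_address` are hypothetical names, and both lists are deliberately tiny stand-ins for the full ones a real version would need.

```python
import re
from typing import Dict, Optional

# Hypothetical, deliberately incomplete lists (illustration only).
STREET_SUFFIXES = ["Road", "Lane", "Avenue", "Run", "Street", "Drive"]
STATES = ["New York", "Oklahoma", "Washington", "California", "Florida"]

ADDRESS_RE = re.compile(
    r"^(?P<number>\d+)\s+"
    # Lazy match up to the first known street suffix.
    r"(?P<street>.+?\b(?:" + "|".join(map(re.escape, STREET_SUFFIXES)) + r"))\s+"
    # Lazy city: the shortest run of text that is followed by a known state.
    r"(?P<city>.+?)\s+"
    r"(?P<state>" + "|".join(map(re.escape, STATES)) + r")\s+"
    r"(?P<zip>\d{5})$"
)


def split_address(address: str) -> Optional[Dict[str, str]]:
    """Return the named groups, or None if the address is not in the expected order."""
    match = ADDRESS_RE.match(address.strip())
    return match.groupdict() if match else None


print(split_address("616 Friendship Lane Santa Clara California 95054"))
print(split_address("Roslyn New York 11576 68 Gnatty Creek Road"))
```

The second call returns `None`: as soon as the components are reordered, or a building name or unlisted suffix appears, the pattern stops matching, which is the gap in the capturing groups described here.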

The MACHINE LEARNING MODEL option is more difficult, because I am not aware of any model or framework capable of classifying multiple categories at once. Can anyone help?
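One way to read "multiple categories at once" is as sequence labeling (token classification): a single model assigns one label per token, so every component is predicted in one pass. For example, the open-source `usaddress` Python library uses a conditional random field for US addresses in this way. Training such a model needs per-token labels, and the sketch below shows one way to derive them from rows like the sample table. `label_tokens` and the label names are hypothetical, and it assumes each component appears verbatim (same spelling, whitespace-separated) in the full address.

```python
from typing import Dict, List, Tuple


def label_tokens(full_address: str, components: Dict[str, str]) -> List[Tuple[str, str]]:
    """Turn one table row into (token, label) pairs for training a sequence labeler.

    Tokens not covered by any component (e.g. building names) get the label "Other".
    """
    tokens = full_address.split()
    labels = ["Other"] * len(tokens)
    for label, value in components.items():
        target = value.split()
        n = len(target)
        # Search anywhere in the address, so component order does not matter.
        for start in range(len(tokens) - n + 1):
            free = all(l == "Other" for l in labels[start:start + n])
            if free and tokens[start:start + n] == target:
                labels[start:start + n] = [label] * n
                break  # label only the first unclaimed occurrence
    return list(zip(tokens, labels))


row = {
    "StreetNumber": "68",
    "Street": "Gnatty Creek Road",
    "City": "Roslyn",
    "State": "New York",
    "ZipCode": "11576",
}
for token, label in label_tokens("68 Gnatty Creek Road Roslyn New York 11576", row):
    print(token, label)
```

Because the search does not assume a fixed order and leftover tokens become `Other`, the same conversion works for rows where the components are shuffled or extra text is present, which is the kind of variation a trained model would need to see.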

So far I haven't really started on the REGEX, because I anticipate major gaps in the capturing groups.

I think the only way to do this and get a fairly accurate result is to get a list of ZIP codes, for instance from here: https://www.zipcode.com.ng/2022/06/list-of-5-digit-zip-codes-united-states.html?m=1, and a list of US cities.

Then you can match the ZIP code, state and city against those lists.
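A minimal sketch of that lookup idea, assuming the lists have been loaded into Python sets. `ZIP_CODES`, `STATES`, `CITIES` and `split_with_lists` are hypothetical names, and the tiny sets below merely stand in for the full downloaded lists. It searches for the longest matching run of words anywhere in the address, so it does not depend on component order. States are claimed before cities because some names (for example "New York" or "Washington") are both.

```python
from typing import Dict, List, Optional, Set

# Hypothetical stand-ins for the full ZIP-code, state and city lists.
ZIP_CODES: Set[str] = {"74120", "11576", "98119", "95054", "32751"}
STATES: Set[str] = {"Oklahoma", "New York", "Washington", "California", "Florida"}
CITIES: Set[str] = {"Tulsa", "Roslyn", "Seattle", "Santa Clara", "Maitland"}


def _claim_longest(tokens: List[str], used: List[bool], names: Set[str],
                   max_words: int = 3) -> Optional[str]:
    """Find the longest run of unused tokens (up to max_words) that is in names and mark it used."""
    for n in range(max_words, 0, -1):
        for start in range(len(tokens) - n + 1):
            if any(used[start:start + n]):
                continue
            candidate = " ".join(tokens[start:start + n])
            if candidate in names:
                used[start:start + n] = [True] * n
                return candidate
    return None


def split_with_lists(address: str) -> Dict[str, Optional[str]]:
    tokens = address.replace(",", " ").split()  # treat commas as separators
    used = [False] * len(tokens)
    result: Dict[str, Optional[str]] = {
        "zip": _claim_longest(tokens, used, ZIP_CODES, max_words=1),
        "state": _claim_longest(tokens, used, STATES),
        "city": _claim_longest(tokens, used, CITIES),
    }
    # Whatever is left is assumed to be "number + street"; this assumption
    # breaks down when building names or other extra text are present.
    leftover = [t for t, u in zip(tokens, used) if not u]
    result["number"] = leftover.pop(0) if leftover and leftover[0].isdigit() else None
    result["street"] = " ".join(leftover) or None
    return result


print(split_with_lists("Santa Clara California 95054 616 Friendship Lane"))
```

With real lists, a city name can also occur inside a street name, so matching the ZIP code first and then only accepting cities that belong to that ZIP code would be a sensible refinement.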

