Complex fuzzy string matching in iOS

I'm writing an iOS application that pulls events from a public Google calendar, extracts the free-form "Location" field, and drops a pin on a map at the corresponding location. I want to make the app as flexible as possible using some kind of string search or fuzzy matching algorithm, but I'm not sure where to begin.

There are several things a calendar moderator may enter into the Location field:

  • A building name and room number (e.g. Foo Hall Room 123)
  • A building abbreviation and room number (e.g. FOO 123)
  • A shorthand room or location name (e.g. Foo)

Currently, I have a SQLite database composed of one table, with each row storing a latitude, longitude, full building name (Foo Hall), and standardized building abbreviation (FOO).
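For concreteness, here is a minimal sketch of such a table, written with Python's sqlite3 module for brevity rather than iOS code. The table name `buildings`, the column names, and the sample coordinates are assumptions made for illustration, not the actual schema.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE buildings (
    latitude     REAL NOT NULL,
    longitude    REAL NOT NULL,
    name         TEXT NOT NULL,   -- full building name, e.g. 'Foo Hall'
    abbreviation TEXT NOT NULL    -- standardized abbreviation, e.g. 'FOO'
);
"""

def make_db():
    """Create an in-memory database holding one made-up sample row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO buildings VALUES (?, ?, ?, ?)",
        (40.0, -75.0, "Foo Hall", "FOO"),  # placeholder coordinates
    )
    return conn

def lookup_by_abbreviation(conn, abbreviation):
    """Return (latitude, longitude) for an exact abbreviation match, or None."""
    return conn.execute(
        "SELECT latitude, longitude FROM buildings WHERE abbreviation = ?",
        (abbreviation.upper(),),
    ).fetchone()
```

An exact lookup like this only covers the case where the moderator types the standardized abbreviation by itself; the other input styles need something looser.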

I want to take the moderator's free-form string and obtain the correct coordinates from the database (if present).

I've tried using LIKE '%FOO%' and similar patterns, as well as Levenshtein distance, but I run into issues, for instance if the actual building name is "Example Foo and Bar Building" and the location entered by the moderator is "Example Bar Building".
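To make the failure concrete, the sketch below (in Python for brevity) applies both approaches to that pair of strings. The `contains` helper is an assumed stand-in for a case-insensitive LIKE '%...%' test, and "Example Baz Building" is a made-up neighbouring building used only for comparison.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]

def contains(haystack, needle):
    """Rough stand-in for SQL: haystack LIKE '%needle%' (case-insensitive)."""
    return needle.lower() in haystack.lower()

full_name = "Example Foo and Bar Building"
entered = "Example Bar Building"
decoy = "Example Baz Building"  # hypothetical other building

print(contains(full_name, entered))     # substring test on the real name
print(levenshtein(entered, full_name))  # edits needed to reach the real name
print(levenshtein(entered, decoy))      # edits needed to reach the decoy
```

The substring test fails because "Example Bar" never appears contiguously in the full name, and the correct building is eight edits away while the decoy is only one edit away, so a pure edit-distance ranking would pick the wrong row.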

The three options I've considered are:

  • Force the moderator to enter a standardized abbreviation or building name. This could be a tedious process for the calendar moderators, so I'm trying to avoid it if possible.

  • Do a crude substring search that checks whether the entered string is contained anywhere in the database string. This is what my university does on its website, but it obviously isn't very flexible.

  • Implement a more complex fuzzy string matching algorithm that provides maximum flexibility but will take an order of magnitude more time to implement. If the right one already exists, that would be the ideal solution!

Which of these options (if any) seems the best? Is there a better alternative that I haven't thought of? Is there a library that does what I need and I just haven't found it yet?

Thanks in advance for any help!

I'm not an iOS dev, so I can't be much help, but if you do have to implement your own solution, there are several versatile Python libraries that you could work off of, such as fuzzywuzzy. Good luck!
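As a starting point for a home-grown solution, here is a minimal token-overlap sketch using only the Python standard library. It is loosely similar in spirit to token-based scorers such as fuzzywuzzy's token_set_ratio, but it is not a reimplementation of that library. The stop-word list, the 0.6 threshold, the function names, and the "EFB" abbreviation are all assumptions to tune or replace.

```python
import re

# Assumed stop words; pure numbers (room numbers) are dropped separately.
STOP_WORDS = {"and", "the", "of", "room", "rm"}

def tokens(text):
    """Lowercase word tokens, minus stop words and purely numeric tokens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOP_WORDS and not w.isdigit()}

def score(entered, name, abbreviation):
    """Return (coverage, jaccard) of the entered tokens against one building.

    coverage: share of entered tokens found in the name or abbreviation.
    jaccard:  tie-breaker that favours buildings with fewer unmatched tokens.
    """
    query = tokens(entered)
    candidate = tokens(name) | tokens(abbreviation)
    if not query or not candidate:
        return (0.0, 0.0)
    shared = query & candidate
    return (len(shared) / len(query), len(shared) / len(query | candidate))

def best_match(entered, buildings, threshold=0.6):
    """Return the best (name, abbreviation) pair, or None if coverage is too low."""
    best = max(buildings, key=lambda b: score(entered, b[0], b[1]), default=None)
    if best is None or score(entered, best[0], best[1])[0] < threshold:
        return None
    return best

# Hypothetical rows; "EFB" is an invented abbreviation.
BUILDINGS = [
    ("Foo Hall", "FOO"),
    ("Example Foo and Bar Building", "EFB"),
]

print(best_match("Example Bar Building", BUILDINGS))
print(best_match("FOO 123", BUILDINGS))
```

Because it compares whole tokens, this handles dropped words and word order but not misspellings; combining it with a per-token edit distance would be one way to cover typos as well.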
