
Python - match a word in a string with a list of strings

I'm new to Python and I'd like to know how to do string comparison.

Suppose I have a list of strings containing state names, for example:

states = ["New York", "California", "Nebraska", "Idaho"]

I also have another string containing an address, such as:

postal_addr = "1234 1st E St San Jose California 95112"

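How can I parse this address string and find the item that matches one in the states list? In the example above, "California" would be a match. Once matched, how can I extract "California" and store it as a separate string?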

>>> states = ["New York", "California", "Nebraska", "Idaho"]
>>> postal_addr = "1234 1st E St San Jose California 95112"
>>> first_match = next(state for state in states if state in postal_addr)
>>> first_match
'California'
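One caveat with the `next()` call above: if no state occurs in the address, the bare generator raises `StopIteration`. A minimal sketch that passes `None` as the default instead (the helper name `first_state` is only for illustration):

```python
states = ["New York", "California", "Nebraska", "Idaho"]

def first_state(addr, states):
    # next() raises StopIteration on an exhausted generator unless a default
    # is supplied, so return None for addresses that contain no state name.
    return next((state for state in states if state in addr), None)

print(first_state("1234 1st E St San Jose California 95112", states))  # California
print(first_state("10 Downing St London", states))  # None
```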

However, if you need to match only at word boundaries, you are better off using regular expressions.
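A minimal sketch of that word-boundary approach, assuming the same `states` list; `state_re` and `find_state` are illustrative names, not part of any library. The names go through `re.escape` so they are matched literally, and `\b` on each side rejects hits inside longer words:

```python
import re

states = ["New York", "California", "Nebraska", "Idaho"]

# One alternation of the literal state names, anchored on word boundaries.
state_re = re.compile(r"\b(?:" + "|".join(map(re.escape, states)) + r")\b")

def find_state(addr):
    # search() returns the leftmost whole-word state, or None if there is none.
    m = state_re.search(addr)
    return m.group(0) if m else None

print(find_state("1234 1st E St San Jose California 95112"))  # California
print(find_state("12 Idahoba Rd"))  # None
```

Because the engine scans the address from left to right, `search` returns the state that occurs earliest in the address.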

I would do:

matches = [s for s in states if s in postal_addr]

Then, if you want to extract the string from the postal address:

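import re
if matches:
    # re.escape makes the state name match literally, even if it contains regex metacharacters
    extracted = re.findall(re.escape(matches[0]), postal_addr)[0]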

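Edit: ... but this doesn't work for city/state combinations in which the city name contains a different state, e.g. if postal_addr = '1 Arrowhead Dr, Kansas City, Missouri 64129' and states = ["New York", "California", "Nebraska", "Idaho", "Missouri", "Kansas"]. In that case, pick the match that occurs last in the address: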

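import re
if matches:
    # Pair each matched state with the offset of its first occurrence in the address
    extracted = [(re.search(re.escape(m), postal_addr).start(), m) for m in matches]
    # Keep the state that starts last, since the state normally follows the city
    extracted = max(extracted)[1]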
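The same "last occurrence wins" idea can be written as one helper that also respects word boundaries. This is a sketch under the assumption, as in the edit above, that the state name comes after the city in the address; `last_state` is a hypothetical name:

```python
import re

states = ["New York", "California", "Nebraska", "Idaho", "Missouri", "Kansas"]

def last_state(addr, states):
    # Collect every whole-word occurrence of every state with its start offset.
    hits = [
        (m.start(), state)
        for state in states
        for m in re.finditer(r"\b" + re.escape(state) + r"\b", addr)
    ]
    # Assumption: in "..., City, State ZIP" the state appears last,
    # so keep the occurrence with the largest start offset.
    return max(hits)[1] if hits else None

print(last_state("1 Arrowhead Dr, Kansas City, Missouri 64129", states))  # Missouri
```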

states = ["New York", "California", "Nebraska", "Idaho"]
postal_addr = "1234 1st E St San Jose California 95112"

result = None
for state in states:
    if state in postal_addr:
        result = state

print(result)

Unfortunately, this will also match words that merely contain a state name, such as "Idahoba".
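A quick check of that pitfall, plus one way to keep the loop shape while avoiding it: swap the `in` test for a per-state `re.search` with `\b` anchors. The `break` is an addition not present in the loop above; it stops at the first state in list order that matches. `loop_match` is an illustrative name:

```python
import re

states = ["New York", "California", "Nebraska", "Idaho"]

def loop_match(addr):
    result = None
    for state in states:
        # \b on both sides rejects hits inside longer words such as "Idahoba".
        if re.search(r"\b" + re.escape(state) + r"\b", addr):
            result = state
            break  # first matching state in list order wins
    return result

print("Idaho" in "12 Idahoba Rd")  # True: the plain substring test matches
print(loop_match("12 Idahoba Rd"))  # None
print(loop_match("1234 1st E St San Jose California 95112"))  # California
```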

Here is another alternative answer using regular expressions:

import re

states = ["New York", "California", "Nebraska", "Idaho"]
# re.escape stops regex metacharacters in the names from being interpreted
pattern = re.compile(r'.*(' + '|'.join(map(re.escape, states)) + r').*')

postal_addr = "1234 1st E St San Jose California 95112"
match = pattern.match(postal_addr)

if match:
    state = match.group(1)
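One consequence of the greedy leading `.*` worth knowing: it backtracks from the end of the string, so `group(1)` is the rightmost state in the address, which happens to handle the Kansas City case from the earlier edit. The sketch below contrasts it with a plain `re.search`, which returns the leftmost state; the extended `states` list is borrowed from that edit:

```python
import re

states = ["New York", "California", "Nebraska", "Idaho", "Missouri", "Kansas"]
alternation = "|".join(map(re.escape, states))

addr = "1 Arrowhead Dr, Kansas City, Missouri 64129"

# Greedy leading .* backtracks from the end, so group(1) is the rightmost state.
greedy = re.compile(r".*(" + alternation + r").*")
print(greedy.match(addr).group(1))  # Missouri

# re.search without the .* padding returns the leftmost state instead.
print(re.search(alternation, addr).group(0))  # Kansas
```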

You can try it like this:

In [2]: states = ["New York", "California", "Nebraska", "Idaho"]

In [3]: postal_addr = "1234 1st E St San Jose California 95112"

In [4]: ''.join(state for state in states if state in postal_addr)
Out[4]: 'California'
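Note that `''.join` glues together every state it finds, so an address that mentions two states produces a run-together string. A small illustration with a made-up address; collecting into a list keeps the matches separate:

```python
states = ["New York", "California", "Nebraska", "Idaho"]
addr = "New York office, 1234 1st E St San Jose California 95112"  # made-up example

joined = ''.join(state for state in states if state in addr)
found = [state for state in states if state in addr]

print(joined)  # New YorkCalifornia
print(found)   # ['New York', 'California']
```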

To find all matches in the string, you can do the following:

matches = [m for m in postal_addr.split() if m in states]
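As written, that line compares single whitespace-separated tokens, so multi-word names such as "New York" can never match, and punctuation attached to a token (e.g. "California,") also defeats the lookup. A sketch that finds all whole-word occurrences with `re.findall` instead, assuming the same `states` list (`state_re` is an illustrative name, and the address is made up):

```python
import re

states = ["New York", "California", "Nebraska", "Idaho"]

# One alternation of literal names, each required to stand as whole words.
state_re = re.compile(r"\b(?:" + "|".join(map(re.escape, states)) + r")\b")

addr = "Buffalo, New York to San Jose, California"

token_matches = [m for m in addr.split() if m in states]
regex_matches = state_re.findall(addr)

print(token_matches)  # ['California']
print(regex_matches)  # ['New York', 'California']
```

Because the group is non-capturing, `findall` returns the full matched text of each occurrence.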


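Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.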

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM