Python - match a word in a string with a list of strings
I'm new to Python and would like to know how to do string comparison.
Say I have a list of strings containing US state names, for example
states = ["New York", "California", "Nebraska", "Idaho"]
I also have another string containing an address, like this
postal_addr = "1234 1st E St San Jose California 95112"
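How can I parse this address string and find the item that matches an entry in the list of states? In the example above, California would be a match. Once it has matched, how do I extract "California" and store it as a separate string?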
>>> states = ["New York", "California", "Nebraska", "Idaho"]
>>> postal_addr = "1234 1st E St San Jose California 95112"
>>> first_match = next(state for state in states if state in postal_addr)
>>> first_match
'California'
However, if you need to match on word boundaries, you are better off using a regular expression.
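A minimal sketch of what the word-boundary version could look like. The helper name `find_state` and the second test address are made up for illustration; the approach assumes `\b` word boundaries are strict enough for your addresses.

```python
import re

def find_state(address, states):
    """Return the first state in `states` found in `address` as a whole word, else None."""
    for state in states:
        # re.escape keeps any regex metacharacters in the name literal;
        # \b requires a word boundary on both sides of the name.
        if re.search(r"\b" + re.escape(state) + r"\b", address):
            return state
    return None

states = ["New York", "California", "Nebraska", "Idaho"]
print(find_state("1234 1st E St San Jose California 95112", states))  # California
print(find_state("12 Main St Idahoba 00000", states))  # None
```

Unlike the `next(...)` one-liner, this returns `None` instead of raising `StopIteration` when nothing matches.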
I would do:
matches = [ s for s in states if s in postal_addr ]
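Then, if you want to get the string out of the postal address:
import re
if matches:
    # re.escape keeps any regex metacharacters in the name literal
    extracted = re.findall(re.escape(matches[0]), postal_addr)[0]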
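Edit: ...but this does not work for city/state combinations where the city name contains a different state, for example if postal_addr = '1 Arrowhead Dr, Kansas City, Missouri 64129'
and states = ["New York", "California", "Nebraska", "Idaho", "Missouri", "Kansas"]
and so on. In that case, take the match that starts latest in the address: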
import re
if matches:
    # pair each match with its start position, then keep the one that starts last
    extracted = [(re.search(re.escape(m), postal_addr).start(), m) for m in matches]
    extracted = sorted(extracted)[-1][1]
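As a self-contained sketch of the same idea, the function below picks the state name that starts latest in the address. The name `last_state` is hypothetical, and the "state comes last" rule is a heuristic that assumes the state follows the city, which will not hold for every address format.

```python
import re

def last_state(address, states):
    """Return the state whose whole-word match starts latest in `address`, else None."""
    found = []
    for state in states:
        pattern = r"\b" + re.escape(state) + r"\b"
        # Collect the start of every occurrence and keep only the last one.
        starts = [m.start() for m in re.finditer(pattern, address)]
        if starts:
            found.append((starts[-1], state))
    # Tuples compare by position first, so max() picks the latest match.
    return max(found)[1] if found else None

states = ["New York", "California", "Nebraska", "Idaho", "Missouri", "Kansas"]
print(last_state("1 Arrowhead Dr, Kansas City, Missouri 64129", states))  # Missouri
```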
states = ["New York", "California", "Nebraska", "Idaho"]
postal_addr = "1234 1st E St San Jose California 95112"
result = None
for state in states:
    if state in postal_addr:
        result = state
print(result)
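Unfortunately, this will also match words that merely contain a state name, such as Idahoba.
Here is another alternative answer using regular expressions: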
import re
states = ["New York", "California", "Nebraska", "Idaho"]
# escape each name so it is matched literally inside the alternation
pattern = re.compile(r'.*(' + '|'.join(map(re.escape, states)) + r').*')
postal_addr = "1234 1st E St San Jose California 95112"
match = pattern.match(postal_addr)
if match:
    state = match.group(1)
You can try it like this:
In [2]: states = ["New York", "California", "Nebraska", "Idaho"]
In [3]: postal_addr = "1234 1st E St San Jose California 95112"
In [4]: ''.join(state for state in states if state in postal_addr)
Out[4]: 'California'
To find all matches in the string, you can do the following:
matches = [m for m in postal_addr.split() if m in states]
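Note that splitting on whitespace cannot find multi-word names such as "New York", because no single token equals the full name. One possible workaround, sketched here with a hypothetical helper `all_states` and a made-up example sentence, is a single compiled alternation with word boundaries:

```python
import re

states = ["New York", "California", "Nebraska", "Idaho"]
# Longest names first, so a longer name wins over a shorter one that is its prefix.
alternatives = "|".join(re.escape(s) for s in sorted(states, key=len, reverse=True))
state_pattern = re.compile(r"\b(?:" + alternatives + r")\b")

def all_states(address):
    """Return every whole-word state name in `address`, in order of appearance."""
    return state_pattern.findall(address)

print(all_states("From Albany New York to Boise Idaho"))  # ['New York', 'Idaho']
```

The split-based comprehension would return only `['Idaho']` for the same input.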