
Partial string matching in Python

I have a section ID range A00-A09. Anything like A01, A01.01, A02, up to A09.09 should be classified under this section ID. How can I do this in Python? At the moment I can only match strings character for character.

You can use a [] character class with the re module:

import re
re.findall(r'A0[0-9]\.0[0-9]|A0[0-9]', 'A01')

Output:

['A01']

When there is no match:

re.findall(r'A0[0-9]\.0[0-9]|A0[0-9]', 'A11')

Output:

[]
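Keep in mind that re.findall scans the whole string for matching substrings, so inputs such as 'XA01' or 'A011' would also produce a hit. A minimal sketch, assuming you only want to accept complete IDs, uses re.fullmatch instead; the names SECTION_PATTERN and is_in_section are hypothetical.

```python
import re

# Whole-ID pattern: "A0" plus one digit, optionally followed by ".0" plus one digit.
# The dot is escaped so it only matches a literal "."
SECTION_PATTERN = re.compile(r"A0[0-9](\.0[0-9])?")


def is_in_section(section_id: str) -> bool:
    # fullmatch requires the entire string to match, unlike findall/search
    return SECTION_PATTERN.fullmatch(section_id) is not None


for candidate in ["A01", "A01.01", "A09.09", "A11", "A011", "XA01"]:
    print(candidate, is_in_section(candidate))
```

Only the first three candidates are accepted; the substring hits that findall would report for 'A011' and 'XA01' are rejected.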

Use re.match() to check this. Here is an example:

import re

section_id = "A01.09"
if re.match(r"^A0[0-9](\.0[0-9])?$", section_id):
    print("yes")

Here the regex means A0X is mandatory and .0X is optional, where X is any digit from 0 to 9.
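One subtlety with this approach: in Python's re module, $ also matches just before a trailing newline, so a string like "A01.09\n" still passes. A small sketch comparing it with re.fullmatch, which requires the whole string to match:

```python
import re

pattern = r"A0[0-9](\.0[0-9])?"

# "$" also matches right before a trailing newline, so this succeeds
print(bool(re.match(pattern + "$", "A01.09\n")))  # True
# fullmatch requires the entire string, newline included, to match
print(bool(re.fullmatch(pattern, "A01.09\n")))    # False
print(bool(re.fullmatch(pattern, "A01.09")))      # True
```

If the IDs come from files or user input, stripping whitespace or using re.fullmatch avoids this edge case.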

Slice the section ID and compare it with both ends of the range:

sid = "A00-A09"

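def under_sid(ssid, sid):
    # Split a range such as "A00-A09" into its two endpoints
    sid_start, sid_end = sid.split("-")
    # Compare the three-character prefix (e.g. "A01" from "A01.01") as a string;
    # this works because the IDs are fixed-width and zero-padded
    return sid_start <= ssid[:3] <= sid_end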

for i in ["A01", "A01.01", "A02", "A09.09"]:
    assert under_sid(i, sid)

for i in ["B01", "A22.01", "A93", "A19.09"]:
    assert not under_sid(i, sid)
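Because under_sid only looks at the first three characters, a malformed value such as "A05XYZ" is also accepted. A sketch of a stricter variant, assuming every ID is one uppercase letter, two digits and an optional ".NN" subsection, and that both ends of a range share the same letter, validates the whole ID and compares the number as an integer. The names ID_PATTERN and in_range are hypothetical.

```python
import re

# Assumed ID shape: one uppercase letter, two digits, optional ".NN" subsection
ID_PATTERN = re.compile(r"([A-Z])(\d{2})(?:\.\d{2})?")


def in_range(section_id: str, id_range: str) -> bool:
    """Return True if section_id lies inside a range such as "A00-A09".

    Assumes both ends of the range share the same letter.
    """
    start, end = id_range.split("-")
    match = ID_PATTERN.fullmatch(section_id)
    if match is None:
        # Malformed IDs such as "A05XYZ" are rejected outright
        return False
    letter, number = match.group(1), int(match.group(2))
    return letter == start[0] and int(start[1:]) <= number <= int(end[1:])


print(in_range("A01.01", "A00-A09"))  # True
print(in_range("A05XYZ", "A00-A09"))  # False
print(in_range("A12", "A05-A15"))     # True
```

Comparing integers rather than strings also keeps the check correct for ranges like A05-A15 without relying on how the characters happen to sort.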

You can do partial matches using startswith() and endswith(). This assumes the full ID always has the form X12.Y34, where each part is a letter followed by two digits, separated by . or - (or any other single character):

>>> full_id = 'A03.A07'
>>> section_id = full_id[:3]
>>> section_id
'A03'
>>> full_id.startswith('A03')
True
>>> full_id.startswith('A07')
False  # 'A07' is the subsection, so it is not a prefix
>>> full_id.endswith('A07')
True
>>> sub_section_id = full_id[-3:]
>>> sub_section_id
'A07'

If the input can sometimes be lowercase, convert it to uppercase first (for example with str.upper()).
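Since str.startswith also accepts a tuple of prefixes, the whole A00-A09 section can be checked in one call after normalising the case. A minimal sketch; SECTION_PREFIXES and belongs_to_section are hypothetical names, and like any prefix check it would also accept something like "A011".

```python
# Hypothetical prefixes for section A00-A09: "A00", "A01", ..., "A09"
SECTION_PREFIXES = tuple(f"A{n:02d}" for n in range(10))


def belongs_to_section(raw_id: str) -> bool:
    # Uppercase first so "a01.01" is treated like "A01.01";
    # startswith returns True if any prefix in the tuple matches
    return raw_id.upper().startswith(SECTION_PREFIXES)


for raw in ["A01", "a01.01", "A09.09", "A10", "B01"]:
    print(raw, belongs_to_section(raw))
```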
