
Best practice for grouping regex in Python

I have a list of strings in Python, each containing an arbitrarily formatted phone number. The extension is optional.

st = ['(800) 555-1212',
'1-800-555-1212',
'800-555-1212x1234',
'800-555-1212 ext. 1234',
'work 1-(800) 555.1212 #1234']

My goal is to isolate the phone number so that I can pull out each individual group: '800', '555', '1212', and the optional '1234'.

I have tried the following code.

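import re

# step1: take the span matched by p1 and strip every non-digit character
p1 = re.compile(r'(\d{3}).*(\d{3}).*(\d{4}).*(\d{4})?')
step1 = [re.sub(r'\D', '', p1.search(t).group()) for t in st]
# step2: split the bare digit string into 3/3/4 digits plus an optional 4
p2 = re.compile(r'(\d{3})(\d{3})(\d{4})(\d{4})?')
step2 = [p2.search(t).groups() for t in step1]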

p1 and p2 are the two patterns I use to get the desired output.
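Because `step1` reduces every number to a bare digit string with a fixed layout (3 + 3 + 4 digits, plus 4 more when there is an extension), the second regex is effectively slicing by position. A minimal sketch of the same step 2 done with plain slicing, assuming every input really has that layout; the `d[10:14] or None` idiom is a substitution for illustration, not part of the original code:

```python
import re

st = ['(800) 555-1212',
      '1-800-555-1212',
      '800-555-1212x1234',
      '800-555-1212 ext. 1234',
      'work 1-(800) 555.1212 #1234']

# Step 1, as in the question: take the matched span and drop every non-digit
p1 = re.compile(r'(\d{3}).*(\d{3}).*(\d{4}).*(\d{4})?')
step1 = [re.sub(r'\D', '', p1.search(t).group()) for t in st]

# Step 2 without a second regex: slice the fixed-width digit string.
# An empty extension slice ('') becomes None to mirror what groups() returns.
step2 = [(d[0:3], d[3:6], d[6:10], d[10:14] or None) for d in step1]

for groups in step2:
    print(groups)
```

This prints the same five tuples as the question's output; the regex version stays preferable if the digit layout can vary.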

for groups in step2:
    print(groups)

The output is:

('800', '555', '1212', None)
('800', '555', '1212', None)
('800', '555', '1212', '1234')
('800', '555', '1212', '1234')
('800', '555', '1212', '1234')

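Since I am new to this, I would appreciate advice on whether there is a better way to solve this kind of problem, or on any best practices the Python community follows. Thanks in advance.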

I think re.findall, combined with the fact that all of these groups look alike (short runs of digits), gives you a simpler approach:

>>> import re
>>> from pprint import pprint
>>> res = [re.findall(r'\d{3,4}', s) for s in st]
>>> pprint(res)
[['800', '555', '1212'],
 ['800', '555', '1212'],
 ['800', '555', '1212', '1234'],
 ['800', '555', '1212', '1234'],
 ['800', '555', '1212', '1234']]

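Instead of trying to match the whole string and capture the substrings you want, you can simply match runs of 3 or 4 digits. The leading country code "1" is a single digit, so it is never matched.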

Demo on Regex101: https://regex101.com/r/XNbb79/1
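If you want the `findall` results in the same fixed four-slot shape as the question's output, one option is to pad each list with `None`. A small sketch, assuming every group is a run of exactly 3 or 4 digits (longer digit runs would be cut up by `\d{3,4}`):

```python
import re

st = ['(800) 555-1212',
      '1-800-555-1212',
      '800-555-1212x1234',
      '800-555-1212 ext. 1234',
      'work 1-(800) 555.1212 #1234']

# Each run of 3 or 4 consecutive digits becomes one list element
parts = [re.findall(r'\d{3,4}', s) for s in st]

# Pad to four slots (area code, exchange, line, extension) with None
padded = [tuple(p + [None] * (4 - len(p))) for p in parts]

for row in padded:
    print(row)
```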

import re

st = ['(800) 555-1212',
'1-800-555-1212',
'800-555-1212x1234',
'800-555-1212 ext. 1234',
'work 1-(800) 555.1212 #1234']

for b in [re.findall(r'\d{3,4}', a) for a in st]:
    if len(b) == 3:
        print("number does not have extension")
        print(b)
    elif len(b) == 4:
        print("number has extension")
        print(b)

Output:

number does not have extension
['800', '555', '1212']
number does not have extension
['800', '555', '1212']
number has extension
['800', '555', '1212', '1234']
number has extension
['800', '555', '1212', '1234']
number has extension
['800', '555', '1212', '1234']
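Instead of branching on `len(b)`, Python 3's extended iterable unpacking can separate the three mandatory groups from the optional extension in one statement. A sketch, assuming every string yields at least three digit groups (with fewer, the unpacking raises `ValueError`):

```python
import re

st = ['(800) 555-1212',
      '1-800-555-1212',
      '800-555-1212x1234',
      '800-555-1212 ext. 1234',
      'work 1-(800) 555.1212 #1234']

results = []
for s in st:
    # The first three runs are mandatory; *ext collects whatever is left over
    area, exchange, line, *ext = re.findall(r'\d{3,4}', s)
    extension = ext[0] if ext else None  # [] -> None, ['1234'] -> '1234'
    results.append((area, exchange, line, extension))
    print("number has extension" if extension else "number does not have extension")
    print(area, exchange, line, extension)
```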

Another option (a modified version of your approach):

import re

# the leading .* skips any prefix text; [^\d]* skips separators between groups
pattern = re.compile(r'.*(\d{3})[^\d]*(\d{3})[^\d]*(\d{4})[^\d]*(\d{4})?$')
print([[pattern.match(s).group(i) for i in range(1, 5)] for s in st])

#[['800', '555', '1212', None], ['800', '555', '1212', None], ['800', '555', '1212', '1234'], ['800', '555', '1212', '1234'], ['800', '555', '1212', '1234']]
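On the best-practice question: when one pattern carries several groups, named groups (`(?P<name>...)`) combined with `re.VERBOSE` keep the pattern self-documenting, and checking the match object for `None` avoids an `AttributeError` on input that contains no phone number. A sketch under those assumptions; the group names, the optional leading `1` handling, and the extra sample string `'no phone here'` are illustrative choices, not part of the original answers:

```python
import re

# re.VERBOSE ignores whitespace in the pattern and allows # comments
PHONE_RE = re.compile(r"""
    (?:1\D*)?               # optional leading country code 1 plus separators
    \(?(?P<area>\d{3})\)?   # area code, parentheses optional
    \D*                     # any separators
    (?P<exchange>\d{3})
    \D*
    (?P<line>\d{4})
    (?:\D*(?P<ext>\d+))?    # optional extension after any separator text
""", re.VERBOSE)

st = ['(800) 555-1212',
      '1-800-555-1212',
      '800-555-1212x1234',
      '800-555-1212 ext. 1234',
      'work 1-(800) 555.1212 #1234',
      'no phone here']  # hypothetical non-matching input

results = []
for s in st:
    m = PHONE_RE.search(s)
    if m is None:  # search() returns None when nothing matches
        results.append(None)
        continue
    # groups() returns (area, exchange, line, ext); m.group('area') or
    # m.groupdict() give access by name instead of by position
    results.append(m.groups())

for r in results:
    print(r)
```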


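Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.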

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM