How to extract while excluding some characters with Python regex
I have been using Python regex to extract address patterns. For example, I have a list of addresses as below:
12buixuongtrach
34btrannhatduat
25bachmai
78bhoangquocviet
I want to refine the addresses like these:
12 buixuongtrach
34b trannhatduat
25 bachmai
78b hoangquocviet
Could anyone please give me a hint or some code?
Many thanks.
You can use a pretty simple regex to split the numbers off from the letters, but as people have said in the comments, there's no way to know from the string alone when those b's should be part of the number and when they're part of the text.
import re
text = """12buixuongtrach
34btrannhatduat
25bachmai
78bhoangquocviet"""
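# One address per line; split() breaks on the newlines.
unmatched = text.split()
# Insert a space between the leading digits and the rest of the string.
matched = [re.sub(r'(\d+)(.*)', r'\1 \2', s) for s in unmatched]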
Which gives:
>>> matched
['12 buixuongtrach', '34 btrannhatduat', '25 bachmai', '78 bhoangquocviet']
The regex just grabs one or more digits at the start of the string and puts them into group 1 (\1), then puts the rest of the string into group 2 (\2).
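If a list of valid street names happens to be available, the ambiguity about the trailing b can be resolved by lookup rather than by regex alone. The following is only a sketch under that assumption: KNOWN_STREETS and split_address are hypothetical names that do not appear in the original post.

```python
import re

# Hypothetical set of known street names (an assumption, not from the post).
KNOWN_STREETS = {"buixuongtrach", "trannhatduat", "bachmai", "hoangquocviet"}

def split_address(s):
    """Split 'digits + letters' into a house number and a street name."""
    m = re.match(r'(\d+)([a-z]*)$', s)
    if not m:
        return s  # leave strings we do not understand untouched
    number, rest = m.groups()
    # Prefer the reading where the whole remainder is a known street.
    if rest in KNOWN_STREETS:
        return f"{number} {rest}"
    # Otherwise try treating the first letter as a house-number suffix.
    if rest[1:] in KNOWN_STREETS:
        return f"{number}{rest[0]} {rest[1:]}"
    # Fall back to separating only the digits from the letters.
    return f"{number} {rest}".strip()

addresses = ["12buixuongtrach", "34btrannhatduat", "25bachmai", "78bhoangquocviet"]
print([split_address(a) for a in addresses])
```

With this lookup, "34btrannhatduat" is read as "34b trannhatduat" because "btrannhatduat" is not a known street while "trannhatduat" is. The result is only as good as the street list.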
Thanks all for your responses. I finally found a workaround. I used the pattern below and it works like a charm :)
'[a-zA-Z]+|[\/0-9abcd]+(?!a|u|c|h|o|e)'
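The post does not show how the pattern is applied. One plausible way, sketched here as an assumption, is to collect the tokens with re.findall and join them with a space; the variable names are illustrative only.

```python
import re

# The pattern from the post, written as a raw string.
pattern = r'[a-zA-Z]+|[\/0-9abcd]+(?!a|u|c|h|o|e)'

addresses = ["12buixuongtrach", "34btrannhatduat", "25bachmai", "78bhoangquocviet"]

# Assumption: tokens found by the pattern are joined with a single space.
refined = [' '.join(re.findall(pattern, s)) for s in addresses]
print(refined)
```

The negative lookahead makes the number token give back its trailing letter whenever the next character is one of a, u, c, h, o, e. On the sample data this yields "12 buixuongtrach", "34b trannhatduat" and "25 bachmai", but also "78 bhoangquocviet", because the b in "78b" is followed by h. The pattern is a heuristic tuned to particular street names, so it is worth checking against the real data.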