How to extract with a Python regex while excluding certain characters

Question

I have been using Python regex to extract address patterns. For example, I have a list of addresses as below:

12buixuongtrach 
34btrannhatduat 
25bachmai 
78bhoangquocviet

I want to refine the addresses like this:

12 buixuongtrach
34b trannhatduat 
25 bachmai
78b hoangquocviet

Could anyone give me a hint or some sample code?

Many thanks.

Answer 1

You can use a pretty simple regex to split the numbers off from the letters, but as people have said in the comments, there's no way to know from the string alone when those b's should be part of the number and when they're part of the text.

import re
text = """12buixuongtrach 
34btrannhatduat 
25bachmai 
78bhoangquocviet"""

unmatched = text.split()
# Insert a space between the leading digits (group 1) and the rest (group 2).
matched = [re.sub(r'(\d+)(.*)', r'\1 \2', s) for s in unmatched]

Which gives:

>>> matched
['12 buixuongtrach', '34 btrannhatduat', '25 bachmai', '78 bhoangquocviet']

The regex just grabs the first run of one or more digits (here, at the start of the string) and puts it into group \1, then puts the rest of the string into group \2.
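One possible way around the ambiguous b is to check the remainder against a list of known street names. The sketch below assumes such a list is available; `KNOWN_STREETS` and `split_address` are hypothetical names made up for illustration, not part of the original answer.

```python
import re

# Hypothetical lookup table; in practice this would come from a real street list.
KNOWN_STREETS = {"buixuongtrach", "trannhatduat", "bachmai", "hoangquocviet"}

def split_address(raw, streets=KNOWN_STREETS):
    """Split a string like '34btrannhatduat' into house number and street."""
    m = re.match(r'(\d+)(.*)', raw.strip())
    if m is None:
        return raw  # no leading house number; leave unchanged
    number, rest = m.groups()
    if rest in streets:
        # The whole remainder is a known street, so the number has no suffix.
        return f"{number} {rest}"
    if rest[1:] in streets:
        # Dropping the first letter yields a known street: treat it as a suffix.
        return f"{number}{rest[0]} {rest[1:]}"
    # Unknown street: fall back to the plain digit split.
    return f"{number} {rest}".strip()

addresses = ["12buixuongtrach", "34btrannhatduat", "25bachmai", "78bhoangquocviet"]
refined = [split_address(a) for a in addresses]
```

With the four sample inputs, `refined` keeps the b attached to 34 and 78 but not to 12 and 25, because only the lookup can tell "bachmai" apart from "b" plus a street name.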

Answer 2

Thanks all for your responses. I finally found a workaround. I used the pattern below and it works like a charm :)

r'[a-zA-Z]+|[\/0-9abcd]+(?!a|u|c|h|o|e)'
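The answer does not say how the pattern was applied. A minimal sketch, assuming it was used with `re.findall` and the pieces joined with a space, might look like this (the variable names are illustrative):

```python
import re

# Pattern copied from the answer; applying it via findall is an assumption.
# The negative lookahead (?!a|u|c|h|o|e) stops the number part from ending
# right before one of those letters, which forces the regex to backtrack.
PATTERN = re.compile(r'[a-zA-Z]+|[\/0-9abcd]+(?!a|u|c|h|o|e)')

addresses = ["12buixuongtrach", "34btrannhatduat", "25bachmai", "78bhoangquocviet"]

# findall returns the number part and the street part as separate matches.
refined = [" ".join(PATTERN.findall(a)) for a in addresses]
```

Under that assumption the first three samples come out as desired (`12 buixuongtrach`, `34b trannhatduat`, `25 bachmai`), but the last one becomes `78 bhoangquocviet`, because h is in the excluded set and the b is pushed back into the street name. The letter list is a heuristic tuned to particular inputs, so it is worth checking against the real data.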

Answer 1 (3): 2012-11-20 06:29:36
Answer 2 (0): 2012-11-20 08:06:04