How to split addresses while excluding some characters with a Python regex

I have been using Python regex to extract address patterns. For example, I have a list of addresses like the one below:

12buixuongtrach 
34btrannhatduat 
25bachmai 
78bhoangquocviet

I want to refine the addresses like this:

12 buixuongtrach
34b trannhatduat
25 bachmai
78b hoangquocviet

Could anyone please give me a hint or some sample code?

Many thanks

You can use a pretty simple regex to split the numbers off from the letters, but as people have said in the comments, there's no way to know from the string alone when those b's should be part of the number and when they're part of the text.

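import re

text = """12buixuongtrach 
34btrannhatduat 
25bachmai 
78bhoangquocviet"""

# split() with no arguments splits on any whitespace and drops the trailing spaces
unmatched = text.split()
# raw strings keep \d, \1 and \2 from being treated as Python string escapes
matched = [re.sub(r'(\d+)(.*)', r'\1 \2', s) for s in unmatched]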

Which gives:

>>> matched
['12 buixuongtrach', '34 btrannhatduat', '25 bachmai', '78 bhoangquocviet']

The regex captures the first run of one or more digits (here, at the start of each string) into group 1 and the rest of the string into group 2. The replacement string then refers to them as \1 and \2, with a space in between.
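If a list of valid street names happens to be available, the ambiguity about the b's can be resolved by checking which split leaves a known name. The following is only a sketch under that assumption: `KNOWN_STREETS` and `split_address` are hypothetical names introduced for illustration and are not part of the question.

```python
import re

# Hypothetical lookup of valid street names (an assumption, not given in the question).
KNOWN_STREETS = {"buixuongtrach", "trannhatduat", "bachmai", "hoangquocviet"}

def split_address(raw):
    """Split e.g. '34btrannhatduat' into '34b trannhatduat' using KNOWN_STREETS."""
    raw = raw.strip()
    m = re.fullmatch(r"(\d+)([a-z]+)", raw)
    if m is None:
        return raw  # leave unrecognised input untouched
    number, rest = m.groups()
    # Case 1: everything after the digits is already a known street name.
    if rest in KNOWN_STREETS:
        return f"{number} {rest}"
    # Case 2: the first letter is a house-number suffix and the remainder is known.
    if rest[1:] in KNOWN_STREETS:
        return f"{number}{rest[0]} {rest[1:]}"
    # Fallback: plain digit/letter split, as in the regex above.
    return f"{number} {rest}"

addresses = ["12buixuongtrach", "34btrannhatduat", "25bachmai", "78bhoangquocviet"]
result = [split_address(a) for a in addresses]
print(result)
```

The name is checked as a whole first, so "25bachmai" keeps its leading "b" in the street name, while "34btrannhatduat" only matches a known name once the "b" is moved to the number.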

Thanks, all, for your responses. I finally found a workaround. I used the pattern below and it works like a charm :)

'[a-zA-Z]+|[\/0-9abcd]+(?!a|u|c|h|o|e)'
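The post does not show how the pattern was applied. One plausible way, assumed here, is to collect the pieces with `re.findall` and join them with a space. The variable names `addresses` and `refined` are made up for this sketch.

```python
import re

# The pattern from the post, written as a raw string so the backslash is kept as-is.
pattern = r'[a-zA-Z]+|[\/0-9abcd]+(?!a|u|c|h|o|e)'

addresses = ["12buixuongtrach", "34btrannhatduat", "25bachmai", "78bhoangquocviet"]

# findall returns the successive non-overlapping matches; joining them adds the space.
refined = [" ".join(re.findall(pattern, s)) for s in addresses]
print(refined)
# ['12 buixuongtrach', '34b trannhatduat', '25 bachmai', '78 bhoangquocviet']
```

The negative lookahead makes the number part give back a trailing letter whenever the next character is one of a, u, c, h, o, e. On the sample inputs this produces the requested split for the first three addresses. The fourth comes out as "78 bhoangquocviet" rather than "78b hoangquocviet", because "h" is among the excluded letters, so the pattern may need adjusting for other data.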
