I have been using Python regex to extract address patterns. For example, I have a list of addresses as below:
12buixuongtrach
34btrannhatduat
25bachmai
78bhoangquocviet
I want to refine the addresses like this:
12 buixuongtrach
34b trannhatduat
25 bachmai
78b hoangquocviet
Could anyone please give me a hint or some sample code?
Many thanks
You can use a pretty simple regex to split the numbers off from the letters, but like people have said in the comments, there's no way to know when those b's should be part of the number and when they're part of the text.
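import re

text = """12buixuongtrach
34btrannhatduat
25bachmai
78bhoangquocviet"""
# One address per whitespace-separated token
unmatched = text.split()
# Raw strings keep \d, \1 and \2 intact; insert a space after the leading digits
matched = [re.sub(r'(\d+)(.*)', r'\1 \2', s) for s in unmatched]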
Which gives:
>>> matched
['12 buixuongtrach', '34 btrannhatduat', '25 bachmai', '78 bhoangquocviet']
The regex is just grabbing one or more digits at the start of the string and putting them into group \1, then putting the rest of the string into group \2.
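To see what each group captures, here is a small sketch that runs the same pattern with re.match on one of the sample addresses. The variable names are only for illustration.

```python
import re

# Same pattern as in the answer: group 1 is the leading digits, group 2 is the rest
m = re.match(r'(\d+)(.*)', '34btrannhatduat')
number, rest = m.groups()
print(number)  # 34
print(rest)    # btrannhatduat
```

Note that the "b" lands in group 2 here, which is exactly the ambiguity mentioned above: the pattern alone cannot tell whether it belongs to the house number.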
Thanks all for your responses. I finally found a workaround. I used the pattern below and it works like a charm :)
'[a-zA-Z]+|[\/0-9abcd]+(?!a|u|c|h|o|e)'
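The post does not show how the pattern is applied, so the following is only a sketch under one assumption: that the matches are collected with re.findall and joined with a space. The refine helper is a hypothetical name, not something from the original post.

```python
import re

# The pattern from the post, written as a raw string so the backslash is passed through as-is
PATTERN = re.compile(r'[a-zA-Z]+|[\/0-9abcd]+(?!a|u|c|h|o|e)')

def refine(address):
    # Hypothetical helper: take the matches in order and join them with a space
    return ' '.join(PATTERN.findall(address))

addresses = ['12buixuongtrach', '34btrannhatduat', '25bachmai', '78bhoangquocviet']
refined = [refine(a) for a in addresses]
print(refined)
# ['12 buixuongtrach', '34b trannhatduat', '25 bachmai', '78 bhoangquocviet']
```

Applied this way, the last address comes out as '78 bhoangquocviet' rather than '78b hoangquocviet', because "h" is one of the letters in the negative lookahead. The pattern is a heuristic based on which letters may follow the suffix, so it is worth checking against your own data.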