
How to extract with Python regex while excluding some characters

I have been using Python regex to extract address patterns. For example, I have a list of addresses as below:

12buixuongtrach 
34btrannhatduat 
25bachmai 
78bhoangquocviet

I want to refine the addresses like this:

12 buixuongtrach
34b trannhatduat 
25 bachmai
78b hoangquocviet

Could anyone please help with a hint or some code?

Many thanks

You can use a fairly simple regex to split the numbers off from the letters but, as people have said in the comments, there is no way to know from the text alone when those b's should be part of the number and when they are part of the text.

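import re

# Sample addresses, one per line
text = """12buixuongtrach
34btrannhatduat
25bachmai
78bhoangquocviet"""

unmatched = text.split()
# Insert a space between the leading digits (group 1) and the rest (group 2)
matched = [re.sub(r'(\d+)(.*)', r'\1 \2', s) for s in unmatched]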

Which gives:

>>> matched
['12 buixuongtrach', '34 btrannhatduat', '25 bachmai', '78 bhoangquocviet']

The regex grabs the first run of one or more digits (here, at the start of each string) and captures it as group 1, then captures the rest of the string as group 2. The replacement writes the two groups back with a single space between them.
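To make the two groups visible, here is a minimal sketch that uses the same pattern with re.match instead of re.sub. The names pattern, samples, groups and rejoined are only illustrative and do not come from the answer.

```python
import re

# Same pattern as the re.sub call above, compiled once.
pattern = re.compile(r'(\d+)(.*)')

samples = ['12buixuongtrach', '34btrannhatduat', '25bachmai', '78bhoangquocviet']

# Each entry is (group 1, group 2): the leading digits and the remainder.
groups = [pattern.match(s).groups() for s in samples]

# Re-joining the two groups with a space reproduces the re.sub result.
rejoined = [' '.join(g) for g in groups]

print(groups)
print(rejoined)
```

Note that re.match only looks at the start of the string, which is fine for these samples because every address begins with its number.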

Thanks, all, for your responses. I finally found a workaround. I used the pattern below and it works like a charm :)

'[a-zA-Z]+|[\/0-9abcd]+(?!a|u|c|h|o|e)'
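The post does not say how the pattern was applied, so the sketch below assumes re.findall followed by joining the pieces with a space. It also tries a hypothetical variant (not from the thread) that keeps a trailing a-d letter with the number only when the next character is neither a vowel nor a digit. That is a heuristic and may not hold for every street name.

```python
import re

samples = ['12buixuongtrach', '34btrannhatduat', '25bachmai', '78bhoangquocviet']

# The pattern exactly as posted, written as a raw string.
posted = re.compile(r'[a-zA-Z]+|[\/0-9abcd]+(?!a|u|c|h|o|e)')

# Assumption: the pieces are collected with findall and joined with a space.
posted_result = [' '.join(posted.findall(s)) for s in samples]

# Hypothetical variant: digits plus an optional a-d suffix, accepted only
# when the character that follows is not a vowel and not a digit.
variant = re.compile(r'(\d+[a-d]?)(?=[^aeiouy\d])')
variant_result = [variant.sub(r'\1 ', s, count=1) for s in samples]

print(posted_result)
print(variant_result)
```

On these four samples the posted pattern splits the last entry as '78 bhoangquocviet', because h is in its lookahead list, while the variant yields '78b hoangquocviet'. Both agree on the other three entries.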

