Regex to ignore specific characters
I am splitting a text on non-alphanumeric characters and would like to keep specific characters, such as apostrophes, dashes/hyphens, and commas, from being treated as separators.
I would like to build a regex for the following cases:
This is what I have tried:
import re

def split_text(text):
    # Split on every single non-word character (this also splits on ' and -).
    my_text = re.split(r'\W', text)
    # The following attempts don't work.
    # my_text = re.split('([A-Z]\w*)', text)
    # my_text = re.split("^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$", text)
    return my_text
Any ideas?
You can use a negated character class for this:
my_text = re.split(r"[^\w'-]+",text)
or
my_text = re.split(r"[^\w,'-]+",text) # also excludes commas
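As a quick illustration of the negated character class, here is a minimal sketch; the sample sentence is made up for the example, and the list comprehensions that drop empty strings are an addition, not part of the answer above.

```python
import re

# Sample sentence invented for illustration.
sample = "Don't split well-known words, please!"

# Split on runs of characters that are NOT word characters, apostrophes, or hyphens.
tokens = re.split(r"[^\w'-]+", sample)

# Same idea, but commas also stay attached to the tokens.
tokens_with_commas = re.split(r"[^\w,'-]+", sample)

# A separator at the very end of the text leaves a trailing empty string; drop it.
tokens = [t for t in tokens if t]
tokens_with_commas = [t for t in tokens_with_commas if t]

print(tokens)              # ["Don't", 'split', 'well-known', 'words', 'please']
print(tokens_with_commas)  # ["Don't", 'split', 'well-known', 'words,', 'please']
```

The trailing `+` makes a run of several separators (for example a comma followed by a space) count as a single split point.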
Is this what you want?
Non-alphanumeric characters, excluding apostrophes and hyphens:
my_text = re.split(r"[^\w'-]+",text)
Non-alphanumeric characters, excluding commas, apostrophes, and hyphens (the hyphen goes last in the class so it is read as a literal rather than as a range operator):
my_text = re.split(r"[^\w',-]+",text)
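A note on hyphen placement inside a character class: a hyphen sitting between two items is read as a range operator, and Python's `re` rejects a range that starts with a shorthand such as `\w`. Placing the hyphen last, or escaping it, keeps it literal. The sketch below checks this; `compiles` is a hypothetical helper and the sample string is made up.

```python
import re

def compiles(pattern):
    """Hypothetical helper: True if re.compile accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Hyphen between \w and ' is parsed as a range, which \w cannot start.
hyphen_in_middle = r"[^\w-',]+"
# Hyphen placed last, or escaped, is a literal hyphen.
hyphen_last = r"[^\w',-]+"
hyphen_escaped = r"[^\w\-',]+"

for pattern in (hyphen_in_middle, hyphen_last, hyphen_escaped):
    print(pattern, "compiles" if compiles(pattern) else "is rejected")

# Invented sample text.
sample = "rock-'n'-roll, baby; it's 1950s-style"
tokens = re.split(hyphen_last, sample)
print(tokens)  # ["rock-'n'-roll,", 'baby', "it's", '1950s-style']
```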
The [] syntax defines a character class; [^..] "complements" it, i.e. it negates it.
See the documentation about that:
Characters that are not within a range can be matched by complementing the set. If the first character of the set is '^', all the characters that are not in the set will be matched. For example, [^5] will match any character except '5', and [^^] will match any character except '^'. ^ has no special meaning if it's not the first character in the set.
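The three cases from the quoted documentation can be checked directly. This is a small sketch with made-up input strings, using `re.findall` to list every character each class matches.

```python
import re

# [^5]: any character except '5'.
not_five = re.findall(r"[^5]", "1525")

# [^^]: any character except '^' (the first ^ negates, the second is literal).
not_caret = re.findall(r"[^^]", "a^b")

# [5^]: '^' is not first here, so it is an ordinary member of the set.
five_or_caret = re.findall(r"[5^]", "a^5b")

print(not_five)       # ['1', '2']
print(not_caret)      # ['a', 'b']
print(five_or_caret)  # ['^', '5']
```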