Regex to match first occurrence of non alpha-numeric characters
I am parsing some user input to make a basic Discord bot that assigns roles and such. I am trying to generalize some code so I can reuse it for different but similar tasks (doing similar things in different categories/channels).
Generally, I am looking for a substring (the category), then taking the string after it as that category's value. I go line by line looking for my category, replace the "category" substring, and return a stripped version. However, what I have now also removes any spaces in the "value" string.
Originally the string looks like this:
Gamertag : 00test gamertag
What I want to do is preserve the spaces in the value. The regex I am trying to write should match all non-alphanumeric characters up to the first letter.
My current code already matches non-alphanumeric characters, but I can't figure out how to match just the first group. It looks like it should be as simple as adding a ? to make the quantifier lazy, but I'm not sure. Example code and string are below (the regex I want to replace is the one used for the final print).
String I am working with:
- 00test Gamertag #(or any non-alpha delimiter)
Desired results (by matching and stripping the extra characters):
00test Gamertag #(remove leading space and any non-alpha characters before the first words)
It should be something like the following, which is close to what I use now to strip non-alphanumeric characters, except that it matches every group rather than only the first one. I want to match just the first group of non-alphanumeric characters in the string so I can strip that part using re.sub:
\W+?
https://www.online-python.com/gDVhZrnmlq
Thank you!
It depends on your inputs. You can use two regexes to achieve your goal: the first removes all non-alphanumeric characters from the string, including the ones between words, and the second collapses any run of multiple spaces between words into a single space:
import re
gamer_tag = "µ& - 00test - Gamertag"
gamer_tag = re.sub(r"[^a-zA-Z0-9\s]", "", gamer_tag)
gamer_tag = re.sub(r" +", " ", gamer_tag)
print(gamer_tag.strip())
# Output: 00test Gamertag
You can remove the second re.sub() if you are sure that there will be no more than one space between words.
gamer_tag = "- 00test Gamertag "
gamer_tag = re.sub(r"[^a-zA-Z0-9\s]", "", gamer_tag)
print(gamer_tag.strip())
# Output: 00test Gamertag
Your regex will substitute the non-alphanumeric characters anywhere in the input string. If you only need this to happen at the start of the string, use the start-of-input anchor (i.e. ^):
^\W+
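As a minimal sketch of how the anchored pattern could be used with re.sub, assuming Python 3 and the sample strings from the question (the helper name strip_leading_non_word is made up for illustration):

```python
import re


def strip_leading_non_word(text: str) -> str:
    """Remove only the run of non-word characters at the very start of text."""
    # ^ anchors the match to the start of the string, so separators and
    # spaces that appear later in the value are left untouched.
    return re.sub(r"^\W+", "", text)


line = "Gamertag : 00test gamertag"
# Drop the category name once, leaving " : 00test gamertag".
value = line.replace("Gamertag", "", 1)

print(strip_leading_non_word(value))                  # 00test gamertag
print(strip_leading_non_word(" - 00test Gamertag"))   # 00test Gamertag
```

Note that \W treats the underscore as a word character, and on Python 3 str patterns it is Unicode-aware, so a leading letter-like symbol such as µ would not be stripped. If only ASCII letters and digits should count, ^[^a-zA-Z0-9]+ (or passing flags=re.ASCII) may be a better fit.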