
Regex to match first occurrence of non alpha-numeric characters

I am parsing some user input to make a basic Discord bot that assigns roles and such. I am trying to generalize some code so I can reuse it for different but similar tasks (doing similar things in different categories/channels).

Generally, I look for a substring (the category), then take the rest of the string after it as that category's value. I go line by line looking for my category, replace the "category" substring, and return a stripped version. However, what I have now also removes any spaces inside the "value" string.
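To make the problem concrete, here is a hypothetical reconstruction of the approach described above. The helper name `naive_value` and the use of `re.sub(r"\W+", "", ...)` for the stripping step are assumptions (the author's real code is only at the linked page), chosen because they reproduce the reported symptom: the spaces inside the value disappear.

```python
import re

# Hypothetical helper: a guess at the approach described above,
# not the author's actual code.
def naive_value(lines, category):
    for line in lines:
        if category in line:
            # Drop the first occurrence of the category name...
            rest = line.replace(category, "", 1)
            # ...then strip every run of non-word characters. This also
            # removes the spaces *inside* the value, which is the problem.
            return re.sub(r"\W+", "", rest)
    return None

print(naive_value(["Role : member", "Gamertag : 00test gamertag"], "Gamertag"))
# prints 00testgamertag
```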

Originally the string looks like this:

Gamertag : 00test gamertag

What I want to do is preserve the spaces in the value. The regex I am trying to write should match all non-alphanumeric characters up to the first letter.

My return is already matching non-alpha characters, but I can't figure out how to get just the first group. It looks like it should be as simple as adding a ? to make it a lazy quantifier, but I'm not sure. Example code and string below (the regex I want to replace is in the final print statement).

String I am working with:

- 00test Gamertag      #(or any non-alpha delimiter)

Desired result (by matching and stripping the extra characters):

00test Gamertag     #(remove the leading space and any non-alpha characters before the first word)

It should be something like the following, which is close to what I use to strip non-alphas now, except that it strips every group of non-alphas, not just the first. I want to match only the first group of non-alphas in a string so I can strip that part with re.sub:

\W+?
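A short sketch of why the lazy ? alone does not help: re.sub replaces every non-overlapping match, so a lazy quantifier only changes how much each match consumes, not how many matches are replaced. The count keyword of re.sub does limit it to the first match, but as the last line shows, that first match is not necessarily at the start of the string. The sample strings are taken from the question.

```python
import re

s = "- 00test Gamertag"

# A lazy quantifier changes how much each match consumes, not how many
# matches re.sub replaces: every non-word character is still removed.
print(re.sub(r"\W+?", "", s))           # 00testGamertag

# count=1 limits re.sub to the first match, here the leading "- ".
print(re.sub(r"\W+", "", s, count=1))   # 00test Gamertag

# But with no leading junk, the "first match" is the space between words.
print(re.sub(r"\W+", "", "00test Gamertag", count=1))  # 00testGamertag
```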

https://www.online-python.com/gDVhZrnmlq

Thank you!

It depends on your inputs. You can use two regexes to achieve your goal: the first removes every character that is neither alphanumeric nor whitespace, including the ones between words, and the second collapses the whitespace between words when there is more than one space between any two words:

import re


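gamer_tag = "µ& - 00test          -   Gamertag"
# Remove every character that is not an ASCII letter, a digit, or whitespace
gamer_tag = re.sub(r"[^a-zA-Z0-9\s]", "", gamer_tag)
# Collapse runs of spaces into a single space
gamer_tag = re.sub(r" +", " ", gamer_tag)
# Trim the leading/trailing whitespace left behind
print(gamer_tag.strip())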

# Output: 00test Gamertag

You can drop the second re.sub() if you are sure there will never be more than one space between words.

gamer_tag = "- 00test Gamertag "
gamer_tag = re.sub(r"[^a-zA-Z0-9\s]", "", gamer_tag)
print(gamer_tag.strip())

# Output: 00test Gamertag

Your regex will substitute non-alphanumeric characters anywhere in the input string. If you only need this to happen at the start of the string, use the start-of-input anchor (i.e. ^):

^\W+
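A minimal sketch of the anchored pattern in use, assuming the value is obtained by removing the category name as described in the question (the helper name clean_value is hypothetical). Because ^ can only match at position 0, interior spaces are never touched, and a string with no leading junk comes back unchanged.

```python
import re

# Anchored at ^, the pattern can only match at the very start of the string,
# so interior spaces are never touched.
LEADING_JUNK = re.compile(r"^\W+")

def clean_value(text):
    # Hypothetical helper: strip only the leading run of non-word characters.
    return LEADING_JUNK.sub("", text)

print(clean_value("- 00test Gamertag"))   # 00test Gamertag
print(clean_value("00test Gamertag"))     # 00test Gamertag (unchanged)

# Combined with removing the category prefix from the question's example:
line = "Gamertag : 00test gamertag"
print(clean_value(line.replace("Gamertag", "", 1)))  # 00test gamertag
```

Two caveats: trailing whitespace is left in place (chain .rstrip() if that matters), and in Python 3 str patterns \W excludes the underscore and Unicode letters and digits, so a leading "_" would not be stripped.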
