Regex to match the first occurrence of non-alphanumeric characters

Question

I am parsing some user input to make a basic Discord bot that assigns roles and such. I am trying to generalize some code so I can reuse it for similar tasks (doing similar things in different categories/channels).

Generally, I look for a substring (the category) and then take the string after it as that category's value. I go line by line looking for my category, replace the "category" substring, and return a stripped version. However, what I have now also replaces every space in the "value" string.

The original string looks like this:

Gamertag : 00test gamertag
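A minimal sketch of the line-by-line lookup described above, assuming each line starts with the category name followed by a separator such as " : ". The helper extract_value and the sample message are hypothetical; the asker's actual code is only available through the link further down.

```python
import re
from typing import Optional


def extract_value(text: str, category: str) -> Optional[str]:
    """Hypothetical helper: return the value that follows `category` on its line."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(category):
            # Drop the category name, then remove only the leading run of
            # non-word characters (e.g. " : "); spaces inside the value are kept.
            value = stripped[len(category):]
            return re.sub(r"^\W+", "", value)
    return None  # category not found on any line


# Hypothetical multi-line user message
message = "Gamertag : 00test gamertag\nPlatform : Xbox"
print(extract_value(message, "Gamertag"))  # 00test gamertag
```

Because the category name is removed first and the pattern is anchored with ^, only the separator is stripped and the space inside "00test gamertag" survives.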

What I want to do is preserve the spaces in the value. The regex I am trying to write should match all non-alphanumeric characters up to the first alphanumeric character.

My return statement already matches non-alphanumeric characters, but I can't figure out how to match only the first group. It looks like I should simply add a ? to make it a lazy quantifier, but I'm not sure. Example code and the string are below (the regex I want to replace is the one in the final print statement).

The string I am working with:

- 00test Gamertag      #(or any non-alpha delimiter)

Desired result (by matching and stripping the extra characters):

00test Gamertag     #(remove the leading space and any non-alpha characters before the first word)

It should be something like the following, which is close to what I use now to strip non-alphanumeric characters, except that mine strips every group instead of only the first. So I want to match just the first group of non-alphanumeric characters in the string and strip that part using re.sub:

\W+?

https://www.online-python.com/gDVhZrnmlq

Thank you!
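Making the quantifier lazy does not limit re.sub to the first group: \W+? simply matches one non-word character at a time, and re.sub still replaces every match. A quick sketch on the question's sample string, contrasting it with re.sub's real count parameter, which caps the number of replacements:

```python
import re

raw = "- 00test Gamertag"

# Lazy quantifier: each match is a single non-word character,
# and re.sub replaces all of them, including the inner space.
lazy = re.sub(r"\W+?", "", raw)

# count=1: replace only the first match, i.e. the first run of
# non-word characters ("- ").
first_only = re.sub(r"\W+", "", raw, count=1)

print(lazy)        # 00testGamertag
print(first_only)  # 00test Gamertag
```

Note that count=1 removes the first run wherever it occurs, so a value with no leading junk (e.g. "00test Gamertag") would lose its inner space instead; anchoring the pattern to the start of the string avoids that.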

Answer 1

It depends on your inputs. You can use two regexes to reach your goal: the first removes all non-alphanumeric characters from the string, including those between words, and the second collapses runs of spaces between two words into a single space:

import re


gamer_tag = "µ& - 00test          -   Gamertag"
gamer_tag = re.sub(r"[^a-zA-Z0-9\s]", "", gamer_tag)
gamer_tag = re.sub(r" +", " ", gamer_tag)
print(gamer_tag.strip())

# Output: 00test Gamertag

You can drop the second re.sub() call if you are sure there will never be more than one space between words.

import re

gamer_tag = "- 00test Gamertag "
gamer_tag = re.sub(r"[^a-zA-Z0-9\s]", "", gamer_tag)
print(gamer_tag.strip())

# Output: 00test Gamertag
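Since both snippets remove non-alphanumeric characters anywhere in the string, they also change a value that legitimately contains punctuation. A minimal check, using a hypothetical gamertag with a hyphen in it:

```python
import re

# Hypothetical input: a gamertag that itself contains a hyphen
gamer_tag = "- 00test Gamer-Tag"

# Same two substitutions as in the answer above
cleaned = re.sub(r"[^a-zA-Z0-9\s]", "", gamer_tag)
cleaned = re.sub(r" +", " ", cleaned).strip()

print(cleaned)  # 00test GamerTag  (the inner hyphen is removed too)
```

Whether that matters depends on which characters a valid value may contain.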

Answer 2

Your regex substitutes non-alphanumeric characters anywhere in the input string. If you only want this to happen at the start of the string, use the start-of-input anchor (i.e. ^):

^\W+
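A minimal sketch of the anchored pattern used with re.sub; the function name strip_leading_non_word is hypothetical. Keep in mind that \W treats the underscore as a word character, so a leading "_" would not be stripped; a class such as [^a-zA-Z0-9] would be needed for that.

```python
import re


def strip_leading_non_word(value: str) -> str:
    # ^ anchors the match at the start of the string, so only the leading
    # run of non-word characters is removed; inner spaces are untouched.
    # rstrip() additionally drops any trailing whitespace.
    return re.sub(r"^\W+", "", value).rstrip()


print(strip_leading_non_word("- 00test Gamertag "))  # 00test Gamertag
print(strip_leading_non_word(" : 00test gamertag"))  # 00test gamertag
```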

2 answers

Answer 1: score 1, posted 2023-01-10 20:44:11

Answer 2: score 1, accepted, posted 2023-01-10 21:54:12
