Regex to get "words" containing letters and (numbers/certain special characters), but not only numbers

Question

In short: I'd like to match any "word" (a contiguous run of characters delimited by whitespace) containing at least 1 letter and at least 1 of (numbers/certain special characters). These "words" can appear anywhere in a sentence.

I'm trying this in Python using re. So far, as a pattern, I have:

\w*[\d@]\w*

This works for the most part; however, I don't want to match "words" that are only numbers/special characters. Example:

Should match:

h1DF346
123FE453
3f3g6hj7j5v3
hasdf@asdf
r3
r@

Should not match:

555555
@
hello
onlyletters

I'm having trouble excluding the first two under "Should not match". I feel like there's something simple I'm missing here. Thanks!
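
For reference, the problem described above can be reproduced with a short script. This is only a sketch: the sample sentence is simply the question's example words joined by spaces, and the variable names are illustrative.

```python
import re

# Sample sentence built from the question's own example words (an assumption:
# the question lists them one per line, not as a sentence).
text = "h1DF346 123FE453 3f3g6hj7j5v3 hasdf@asdf r3 r@ 555555 @ hello onlyletters"

# The pattern from the question. Nothing in it requires a letter,
# so digit-only and '@'-only tokens are accepted as well.
pattern = re.compile(r"\w*[\d@]\w*")

matches = pattern.findall(text)

# Tokens that matched although the question wants them rejected.
unwanted = [m for m in matches if m in ("555555", "@")]

print(matches)
print(unwanted)
```

The letter-only words are rejected as intended; the two remaining false positives are the ones the answers below try to exclude.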

Answer 1

I would use the | (or) operator like this:

([A-Za-z]+[\d@]+[\w@]*|[\d@]+[A-Za-z]+[\w@]*)

meaning you want:

letters followed by numbers/@ followed by any combination,
or numbers/@ followed by letters followed by any combination

Check the regex101 demo here

Consider using non-capturing groups (?:...) instead of (...) if you are working with groups in other parts of your regular expression.
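
A quick way to try this alternation in Python is sketched below, using the non-capturing form mentioned above. The sample sentence is assembled from the question's examples and the variable names are illustrative.

```python
import re

# Example words from the question, joined into one sentence (an assumption).
text = "h1DF346 123FE453 3f3g6hj7j5v3 hasdf@asdf r3 r@ 555555 @ hello onlyletters"

# Alternation from this answer, wrapped in a non-capturing group:
#   letters, then digits/@, then anything from [\w@]
#   or digits/@, then letters, then anything from [\w@]
pattern = re.compile(r"(?:[A-Za-z]+[\d@]+[\w@]*|[\d@]+[A-Za-z]+[\w@]*)")

# With no capturing group, findall returns the full text of each match.
matches = pattern.findall(text)
print(matches)
# ['h1DF346', '123FE453', '3f3g6hj7j5v3', 'hasdf@asdf', 'r3', 'r@']
```

Note that the pattern is not anchored to whitespace, so it can also match inside a longer token that contains other punctuation.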

Answer 2

Use lookahead assertions like this.

Regex: (?=.*[a-zA-Z])(?=.*[@#\d])[a-zA-Z\d@#]+

Explanation:

(?=.*[a-zA-Z]) checks that a letter appears somewhere ahead, after zero or more characters.
(?=.*[@#\d]) checks that a character from the given character class appears somewhere ahead, after zero or more characters.
[a-zA-Z\d@#]+ matches one or more characters from the given character class.

Regex101 Demo
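
One caveat when applying this to a whole sentence: the .* inside each lookahead scans to the end of the line, so a letter or digit in a later word can satisfy it. The sketch below therefore tests each whitespace-separated token on its own with fullmatch. The helper name mixed_words and the sample sentence are assumptions made for illustration, not part of the answer.

```python
import re

# Example words from the question, joined into one sentence (an assumption).
text = "h1DF346 123FE453 3f3g6hj7j5v3 hasdf@asdf r3 r@ 555555 @ hello onlyletters"

# Lookahead pattern from this answer, unchanged.
pattern = re.compile(r"(?=.*[a-zA-Z])(?=.*[@#\d])[a-zA-Z\d@#]+")


def mixed_words(sentence):
    """Return the whitespace-separated tokens that fully match the pattern.

    Testing one token at a time keeps the '.*' lookaheads from seeing
    characters that belong to other words in the sentence.
    """
    return [tok for tok in sentence.split() if pattern.fullmatch(tok)]


matches = mixed_words(text)
print(matches)
# ['h1DF346', '123FE453', '3f3g6hj7j5v3', 'hasdf@asdf', 'r3', 'r@']
```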

Answer 3

While you have your answer, you could still improve the speed of the accepted regex:

(?=\d++[A-Za-z]+[\w@]+|[a-zA-Z]++[\w@]+)[\w@]{2,}

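You'll need the newer regex module here, because the pattern uses possessive quantifiers such as ++ (the standard re module only gained them in Python 3.11):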

import regex as re

string = "h1DF346 123FE453 3f3g6hj7j5v3 hasdf@asdf r3 r@ 555555 @ hello onlyletters"
rx = re.compile(r'(?=\d++[A-Za-z]+[\w@]+|[a-zA-Z]++[\w@]+)[\w@]{2,}')
print(rx.findall(string))
# ['h1DF346', '123FE453', '3f3g6hj7j5v3', 'hasdf@asdf', 'r3', 'r@']

Hijacking @Roberto's demo, you'll see a significant reduction in the steps needed to find matches (>7000 vs. 338, roughly 20 times fewer).

Answer 4

If you merely change the * (match 0 or more) to + (match 1 or more), you can hit everything correctly.

\w+[\d@]\w+

Except for the 5555... Is there any further pattern to the distribution of letters and numbers that you can distinguish? Can you handle it by replacing one \w with a requirement for at least one letter before or after the [\d@]?
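
One possible reading of that last suggestion is sketched below: require a letter either somewhere before or somewhere after the [\d@] character. This is an interpretation, not a pattern given in the answer. It keeps the question's \w* rather than \w+, since \w+[\d@]\w+ needs at least three characters and would skip two-character words such as r3.

```python
import re

# Example words from the question, joined into one sentence (an assumption).
text = "h1DF346 123FE453 3f3g6hj7j5v3 hasdf@asdf r3 r@ 555555 @ hello onlyletters"

# Hypothetical pattern: inside a run of word characters, demand either
#   a letter ... followed later by a digit or '@', or
#   a digit or '@' ... followed later by a letter.
pattern = re.compile(r"\w*(?:[A-Za-z]\w*[\d@]|[\d@]\w*[A-Za-z])\w*")

matches = pattern.findall(text)
print(matches)
# ['h1DF346', '123FE453', '3f3g6hj7j5v3', 'hasdf@asdf', 'r3', 'r@']
```

Because \w does not include @, this sketch handles only one @ per word; widening \w to [\w@] would be a further assumption.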

Answer 1
3 accepted 2017-05-25 18:15:02

Answer 2
0 2017-05-25 18:19:22

Answer 3
0 2017-05-25 18:33:51

Answer 4
0 2017-05-25 18:40:50
