regex to get “words” containing letters and (numbers/certain special), but not only numbers

Question

In short: I'd like to match any "word" (a contiguous run of characters separated by whitespace) containing at least 1 letter and at least 1 of (numbers/certain special characters). These "words" can appear anywhere in a sentence.

I'm trying this in Python using re. So far, as a pattern, I have:

\w*[\d@]\w*

Which works, for the most part; however, I don't want to have "words" that are only numbers/special. Ex:

Should match:

h1DF346
123FE453
3f3g6hj7j5v3
hasdf@asdf
r3
r@

Should not match:

555555
@
hello
onlyletters

Having trouble excluding the first two under "should not match". Feel like there's something simple I'm missing here. Thanks!
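
A minimal way to reproduce the problem, assuming the sample words are joined into one space-separated string (the variable names are illustrative, not from the original post):

```python
import re

# Sample "words" from the question, joined into one sentence.
text = "h1DF346 123FE453 3f3g6hj7j5v3 hasdf@asdf r3 r@ 555555 @ hello onlyletters"

# The pattern from the question, written as a raw string.
pattern = re.compile(r"\w*[\d@]\w*")

matches = pattern.findall(text)
print(matches)
# ['h1DF346', '123FE453', '3f3g6hj7j5v3', 'hasdf@asdf', 'r3', 'r@', '555555', '@']
```

The last two entries are the unwanted matches: nothing in the pattern requires a letter.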

Answer 1

I would use the alternation operator | like this:

([A-Za-z]+[\d@]+[\w@]*|[\d@]+[A-Za-z]+[\w@]*)

meaning you want:

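one or more letters, followed by one or more digits or @, followed by any combination of word characters and @,
or one or more digits or @, followed by one or more letters, followed by any combination of word characters and @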

Check the regex101 demo here

Consider using non-capturing groups (?:...) instead of (...) if you are working with groups in other parts of your regular expression.
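
As a quick check, here is a sketch that runs this alternation against the sample words from the question. It assumes the words are joined into a single string and uses the non-capturing form so that findall() returns whole matches:

```python
import re

# Sample "words" from the question, joined into one sentence.
text = "h1DF346 123FE453 3f3g6hj7j5v3 hasdf@asdf r3 r@ 555555 @ hello onlyletters"

# Letters then digits/@, or digits/@ then letters; the rest of the word follows.
pattern = re.compile(r"(?:[A-Za-z]+[\d@]+[\w@]*|[\d@]+[A-Za-z]+[\w@]*)")

matches = pattern.findall(text)
print(matches)
# ['h1DF346', '123FE453', '3f3g6hj7j5v3', 'hasdf@asdf', 'r3', 'r@']
```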

Answer 2

Use lookahead assertions like this.

Regex: (?=.*[a-zA-Z])(?=.*[@#\d])[a-zA-Z\d@#]+

Explanation:

(?=.*[a-zA-Z]) asserts that at least one letter appears somewhere ahead of the current position.
(?=.*[@#\d]) asserts that at least one character from the class [@#\d] appears somewhere ahead of the current position.
[a-zA-Z\d@#]+ matches one or more characters from the given character class.

Regex101 Demo
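
One caveat worth illustrating: the .* inside each lookahead is not confined to the current word, so on a whole sentence a letter or digit in a later word can satisfy the assertion. The sketch below compares that with applying the pattern to one whitespace-separated token at a time via fullmatch(); the per-token loop is an assumption about how the pattern is meant to be used, not something stated in the answer:

```python
import re

# Sample "words" from the question, joined into one sentence.
text = "h1DF346 123FE453 3f3g6hj7j5v3 hasdf@asdf r3 r@ 555555 @ hello onlyletters"

pattern = re.compile(r"(?=.*[a-zA-Z])(?=.*[@#\d])[a-zA-Z\d@#]+")

# Whole sentence: the lookaheads can see letters in later words,
# so "555555" and "@" slip through.
whole_sentence = pattern.findall(text)
print(whole_sentence)
# ['h1DF346', '123FE453', '3f3g6hj7j5v3', 'hasdf@asdf', 'r3', 'r@', '555555', '@']

# One token at a time: the lookaheads only see the current word.
per_word = [tok for tok in text.split() if pattern.fullmatch(tok)]
print(per_word)
# ['h1DF346', '123FE453', '3f3g6hj7j5v3', 'hasdf@asdf', 'r3', 'r@']
```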

Answer 3

While you already have your answer, you could still improve the speed of the accepted regex:

(?=\d++[A-Za-z]+[\w@]+|[a-zA-Z]++[\w@]+)[\w@]{2,}

The possessive quantifiers (++) are not supported by the standard re module before Python 3.11, so you'll need the newer regex module here:

import regex as re

string = "h1DF346 123FE453 3f3g6hj7j5v3 hasdf@asdf r3 r@ 555555 @ hello onlyletters"
rx = re.compile(r'(?=\d++[A-Za-z]+[\w@]+|[a-zA-Z]++[\w@]+)[\w@]{2,}')
print(rx.findall(string))
# ['h1DF346', '123FE453', '3f3g6hj7j5v3', 'hasdf@asdf', 'r3', 'r@']

Hijacking @Roberto's demo, you'll see a significant reduction in the steps needed to find the matches (338 vs. >7000, roughly 20 times fewer).
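
If installing the regex module is not an option, a possessive quantifier can be emulated in the standard re module with a capturing lookahead followed by a backreference. The following is a sketch of that common workaround, not part of the original answer, and its step count has not been measured here:

```python
import re

# Sample "words" from the question, joined into one sentence.
text = "h1DF346 123FE453 3f3g6hj7j5v3 hasdf@asdf r3 r@ 555555 @ hello onlyletters"

# (?=(X+))\1 behaves like X++: the lookahead captures greedily and,
# once it has matched, the engine does not backtrack into it.
pattern = re.compile(
    r"(?=(?=(\d+))\1[A-Za-z]+[\w@]+|(?=([a-zA-Z]+))\2[\w@]+)[\w@]{2,}"
)

# The pattern now contains capturing groups, so take group(0)
# from finditer() instead of relying on findall().
matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)
# ['h1DF346', '123FE453', '3f3g6hj7j5v3', 'hasdf@asdf', 'r3', 'r@']
```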

Answer 4

If you merely change the * (match 0 or more) to + (match 1 or more), you exclude the lone @:

\w+[\d@]\w+

However, this still matches 555555, and because it now requires at least three characters it no longer matches r3 or r@. Is there any further pattern to the distribution of letters and numbers that you can distinguish? Can you handle it by replacing a \w with a requirement for at least one letter before or after the [\d@]?
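
To make this concrete, the sketch below runs the + version against the question's sample words and then tries one possible reading of the "at least one letter before or after" idea. The second pattern is a hypothetical variant written for illustration, not something given in this answer:

```python
import re

# Sample "words" from the question, joined into one sentence.
text = "h1DF346 123FE453 3f3g6hj7j5v3 hasdf@asdf r3 r@ 555555 @ hello onlyletters"

# The + version: at least one \w on each side of a digit or @.
plus_pattern = re.compile(r"\w+[\d@]\w+")
plus_matches = plus_pattern.findall(text)
print(plus_matches)
# ['h1DF346', '123FE453', '3f3g6hj7j5v3', 'hasdf@asdf', '555555']

# Hypothetical variant: somewhere in the word, a letter must come
# before a digit/@, or a digit/@ must come before a letter.
letter_pattern = re.compile(
    r"[\w@]*(?:[A-Za-z][\w@]*[\d@]|[\d@][\w@]*[A-Za-z])[\w@]*"
)
letter_matches = letter_pattern.findall(text)
print(letter_matches)
# ['h1DF346', '123FE453', '3f3g6hj7j5v3', 'hasdf@asdf', 'r3', 'r@']
```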

4 answers

solution1
3 ACCEPTED 2017-05-25 18:15:02

solution2
0 2017-05-25 18:19:22

solution3
0 2017-05-25 18:33:51

solution4
0 2017-05-25 18:40:50
