
How to exclude some characters from the text matched group?

I want to match two cases: 123456-78-9 or 123456789. My goal is to retrieve 123456789 from either case, i.e. to exclude the '-' characters from the first case (the second case is straightforward).

I have tried a regex like r"\b(\d+(?:-)?\d+(?:-)?\d)\b", but it still gives me '123456-78-9'.
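The dash survives because a non-capturing group (?:...) still consumes the characters it matches. It just doesn't create a group of its own. Since the single capturing group in this pattern wraps everything, group 1 contains the dashes too. Here is a minimal check using the pattern from the question:

```python
import re

# The pattern from the question: one capturing group wrapping everything.
pattern = r"\b(\d+(?:-)?\d+(?:-)?\d)\b"

m = re.search(pattern, "123456-78-9")
# (?:-)? does not capture, but it still consumes the dash,
# so the dash ends up inside group 1.
print(m.group(1))  # 123456-78-9
```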

What is the right regex to use? I know I could do it in two steps: 1) extract the three groups of digits with a regex, 2) concatenate them in another line. But I'd prefer a single regex so that the code is more elegant.

Thanks for any advice!
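For comparison, the two-step approach described in the question could look like the following sketch. It pulls out the digit runs with re.findall and then joins them. The helper name join_digit_runs is only for illustration.

```python
import re


def join_digit_runs(s: str) -> str:
    # Step 1: collect every run of consecutive digits.
    parts = re.findall(r"\d+", s)
    # Step 2: concatenate them, dropping whatever separated them.
    return "".join(parts)


print(join_digit_runs("123456-78-9"))  # 123456789
print(join_digit_runs("123456789"))    # 123456789
```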

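You can use r'(\d{6})(-?)(\d{2})\2(\d)'
and then join groups 1, 3 and 4, or replace using "\\1\\3\\4". The \2 backreference requires the second separator to be the same as the first, so either both dashes are present or neither is.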

It will only match these two formats:

123456-78-9 or 123456789

It's up to you to add boundary conditions (for example \b or anchors) if needed.

https://regex101.com/r/ceB10E/1
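Here is a Python sketch of this answer. It assumes the whole string should be validated, so re.fullmatch stands in for the boundary conditions mentioned above. The (-?) group captures either a dash or nothing, and \2 demands the same thing again, so a mixed form such as 123456-789 is rejected. The function name normalize is hypothetical.

```python
import re
from typing import Optional

# Group 2 captures "-" or "", and \2 requires the same separator again.
PATTERN = re.compile(r"(\d{6})(-?)(\d{2})\2(\d)")


def normalize(s: str) -> Optional[str]:
    m = PATTERN.fullmatch(s)
    if m is None:
        return None
    # Join groups 1, 3 and 4, skipping the separator group.
    return m.group(1) + m.group(3) + m.group(4)


for s in ["123456-78-9", "123456789", "123456-789"]:
    print(s, "->", normalize(s))

# The replacement form from the answer gives the same result.
print(PATTERN.sub(r"\1\3\4", "123456-78-9"))  # 123456789
```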

You can put the number parts in capturing groups and then replace the entire match with just the captured groups.

Try something like:

\b(\d+)-?(\d+)-?(\d)\b

..and replace with:

\1\2\3

Note that the two non-capturing groups you're using are redundant: (?:-)? is equivalent to -?.


Python example:

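import re

regex = r"\b(\d+)-?(\d+)-?(\d)\b"

test_str = ("123456-78-9\n"
            "123456789")
subst = r"\1\2\3"  # keep only the three captured digit groups

# Replace every match with its digits only; no flags are needed
# because the pattern uses \b rather than ^ or $.
result = re.sub(regex, subst, test_str)

print(result)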

Output:

123456789
123456789
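One thing worth checking with this pattern is that \d+ places no limit on the length of each part, so it will also join other dash-separated digit runs. Here is a small check (the sample strings are made up for illustration):

```python
import re

regex = r"\b(\d+)-?(\d+)-?(\d)\b"

# Any run of at least three digits with optional dashes qualifies.
print(re.sub(regex, r"\1\2\3", "12-3"))        # 123
# Only part of a date-like string gets rewritten.
print(re.sub(regex, r"\1\2\3", "2024-01-15"))  # 202401-15
```

If only the 6-2-1 layout should be touched, fixed-width quantifiers such as \d{6} and \d{2} are the safer choice.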


The easiest approach here is to first use re.sub to remove all non-digit characters from the input, then compare the result with an equality check:

import re

inp = "123456-78-9"
if re.sub(r'\D', '', inp) == '123456789':
    print("MATCH")
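Note that this check strips every non-digit, so it is more permissive than the two formats in the question. Spaces or dashes in other positions are accepted too. Here is a sketch that makes this visible (digits_only is a hypothetical helper name):

```python
import re


def digits_only(s: str) -> str:
    # \D matches any character that is not a digit.
    return re.sub(r"\D", "", s)


for s in ["123456-78-9", "123456789", "12-3456 789"]:
    # All three print True, including the third, irregular layout.
    print(repr(s), digits_only(s) == "123456789")
```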

Edit: If I misunderstood your problem and the inputs could be anything, and you just want to match the two formats given, use an alternation:

\b(?:\d{6}-\d{2}-\d|\d{9})\b

Script:

import re

inp = "123456-78-9"
if re.search(r'\b(?:\d{6}-\d{2}-\d|\d{9})\b', inp):
    print("MATCH")
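If the goal is still to get 123456789 out of either format, the alternation can be combined with a plain str.replace on each match. This sketch assumes the IDs are embedded in a longer string (the sample text is made up):

```python
import re

ID_PATTERN = re.compile(r"\b(?:\d{6}-\d{2}-\d|\d{9})\b")

text = "IDs: 123456-78-9 and 123456789, but not 12345-678-9"

# findall returns whole matches because the group is non-capturing;
# removing the dashes from each match leaves the nine digits.
ids = [m.replace("-", "") for m in ID_PATTERN.findall(text)]
print(ids)  # ['123456789', '123456789']
```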

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.
