
Regex to extract the string

I need help with regex to get the following out of the string

dal001.caxxxxx.test.com ---> caxxxxx.test.com
caxxxx.test.com -----> caxxxx.test.com

So basically, in the first example I don't want dal001 (or anything else that starts with 3 letters and 3 digits), and I want the rest of the string only if it starts with ca.

In the second example I want the whole string, since it starts with ca.

So far I have tried (^[a-z]{3}[\d]+\.)?(ca.*), but it doesn't work when the string is dal001.mycaxxxx.test.com.

Any help would be appreciated.
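A quick check shows why the attempt misbehaves on the last string. This sketch assumes Python's re.search (which the answers use) and takes [a-z] as the intended character class. Because ^ sits inside the optional group, the search can skip the group entirely and pick up "ca" in the middle of the string:

```python
import re

# Pattern from the question (assuming [a-z] was the intended character class).
attempt = r"(^[a-z]{3}[\d]+\.)?(ca.*)"

# The optional group (and the ^ inside it) can be skipped, so re.search is
# free to start the match at the "ca" inside "mycaxxxx".
m = re.search(attempt, "dal001.mycaxxxx.test.com")
print(m.start(), m.group(1), m.group(2))  # 9 None caxxxx.test.com
```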

You can use

^(?:[a-z]{3}\d{3}\.)?(ca.*)

To make it case-insensitive, pass the re.I flag, e.g. re.search(rx, s, re.I).

Details:

  • ^ - start of string
  • (?:[a-z]{3}\d{3}\.)? - an optional sequence of 3 letters, then 3 digits and a .
  • (ca.*) - Group 1: ca and the rest of the string.
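The difference from the question's attempt is that ^ now anchors the whole pattern rather than only the optional prefix, so a string like dal001.mycaxxxx.test.com no longer matches. Here is a small sketch that runs the pattern over the question's strings plus two extra illustrative ones, with re.I enabled. The helper name extract and the upper-case sample are hypothetical, added only for the demonstration:

```python
import re

# ^ is outside the optional group, so the match must begin at position 0.
rx = re.compile(r"^(?:[a-z]{3}\d{3}\.)?(ca.*)", re.I)

def extract(s):
    """Hypothetical helper: return the 'ca...' part, or None if s doesn't qualify."""
    m = rx.search(s)
    return m.group(1) if m else None

samples = [
    "dal001.caxxxxx.test.com",   # prefix stripped
    "caxxxx.test.com",           # already starts with ca
    "dal001.mycaxxxx.test.com",  # rejected: 'ca' is not right after the prefix
    "DAL001.CAxxxx.test.com",    # illustrative: matches thanks to re.I
]
results = [extract(s) for s in samples]
for s, r in zip(samples, results):
    print(s, "->", r)
```

With re.I the captured text keeps its original case (CAxxxx.test.com here); drop the flag if only lower-case input should qualify.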

See the Python demo:

import re

rx = r"^(?:[a-z]{3}\d{3}\.)?(ca.*)"
strs = ["dal001.caxxxxx.test.com", "caxxxx.test.com"]
for s in strs:
    m = re.search(rx, s)  # ^ anchors the match to the start of the string
    if m:
        print(m.group(1))  # Group 1 holds "ca" and everything after it

Use re.sub like so:

import re
strs = ['dal001.caxxxxx.test.com', 'caxxxx.test.com']

for s in strs:
    s = re.sub(r'^[A-Za-z]{3}\d{3}[.]', '', s)
    print(s)
# caxxxxx.test.com
# caxxxx.test.com
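Note that this re.sub only removes the prefix; it does not check that what remains starts with ca, so dal001.mycaxxxx.test.com would come out as mycaxxxx.test.com. If that requirement matters, one option is to add a plain startswith check after the substitution. The helper strip_prefix_if_ca below is a hypothetical sketch of that idea, not part of the original answer:

```python
import re

# Same prefix pattern as above, compiled once.
PREFIX = re.compile(r"^[A-Za-z]{3}\d{3}[.]")

def strip_prefix_if_ca(s):
    """Hypothetical helper: drop a leading 'abc123.' prefix, then keep the
    remainder only if it starts with 'ca' (case-sensitive); else return None."""
    rest = PREFIX.sub("", s, count=1)
    return rest if rest.startswith("ca") else None

for s in ["dal001.caxxxxx.test.com", "caxxxx.test.com", "dal001.mycaxxxx.test.com"]:
    print(s, "->", strip_prefix_if_ca(s))
```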

If you are using re:

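import re

my_strings = ['dal001.caxxxxx.test.com', 'caxxxxx.test.com']
my_regex = r'^(?:[a-zA-Z]{3}[0-9]{3}\.)?(ca.*)'
compiled_regex = re.compile(my_regex)  # compile the pattern once, before the loop
for a_string in my_strings:
    if compiled_regex.match(a_string):
        # Replace the whole match with group 1 (the "ca..." part) and use the result
        print(compiled_regex.sub(r'\1', a_string))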

my_regex matches a string that starts (^ anchors to the start of the string) with [3 letters][3 digits][a dot], but only optionally, and that prefix uses a non-capturing group (the (?:...) group does not get a numbered reference to use in sub). In either case, the string must then continue with ca followed by anything; that part is captured as group 1 and used as the replacement (\1) in the sub call. re.compile is used to make it a bit faster in case you have many strings to match.
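To make the sub behaviour concrete, here is a short sketch using the same pattern. It shows that \1 refers to (ca.*) because the prefix group is non-capturing, and that sub returns its input unchanged when nothing matches, which is why the loop above checks match() first:

```python
import re

compiled_regex = re.compile(r'^(?:[a-zA-Z]{3}[0-9]{3}\.)?(ca.*)')

# The (?:...) prefix is not numbered, so \1 is the (ca.*) group.
stripped = compiled_regex.sub(r'\1', 'dal001.caxxxxx.test.com')
print(stripped)  # caxxxxx.test.com

# No match: sub() hands back the original string untouched.
untouched = compiled_regex.sub(r'\1', 'dal001.mycaxxxx.test.com')
print(untouched)  # dal001.mycaxxxx.test.com

# For a matching string, match().group(1) gives the same text as the sub() call.
via_group = compiled_regex.match('dal001.caxxxxx.test.com').group(1)
print(via_group)  # caxxxxx.test.com
```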

Note on re.compile: Some answers don't bother pre-compiling the regex before the loop. They have made a trade: saving a single line of code at the cost of extra work on every iteration. Module-level functions such as re.search keep a cache of recently compiled patterns, so the regex is not fully recompiled each time, but every call still has to look the pattern up in that cache. If you use a regex in a loop body, compile it first: it avoids that per-call overhead, which adds up in a tight loop, and it costs nothing even when the number of iterations is small. Here is a comparison of compiled vs. non-compiled versions of the same loop using the same regex for different numbers of loop iterations and trials. Judge for yourself.
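If you want to run a comparison like that on your own machine, a minimal timeit sketch follows. The repeated sample list and the iteration count are arbitrary choices for illustration, and the timings you get will depend on your hardware and Python version, so no numbers are given here:

```python
import re
import timeit

PATTERN = r"^(?:[a-z]{3}\d{3}\.)?(ca.*)"
STRINGS = ["dal001.caxxxxx.test.com", "caxxxx.test.com"] * 1000  # arbitrary workload

def module_level():
    # re.search looks PATTERN up in re's internal cache on every call.
    out = []
    for s in STRINGS:
        m = re.search(PATTERN, s)
        if m:
            out.append(m.group(1))
    return out

COMPILED = re.compile(PATTERN)  # built once, outside the loop

def precompiled():
    out = []
    for s in STRINGS:
        m = COMPILED.search(s)
        if m:
            out.append(m.group(1))
    return out

# Both versions must agree before their timings are worth comparing.
assert module_level() == precompiled()
for fn in (module_level, precompiled):
    print(fn.__name__, timeit.timeit(fn, number=50))
```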
