
Very basic question about regex extraction

I need to extract an ID specified in URLs that have this structure:

https://trello.com/c/iGjJLqwr/1-test-project

In the above example, I want to extract:

iGjJLqwr

I need to use the regex in Zapier, which according to its documentation uses Python regex.

The following Python regex is on the right track, but it still returns too much:

[^https://trello.com/c/][\w]+

It returns 3 matches:

Match 1
Full match  21-29   iGjJLqwr
Match 2
Full match  31-36   -test
Match 3
Full match  36-44   -project

I need to restrict the result to:

iGjJLqwr

The following regex returns an extra forward slash:

[^https://trello.com/c/]\w+/

Match 1
Full match  21-30   iGjJLqwr/

Square brackets [ ... ] create a character set that matches any single one of the characters they contain. If a caret is added at the beginning, [^ ... ], the set is negated. The pattern does not treat the contents of the brackets as one continuous string.

In other words, [aaabbc] is equivalent to [abc] (and even [cba] ).
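A quick way to check this equivalence is a minimal sketch with Python's `re` module. The sample string `cabbage` is made up purely for illustration:

```python
import re

sample = "cabbage"  # hypothetical test string, not from the question

# Duplicates and ordering inside a character class do not matter:
# all three patterns describe the same set {a, b, c}.
matches_dup = re.findall(r"[aaabbc]", sample)
matches_abc = re.findall(r"[abc]", sample)
matches_cba = re.findall(r"[cba]", sample)
print(matches_dup == matches_abc == matches_cba)  # True

# A negated class matches any single character that is NOT in the set.
others = re.findall(r"[^abc]", sample)
print(others)  # ['g', 'e']
```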

If you just want to capture the first path element after https://trello.com/c/ in a group, you can use this pattern:

https://trello\.com/c/([^/]+).*

Demo: https://regex101.com/r/99FDJS/2
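As a rough sketch of how that pattern behaves in Python (using the URL from the question), note that the full match is the entire URL because of the trailing `.*`, so the ID has to be read from group 1:

```python
import re

url = "https://trello.com/c/iGjJLqwr/1-test-project"

match = re.search(r"https://trello\.com/c/([^/]+).*", url)
if match:
    whole = match.group(0)    # full match: the entire URL
    card_id = match.group(1)  # group 1: first path element after /c/
    print(whole)
    print(card_id)  # iGjJLqwr
```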

If you want the pattern to only match this substring within the URL, you can use positive lookahead and lookbehind:

(?<=https://trello\.com/c/).+?(?=/.*)

Demo: https://regex101.com/r/99FDJS/1
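A small sketch of the lookaround version in Python follows. Here the full match itself is the ID, so no group is needed. One caveat worth checking for your own data: the lookahead requires a slash after the ID, so a URL that ends right after the ID (the second URL below is an assumed variant, not from the question) produces no match:

```python
import re

pattern = r"(?<=https://trello\.com/c/).+?(?=/.*)"

# URL from the question: the ID is followed by a slash.
match = re.search(pattern, "https://trello.com/c/iGjJLqwr/1-test-project")
card_id = match.group(0) if match else None
print(card_id)  # iGjJLqwr

# Assumed variant with nothing after the ID: the lookahead cannot succeed.
no_slash = re.search(pattern, "https://trello.com/c/iGjJLqwr")
print(no_slash)  # None
```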

This will match the ID without the extra forward slash:

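import re

url = 'https://trello.com/c/iGjJLqwr/1-test-project'

# One character outside the negated set, then word characters,
# stopping where the next character is a forward slash.
match = re.search(r'[^https://trello.com/c/]\w*(?=/)', url)

print(match.group(0))  # prints: iGjJLqwr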

The (?=/) is a positive lookahead: it asserts that the next character is a forward slash without including it in the match.
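One thing to be aware of with this approach: the negated set still excludes individual characters (h, t, p, s, r, e, l, o, c, m and so on), so it only works when the ID does not start with one of them. The sketch below uses a made-up ID, `hello123`, to show the effect. Whether real Trello IDs can start with those characters is not something the question states:

```python
import re

pattern = r"[^https://trello.com/c/]\w*(?=/)"

# ID from the question: starts with "i", which is not in the set.
ok = re.search(pattern, "https://trello.com/c/iGjJLqwr/1-test-project").group(0)
print(ok)  # iGjJLqwr

# Hypothetical ID "hello123": h, e, l and o are all in the negated set,
# so the match can only begin at "1" and the ID is truncated.
truncated = re.search(pattern, "https://trello.com/c/hello123/1-test-project").group(0)
print(truncated)  # 123
```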

In your pattern you use a character class, which matches only one out of several characters. Starting it with a ^ makes it a negated character class, which matches any character that is not in the class.

Since the character class is not followed by a quantifier, [^https://trello.com/c/] will match a single i or - and then \w+ will match a word character 1 or more times.

That will give you the matches iGjJLqwr, -test and -project.
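To see those three matches and their positions in Python, here is a short sketch with `re.finditer` on the URL and pattern from the question:

```python
import re

url = "https://trello.com/c/iGjJLqwr/1-test-project"

# Each match is one character outside the set followed by 1+ word characters.
spans = [(m.start(), m.end(), m.group(0))
         for m in re.finditer(r"[^https://trello.com/c/][\w]+", url)]

for start, end, text in spans:
    print(start, end, text)
# 21 29 iGjJLqwr
# 31 36 -test
# 36 44 -project
```

The "1" at position 30 is skipped because it is followed by a hyphen, which is not a word character, so `[\w]+` has nothing to match there.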

I think you meant to match the id in a capturing group:

^https://trello\.com/c/(\w+)

regex101 demo

About the pattern

  • ^ Assert start of the string
  • https://trello\.com/c/ Match literally https://trello.com/c/
  • (\w+) Capture in group 1 matching 1+ times a word character
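Put together, the anchored pattern could be wrapped in a small helper along these lines. This is only a sketch: the names `CARD_URL` and `extract_card_id` are made up for illustration, and the second URL is an assumed non-Trello input to show the no-match case:

```python
import re
from typing import Optional

# Anchored at the start; group 1 captures the word characters after /c/.
CARD_URL = re.compile(r"^https://trello\.com/c/(\w+)")

def extract_card_id(url: str) -> Optional[str]:
    """Return the ID after /c/, or None if the URL does not match."""
    match = CARD_URL.search(url)
    return match.group(1) if match else None

print(extract_card_id("https://trello.com/c/iGjJLqwr/1-test-project"))  # iGjJLqwr
print(extract_card_id("https://example.com/c/iGjJLqwr"))                # None
```

Returning None instead of raising keeps the caller in charge of what to do with URLs that are not card links.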
