
How to find the pattern using regex?

curP = "<a href='https://programmers.co.kr/learn/courses/4673'></a>#!Muzi#Muzi!)jayg07con&&"

I want to find occurrences of Muzi in this string with a regex. For example:

MuziMuzi: count 0, because it is treated as a single word
Muzi&Muzi: count 2, because the & between them separates the words
7Muzi7Muzi: count 2

I tried to use a regex to find all the matches:

import re

curP = "<a href='https://programmers.co.kr/learn/courses/4673'></a>#!Muzi#Muzi!)jayg07con&&"

pattern = re.compile('[^a-zA-Z]muzi[^a-zA-Z]', flags=re.IGNORECASE)
print(pattern.findall(curP))

I expected ['!Muzi#', '#Muzi!'], but the result is

['!Muzi#']

You need to use this as your regex:

pattern = re.compile('[^a-zA-Z]muzi(?=[^a-zA-Z])', flags=re.IGNORECASE)

(?=[^a-zA-Z]) is a lookahead: it asserts that muzi is followed by a non-letter character but does not consume that character. So the first match is only !Muzi, leaving the following # available to start the next match.

Your original regex was consuming !Muzi#, leaving only Muzi!, which cannot match because no non-letter character remains in front of it.
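To make the consumption visible, the match spans of both patterns can be printed with finditer. This is only a small sketch: the shortened sample string and the variable names are assumptions for illustration, not part of the question.

```python
import re

# Shortened sample (an assumption for illustration); it keeps the same
# delimiters around Muzi as the string in the question.
text = "#!Muzi#Muzi!)"

consuming = re.compile(r"[^a-zA-Z]muzi[^a-zA-Z]", flags=re.IGNORECASE)
lookahead = re.compile(r"[^a-zA-Z]muzi(?=[^a-zA-Z])", flags=re.IGNORECASE)

# Collect each match together with its (start, end) span.
consuming_spans = [(m.group(), m.span()) for m in consuming.finditer(text)]
lookahead_spans = [(m.group(), m.span()) for m in lookahead.finditer(text)]

print(consuming_spans)  # [('!Muzi#', (1, 7))]
print(lookahead_spans)  # [('!Muzi', (1, 6)), ('#Muzi', (6, 11))]
```

The consuming pattern ends its match at index 7, past the #, so the search resumes on the M of the second Muzi. The lookahead version ends at index 6, so the # is still there to begin the second match.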

Your matches will now be:

['!Muzi', '#Muzi']
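If only the count matters, as in the three examples from the question, a variant with lookarounds on both sides may be simpler. This is a hedged sketch rather than part of the answer above: the helper name count_word is hypothetical, and it assumes that "separated" means "not directly adjacent to an ASCII letter". Unlike [^a-zA-Z], the negative lookarounds also succeed at the very start or end of the string.

```python
import re

# No letter directly before and no letter directly after "muzi".
# Neither assertion consumes a character, so a single delimiter can
# serve two neighbouring occurrences.
pattern = re.compile(r"(?<![a-zA-Z])muzi(?![a-zA-Z])", flags=re.IGNORECASE)


def count_word(text):
    """Return the number of standalone occurrences of 'muzi' in text."""
    return len(pattern.findall(text))


print(count_word("MuziMuzi"))    # 0
print(count_word("Muzi&Muzi"))   # 2
print(count_word("7Muzi7Muzi"))  # 2
```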

As I understand it, you want each match to include the characters that appear on both sides of your keyword Muzi.

That means that the #, in this case, has to be shared by both output values. Because a regex match consumes the characters it matches, one way to do this is to modify the string as you find each match.

Here is my solution:

import re

# Return the first occurrence of the pattern, including the character on each side
def find_pattern(curP):
    pattern = re.compile('([^a-zA-Z]muzi[^a-zA-Z])', flags=re.IGNORECASE)
    return pattern.findall(curP)[0]


curP = "<a href='https://programmers.co.kr/learn/courses/4673'></a>#!Muzi#Muzi!)jayg07con&&"
pattern_array = []

# Find the first occurrence of the pattern in the string
pattern_array.append(find_pattern(curP))
# Remove the first 'Muzi' so that the shared '#' is free for the next match
curP = curP.replace('Muzi', '', 1)
# Find the second occurrence of the pattern in the string
pattern_array.append(find_pattern(curP))

print(pattern_array)

Output:

['!Muzi#', '#Muzi!']
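The same output can probably also be obtained without modifying the string, by putting the whole pattern inside a lookahead with a capture group. This is an alternative sketch, not the method used above; it relies on re.findall returning the captured group for each zero-width match.

```python
import re

curP = "<a href='https://programmers.co.kr/learn/courses/4673'></a>#!Muzi#Muzi!)jayg07con&&"

# The lookahead itself matches zero characters, so nothing is consumed;
# the capture group inside it records the delimiter, the keyword, and the
# following delimiter. The shared '#' can therefore appear in two results.
pattern = re.compile(r"(?=([^a-zA-Z]muzi[^a-zA-Z]))", flags=re.IGNORECASE)

matches = pattern.findall(curP)
print(matches)  # ['!Muzi#', '#Muzi!']
```

This also works for any number of occurrences, whereas the replace-based approach above needs one call per expected match.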

