
Removing words that contain only a repeated 'x' using regex

I have the following text:

text = "upi xxxxxxxxxx manoj jayant xxx xxxxxxx532kvblhii"

I am trying to remove every word that consists only of repeated x characters, to get the output below:

out = "upi manoj jayant xxxxxxx532kvblhii"

I used the following regex, which gives the wrong output:

re.sub('[x]', '', text)

out = "upi  manoj jayant  532kvblhii"

Please help me to correct my regex.

Use word boundaries to match only words consisting entirely of x:

import re

text = "upi xxxxxxxxxx manoj jayant xxx xxxxxxx532kvblhii"
out = re.sub(r'\s*\bx+\b\s*', ' ', text)
print(out.strip())

This prints:

upi manoj jayant xxxxxxx532kvblhii

The pattern matches a run of x only when it forms an entire word. It also consumes the whitespace on either side and replaces the whole match with a single space, which keeps the surrounding words separated. One edge case: when an x-only word sits at the start or end of the string, the replacement leaves a stray leading or trailing space, which strip() then removes.
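As a minimal sketch, the same substitution can be wrapped in a small function and tried on inputs where the x-only words sit at the edges or next to each other. The name remove_x_words and the extra (?:...)+ group are assumptions added here, not part of the original answer. Without the group, two adjacent x-only words would each be replaced by their own space, leaving a double space.

```python
import re

# Pattern from the answer above. The (?:...)+ group is an added assumption so
# that several x-only words in a row collapse to one space instead of two.
X_WORDS = re.compile(r'(?:\s*\bx+\b)+\s*')

def remove_x_words(text: str) -> str:
    """Drop words made only of 'x', leaving single spaces between the rest."""
    return X_WORDS.sub(' ', text).strip()

print(remove_x_words("upi xxxxxxxxxx manoj jayant xxx xxxxxxx532kvblhii"))
# upi manoj jayant xxxxxxx532kvblhii
print(remove_x_words("xxx upi xx xxx manoj"))
# upi manoj
```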

You can use either a non-regex or a regex approach:

# Without regex:
text = "upi xxxxxxxxxx manoj jayant xxx xxxxxxx532kvblhii"
print( " ".join([x for x in text.split() if x != len(x) * x[0] ]) )
# => upi manoj jayant xxxxxxx532kvblhii

# With regex:
import re
print( re.sub(r'\s*\bx+\b', '', text).lstrip() )
# => upi manoj jayant xxxxxxx532kvblhii


No-regex solution details

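  • text.split() splits the string on runs of whitespace.
  • if x != len(x) * x[0] keeps a word only if it differs from its first character repeated len(x) times, so every word made of a single repeated character is discarded. This covers the x-only words, but it also drops words such as aaa and any one-letter word.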
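If only x-only words should go, one possible tightening is to compare each word against "x" repeated to the same length. This is a sketch rather than the original answer's code: the helper name is_x_word is hypothetical, and the sample text has an extra one-letter word "a" added to show the difference.

```python
# Sample text from the question, with a one-letter word "a" added for illustration.
text = "upi xxxxxxxxxx manoj jayant xxx a xxxxxxx532kvblhii"

def is_x_word(word: str) -> bool:
    """True only when the word consists of the letter x and nothing else."""
    return word == "x" * len(word)

# str.split() never yields empty strings, so every word has at least one char.
result = " ".join(w for w in text.split() if not is_x_word(w))
print(result)
# upi manoj jayant a xxxxxxx532kvblhii
```

With the x[0]-based check, the word "a" would have been dropped as well.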

Regex details

  • \s* - zero or more whitespace characters
  • \b - a word boundary
  • x+ - one or more x chars
  • \b - a word boundary
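To see what the pattern described above actually consumes, re.findall can list the matches before anything is substituted. This is only an inspection aid; the variable name matches is an assumption.

```python
import re

text = "upi xxxxxxxxxx manoj jayant xxx xxxxxxx532kvblhii"

# Each match includes the whitespace in front of the x-only word.
# The run of x in "xxxxxxx532kvblhii" is skipped: no word boundary follows it.
matches = re.findall(r'\s*\bx+\b', text)
print(matches)
# [' xxxxxxxxxx', ' xxx']
```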

Note that .lstrip() is needed only when an x-only word appears at the start of the string, where removing it leaves unwanted leading whitespace.
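One caveat with \b is that punctuation also counts as a word boundary, so the x in a token such as x-ray would match on its own. If words are meant to be whitespace-delimited, a possible alternative (an assumption here, not part of the original answer) is to replace \b with the lookarounds (?<!\S) and (?!\S). The sample text below is invented for illustration.

```python
import re

text = "x-ray upi xxx manoj"

# \b treats the hyphen as a boundary, so the lone "x" of "x-ray" is removed too.
with_boundaries = re.sub(r'\s*\bx+\b', '', text).lstrip()
print(with_boundaries)
# -ray upi manoj

# Assumed alternative: require whitespace (or the string edge) on both sides.
with_lookarounds = re.sub(r'\s*(?<!\S)x+(?!\S)', '', text).lstrip()
print(with_lookarounds)
# x-ray upi manoj
```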
