
Get text inside one-pair-brackets but not double square brackets

  1. In the text below, I only want to get "4th of July":

Hello, happy [4th of July]. I love the [[firework]]

  2. I have this text:

    text = {{Hey, how are you.}} I've watched John Mulaney all night. [[Category: Comedy]] [[Image: John Mulaney]]

I'm trying to remove {{Hey, how are you.}}, [[Category: Comedy]], and [[Image: John Mulaney]]. This is what I have tried so far, but it doesn't seem to work:

import re

# Raw strings keep the backslashes intact for the regex engine.
hey_how_are_you = re.compile(r'\{\{.*\}\}')
category = re.compile(r'\[\[Category:.*?\]\]')
image = re.compile(r'\[\[Image:.*?\]\]')
text = hey_how_are_you.sub('', text)
text = category.sub('', text)
text = image.sub('', text)
import re

# 1. Capture text inside single brackets only.
text = "Hello, happy [4th of July]. I love the [[firework]]. "
l = re.findall(r"(?<!\[)\[([^\[\]]+)\]", text)
print(l, "\n", l[0])
# 2. Remove all three markup fragments in one pass.
text2 = " {{Hey, how are you.}} I've watched John Mulaney all night. [[Category: Comedy]] [[Image: John Mulaney]]"
print(re.sub(r"\{\{.*?\}\}|\[\[\s*Category:.*?\]\]|\[\[\s*Image:.*?\]\]", "", text2))

Output:
['4th of July'] 
 4th of July
  I've watched John Mulaney all night.  

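For the 1st problem, you can use a negative lookbehind, (?<!\[), so that an opening bracket only matches when it is not preceded by another opening bracket.
Your regexp for the 2nd problem works for me. (What error do you get?) However, it can also be solved in one pass, by joining the three patterns with | into a single alternation.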
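
As a minimal sketch of the lookbehind idea, the pattern can be wrapped in a small helper. The name find_single_bracketed is hypothetical, and the trailing negative lookahead (?!\]) is an added assumption, not part of the answer above: it also rejects a closing bracket that is immediately followed by another one.

```python
import re

# Hypothetical helper; the (?!\]) lookahead is an extra assumption
# that was not in the original pattern.
SINGLE_BRACKET = re.compile(r"(?<!\[)\[([^\[\]]+)\](?!\])")


def find_single_bracketed(text):
    """Return the contents of every [single] bracket pair, skipping [[double]] ones."""
    return SINGLE_BRACKET.findall(text)


sample = "Hello, happy [4th of July]. I love the [[firework]]"
print(find_single_bracketed(sample))  # ['4th of July']
```

The character class [^\[\]]+ is what stops the outer bracket of [[firework]] from matching, since it requires at least one non-bracket character right after the opening bracket.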

You are going about this in a rather roundabout way: there is no need to compile each pattern separately when you can call re.sub directly. It is worth reading more of the regular expression documentation. Try this instead:

import re

text = "{{Hey, how are you.}} I've watched John Mulaney all night. [[Category: Comedy]] [[Image: John Mulaney]]"

text = re.sub(r'\{\{.*\}\}', '', text)
text = re.sub(r'\[\[Category:.*?\]\]', '', text)
text = re.sub(r'\[\[Image:.*?\]\]', '', text)

text

Out[ ]:
" I've watched John Mulaney all night.  "

Notice there is still a space at the front and 2 more at the end of the output string. I'll leave you to figure out how to remove them. Does that help?
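
One possible way to deal with the leftover whitespace, sketched here as an assumption rather than as the answerer's intended solution, is to collapse runs of whitespace and then strip both ends. The names pattern and cleaned are illustrative only.

```python
import re

text = "{{Hey, how are you.}} I've watched John Mulaney all night. [[Category: Comedy]] [[Image: John Mulaney]]"

# One alternation covers the {{...}} block and both [[...]] tags.
pattern = re.compile(r"\{\{.*?\}\}|\[\[\s*(?:Category|Image):.*?\]\]")

cleaned = pattern.sub("", text)
# Collapse runs of two or more whitespace characters into one space,
# then drop whitespace at both ends.
cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()

print(repr(cleaned))  # "I've watched John Mulaney all night."
```

If only the ends matter, calling strip() alone on the result is enough.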

P.S. Look at the documentation on how to use a regular expression to match only [4th of July] and skip the double-bracketed text.

@Duy, have a look at the following examples.

I've used a list comprehension and the string split() method.

Example 1

>>> string = 'Hello, happy [4th of July]. I love the [[firework]]'
>>>
>>> part1 = string.split('[')
>>> part1
['Hello, happy ', '4th of July]. I love the ', '', 'firework]]']
>>>
>>> part2 = [s[:s.index(']')] for s in part1 if ']' in s and ']]' not in s]
>>> part2
['4th of July']
>>>

Example 2

>>> sentence1 = "This is [Great opportunity] to [[learn]] [Nice things] and [Programming] [[One]] of them]."
>>> part1 = sentence1.split('[')
>>> part2 = [s[:s.index(']')] for s in part1 if ']' in s and ']]' not in s]
>>> part2
['Great opportunity', 'Nice things', 'Programming']
>>>

The code below handles your 2nd input text: it extracts the {{...}} and [[...]] fragments that you want to remove.

Example 3

>>> import re
>>>
>>> string2 = "text = {{Hey, how are you.}} I've watched John Mulaney all night. [[Category: Comedy]] [[Image: John Mulaney]]"
>>>
>>> part1 = re.split(r'[{\[]', string2)
>>> part1
['text = ', '', "Hey, how are you.}} I've watched John Mulaney all night. ", '', 'Category: Comedy]] ', '', 'Image: John Mulaney]]']
>>> part3 = [ "[["+ s[:s.index(']]') + 2] if ']]' in s else "{{" + s[:s.index('}}') + 2]  for s in part1 if ']]' in s or '}}' in s]
>>>
>>> part3
['{{Hey, how are you.}}', '[[Category: Comedy]]', '[[Image: John Mulaney]]']
>>>
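
Example 3 only collects the fragments. To actually remove them, one option (a sketch under the assumption that each fragment appears verbatim in the input) is to replace each one with an empty string and then normalize the spacing. The names fragments and cleaned are illustrative.

```python
string2 = "text = {{Hey, how are you.}} I've watched John Mulaney all night. [[Category: Comedy]] [[Image: John Mulaney]]"

# The fragments found by Example 3, written out so this block runs on its own.
fragments = ['{{Hey, how are you.}}', '[[Category: Comedy]]', '[[Image: John Mulaney]]']

cleaned = string2
for fragment in fragments:
    # str.replace removes every verbatim occurrence of the fragment.
    cleaned = cleaned.replace(fragment, '')

# split() with no arguments splits on any whitespace run and drops empty
# pieces, so joining with a single space normalizes the leftover gaps.
cleaned = ' '.join(cleaned.split())

print(cleaned)  # text = I've watched John Mulaney all night.
```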

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 