Get text inside a single pair of square brackets, but not inside double square brackets

  1. In the text below, I only want to get "4th of July":

Hello, happy [4th of July]. I love the [[firework]]

  2. I have this text:

    text = {{Hey, how are you.}} I've watched John Mulaney all night. [[Category: Comedy]] [[Image: John Mulaney]]

I'm trying to remove {{Hey, how are you.}}, [[Category: Comedy]], and [[Image: John Mulaney]]. This is what I have tried so far, but it doesn't seem to work:

hey_how_are_you = re.compile(r'\{\{.*\}\}')
category = re.compile(r'\[\[Category:.*?\]\]')
image = re.compile(r'\[\[Image:.*?\]\]')
text = hey_how_are_you.sub('', text)
text = category.sub('', text)
text = image.sub('', text)

import re

# 1. Capture text in single brackets; the lookbehind skips a "[" preceded by "[".
text = "Hello, happy [4th of July]. I love the [[firework]]. "
matches = re.findall(r"(?<!\[)\[([^\[\]]+)\]", text)
print(matches, "\n", matches[0])

# 2. Remove the {{...}}, [[Category: ...]] and [[Image: ...]] markers in one pass.
text2 = " {{Hey, how are you.}} I've watched John Mulaney all night. [[Category: Comedy]] [[Image: John Mulaney]]"
print(re.sub(r"\{\{.*?\}\}|\[\[\s*Category:.*?\]\]|\[\[\s*Image:.*?\]\]", "", text2))

Output:
['4th of July'] 
 4th of July
  I've watched John Mulaney all night.  

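For the first problem, you can use a negative lookbehind, (?<!\[), so that a [ preceded by another [ is never treated as the opening bracket.
Your regex for the second problem works for me (what error do you get?). However, it can also be solved in one pass, as shown above.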
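
One thing the lookbehind alone does not cover is a stray extra closing bracket, as in "[a]]". The sketch below compares the pattern from this answer with a stricter variant that also adds a negative lookahead, (?!\]). The stricter pattern and the names lookbehind_only, strict and unbalanced are illustrative assumptions, not part of the original answer.

```python
import re

# Pattern from the answer: rejects "[" preceded by another "[".
lookbehind_only = re.compile(r"(?<!\[)\[([^\[\]]+)\]")

# Assumed stricter variant: also rejects a closing "]" followed by another "]".
strict = re.compile(r"(?<!\[)\[([^\[\]]+)\](?!\])")

text = "Hello, happy [4th of July]. I love the [[firework]]"
print(strict.findall(text))                 # ['4th of July']

# Hypothetical input with an unbalanced "]]" after a single "[".
unbalanced = "[a]] and [c]"
print(lookbehind_only.findall(unbalanced))  # ['a', 'c']
print(strict.findall(unbalanced))           # ['c']
```

For the text in the question both patterns return the same result, so the extra lookahead only matters if the input may contain unbalanced brackets.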

You are going about that in a very strange way. Try reading more of the regular expression documentation. Try this instead:

import re

text = "{{Hey, how are you.}} I've watched John Mulaney all night. [[Category: Comedy]] [[Image: John Mulaney]]"

# Raw strings avoid invalid-escape warnings; .*? keeps each match as short as possible.
text = re.sub(r'\{\{.*?\}\}', '', text)
text = re.sub(r'\[\[Category:.*?\]\]', '', text)
text = re.sub(r'\[\[Image:.*?\]\]', '', text)

text

Out[ ]:
" I've watched John Mulaney all night.  "

Notice there is still one space at the front and two at the end of the output string. I'll leave you to figure out how to remove them. Does that help?
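
For readers who want a starting point for the leftover whitespace, here is a minimal sketch, assuming it is acceptable to collapse every run of whitespace into a single space. The helper name strip_markup and the combined MARKUP pattern are hypothetical and not from the original answer.

```python
import re

# Hypothetical combined pattern covering the three substitutions above.
# The non-greedy .*? stops at the first closing marker.
MARKUP = re.compile(r"\{\{.*?\}\}|\[\[(?:Category|Image):.*?\]\]")


def strip_markup(text):
    """Remove the markup, collapse whitespace runs, and trim both ends."""
    without_markup = MARKUP.sub("", text)
    return re.sub(r"\s+", " ", without_markup).strip()


text = "{{Hey, how are you.}} I've watched John Mulaney all night. [[Category: Comedy]] [[Image: John Mulaney]]"
cleaned = strip_markup(text)
print(repr(cleaned))  # "I've watched John Mulaney all night."
```

If the spacing inside the remaining text has to be preserved exactly, calling text.strip() on its own removes only the leading and trailing spaces.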

P.S. Look at the documentation on how to use a regex to match only [4th of July] and exclude everything else.

@Duy, have a look at the following two examples.

I've used a list comprehension and the string split() method.

Example 1

>>> string = 'Hello, happy [4th of July]. I love the [[firework]]'
>>>
>>> part1 = string.split('[')
>>> part1
['Hello, happy ', '4th of July]. I love the ', '', 'firework]]']
>>>
>>> part2 = [s[:s.index(']')] for s in part1 if ']' in s and ']]' not in s]
>>> part2
['4th of July']
>>>

Example 2

>>> sentence1 = "This is [Great opportunity] to [[learn]] [Nice things] and [Programming] [[One]] of them]."
>>> part1 = sentence1.split('[')
>>> part2 = [s[:s.index(']')] for s in part1 if ']' in s and ']]' not in s]
>>> part2
['Great opportunity', 'Nice things', 'Programming']
>>>

The code below handles your second input text. It extracts the {{...}} and [[...]] markers that you want to remove.

Example 3

>>> import re
>>>
>>> string2 = "text = {{Hey, how are you.}} I've watched John Mulaney all night. [[Category: Comedy]] [[Image: John Mulaney]]"
>>>
>>> part1 = re.split(r'[{\[]', string2)
>>> part1
['text = ', '', "Hey, how are you.}} I've watched John Mulaney all night. ", '', 'Category: Comedy]] ', '', 'Image: John Mulaney]]']
>>> part3 = ["[[" + s[:s.index(']]') + 2] if ']]' in s else "{{" + s[:s.index('}}') + 2] for s in part1 if ']]' in s or '}}' in s]
>>>
>>> part3
['{{Hey, how are you.}}', '[[Category: Comedy]]', '[[Image: John Mulaney]]']
>>>
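
For comparison, the same list of markers can probably be obtained more directly with re.findall. This is only a sketch of an alternative, not the method used in this answer; the name markers is illustrative.

```python
import re

string2 = "text = {{Hey, how are you.}} I've watched John Mulaney all night. [[Category: Comedy]] [[Image: John Mulaney]]"

# Match either a {{...}} block or a [[...]] block, as short as possible each time.
markers = re.findall(r"\{\{.*?\}\}|\[\[.*?\]\]", string2)
print(markers)
# ['{{Hey, how are you.}}', '[[Category: Comedy]]', '[[Image: John Mulaney]]']
```

Passing the same pattern to re.sub with an empty replacement removes the markers instead of listing them.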
