How to find a pattern between two non-alphanumeric characters using regex in Python
I am very new to Python regex. I can't work out how to search for "*" in a sentence: since * is a metacharacter in regex, I am getting confused.
My question is this. I have a file which contains:
*CHI: <that guy was> [//] that bunny was going to [: gonna] take that
balloon !
%mor: pro:dem|that n|bunny aux|be&PAST&13S part|go-PRESP
part|go-PRESP~inf|to v|take pro:dem|that n|balloon !
So here I have to retrieve the sentence that is between "*CHI:" and "%mor".
My desired output is:
<that guy was> [//] that bunny was going to [: gonna] take that
balloon !
You can use the re.DOTALL flag to make . match newlines; there's no need for lookarounds:
import re
s = '''*CHI: <that guy was> [//] that bunny was going to [: gonna] take that
balloon !
%mor: pro:dem|that n|bunny aux|be&PAST&13S part|go-PRESP
part|go-PRESP~inf|to v|take pro:dem|that n|balloon !
'''
print(re.search(r'\*CHI: (.+)\n%mor:', s, re.DOTALL)[1])
Output:
<that guy was> [//] that bunny was going to [: gonna] take that
balloon !
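If the file holds more than one utterance, a greedy .+ combined with re.DOTALL would run from the first *CHI: to the last %mor:. The following is only a sketch under that assumption: the second utterance in the sample is invented for illustration, and the non-greedy .+? with re.findall is one possible way to get each utterance separately.

```python
import re

# Hypothetical sample with two utterances; the second one is made up for illustration.
text = (
    "*CHI: <that guy was> [//] that bunny was going to [: gonna] take that\n"
    "balloon !\n"
    "%mor: pro:dem|that n|bunny\n"
    "*CHI: he ran away .\n"
    "%mor: pro:sub|he v|run&PAST adv|away .\n"
)

# \* matches a literal asterisk; .+? is non-greedy, so each match
# stops at the nearest following "\n%mor:" instead of the last one.
utterances = re.findall(r'\*CHI: (.+?)\n%mor:', text, re.DOTALL)

for u in utterances:
    print(repr(u))
```

With a single utterance, as in the question, the greedy and non-greedy versions return the same text.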
You can put * inside []. Inside a character class [], metacharacters such as * lose their special meaning, so [*] matches a literal asterisk.
With re.search:
t = """*CHI: <that guy was> [//] that bunny was going to [: gonna] take that
balloon !
%mor: pro:dem|that n|bunny aux|be&PAST&13S part|go-PRESP
part|go-PRESP~inf|to v|take pro:dem|that n|balloon !
"""
mo = re.search(r'[*]CHI:\s+(.*)\s+%mor:', t, re.S)
mo.group(1)
'<that guy was> [//] that bunny was going to [: gonna] take that\n balloon !'
With re.findall:
re.findall(r'[*]CHI:\s+(.*)\s+%', t,re.S)
['<that guy was> [//] that bunny was going to [: gonna] take that\n balloon !']
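To compare the ways of matching a literal asterisk, here is a small sketch. The sample line is hypothetical, and re.escape is a standard-library helper not mentioned in the answers above; it is shown only as a third option.

```python
import re

# Hypothetical one-line sample.
line = "*CHI: hello there"

# Three ways to match the literal prefix "*CHI:".
patterns = [
    r'\*CHI:',           # backslash escape
    r'[*]CHI:',          # character class
    re.escape('*CHI:'),  # let the re module build the escaped pattern
]
results = [bool(re.match(p, line)) for p in patterns]
print(results)

# Without escaping, * is a quantifier with nothing to repeat,
# so compiling the pattern raises re.error.
try:
    re.compile('*CHI:')
    unescaped_ok = True
except re.error as exc:
    unescaped_ok = False
    print("unescaped pattern fails:", exc)
```

Which form to use is mostly a matter of taste; the backslash is the most common, while the character class can be easier to read when the pattern already contains many backslashes.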
Escape the '*' character:
re.findall(r'(?<=\*CHI:)[\s\S]*(?=%mor:)', s)
The positive lookbehind (?<=...) and positive lookahead (?=...) keep your start and end terms out of the match. [\s\S] matches any character, including newlines, so no flag is needed.
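One thing to watch with the lookaround version is that the match keeps the whitespace next to the markers. A minimal sketch, assuming a shortened copy of the sample from the question, that strips it afterwards:

```python
import re

# Shortened copy of the sample from the question.
s = '''*CHI: <that guy was> [//] that bunny was going to [: gonna] take that
balloon !
%mor: pro:dem|that n|bunny aux|be&PAST&13S part|go-PRESP
'''

# The lookarounds exclude "*CHI:" and "%mor:" from the match, but the
# space after "*CHI:" and the newline before "%mor:" are still captured.
matches = re.findall(r'(?<=\*CHI:)[\s\S]*(?=%mor:)', s)

# Strip the surrounding whitespace from each match.
cleaned = [m.strip() for m in matches]
print(cleaned)
```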