re.findall() to find all bigrams containing a negative term

I am required to use the re.findall() function to find all bigrams whose first word is a negative term ("never" or "not") in the following text:

He jests at scars that never felt a wound. JULIET appears above at a window But, soft! what light through yonder window breaks? It is the east, and Juliet is the sun. Arise, fair sun, and kill the envious moon, Who is already sick and pale with grief, That thou her maid art far more fair than she: Be not her maid, since she is envious; Her vestal livery is but sick and green And none but fools do wear it; cast it off. It is my lady, O, it is my love! O, that she knew she were! She speaks yet she says nothing: what of that? Her eye discourses; I will answer it. I am too bold, 'tis not to me she speaks: Two of the fairest stars in all the heaven, Having some business, do entreat her eyes To twinkle in their spheres till they return. What if her eyes were there, they in her head? The brightness of her cheek would shame those stars, As daylight doth a lamp; her eyes in heaven Would through the airy region stream so bright That birds would sing and think it were not night. See, how she leans her cheek upon her hand! O, that I were a glove upon that hand, That I might touch that cheek!

I have no problem finding a single word, but I am at a loss when it comes to finding bigrams.

import re
inp = input("please enter an expression: ")
print(re.findall(r'\b(?:never|not)\b', inp))

['never', 'not', 'not', 'not']

How do I get the following instead?

['never felt', 'not her', 'not to', 'not night']

If you also want to capture the word just after not or never, you need to extend your regex to this:

\b(?:never|not)\s+[a-zA-Z]+

Here, \s+ matches one or more whitespace characters and [a-zA-Z]+ matches one English word of one or more letters.

Python code demo:

import re

s = '''He jests at scars that never felt a wound. JULIET appears above at a window But, soft! what light through yonder window breaks? It is the east, and Juliet is the sun. Arise, fair sun, and kill the envious moon, Who is already sick and pale with grief, That thou her maid art far more fair than she: Be not her maid, since she is envious; Her vestal livery is but sick and green And none but fools do wear it; cast it off. It is my lady, O, it is my love! O, that she knew she were! She speaks yet she says nothing: what of that? Her eye discourses; I will answer it. I am too bold, 'tis not to me she speaks: Two of the fairest stars in all the heaven, Having some business, do entreat her eyes To twinkle in their spheres till they return. What if her eyes were there, they in her head? The brightness of her cheek would shame those stars, As daylight doth a lamp; her eyes in heaven Would through the airy region stream so bright That birds would sing and think it were not night. See, how she leans her cheek upon her hand! O, that I were a glove upon that hand, That I might touch that cheek!'''
print(re.findall(r'\b(?:never|not)\s+[a-zA-Z]+', s))

This prints:

['never felt', 'not her', 'not to', 'not night']
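The (?:...) in the pattern is a non-capturing group, and that matters for re.findall(): when a pattern contains capturing groups, findall returns the groups rather than the whole match. The short sketch below illustrates the difference on a made-up sample string (the string and variable names are only for illustration, not from the question):

```python
import re

# Hypothetical sample, shortened from the text in the question.
sample = "scars that never felt a wound; be not her maid"

# Non-capturing group: findall returns the whole match.
whole = re.findall(r'\b(?:never|not)\s+[a-zA-Z]+', sample)

# One capturing group: findall returns only what the group captured.
group_only = re.findall(r'\b(never|not)\s+[a-zA-Z]+', sample)

# Two capturing groups: findall returns one tuple per match.
pairs = re.findall(r'\b(never|not)\s+([a-zA-Z]+)', sample)

print(whole)       # ['never felt', 'not her']
print(group_only)  # ['never', 'not']
print(pairs)       # [('never', 'felt'), ('not', 'her')]
```

The tuple form can be handy if the two words of each bigram need to be handled separately.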

Edit: Since you said you want to discard matches that are followed by a space and the character a, you can use a negative lookahead and extend the current regex like this:

\b(?:never|not)\s+[a-zA-Z]+\b(?! a\b)

Here I have used \b before the negative lookahead to avoid a partial match of the word, and the \b after a inside the lookahead ensures that only the standalone word a triggers the rejection, not longer words that merely start with a, such as add or and.
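As a rough Python check of the lookahead pattern, the sketch below runs it on a short made-up string (the sample text and variable names are assumptions for illustration). It also shows what would happen if the \b inside the lookahead were left out:

```python
import re

# Hypothetical sample: "never felt" is followed by the word "a",
# "not felt" is followed by "and", "not her" by "maid".
sample = "never felt a wound, not her maid, not felt and gone"

# With \b inside the lookahead, only a standalone "a" rejects a match.
strict = re.findall(r'\b(?:never|not)\s+[a-zA-Z]+\b(?! a\b)', sample)

# Without it, any following word that starts with "a" rejects a match.
loose = re.findall(r'\b(?:never|not)\s+[a-zA-Z]+\b(?! a)', sample)

print(strict)  # ['not her', 'not felt']
print(loose)   # ['not her']
```

The \b placed before the lookahead is what stops the engine from backtracking to a shorter word such as "fel" in order to dodge the lookahead.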

import re

x = input()  # paste the text as a single line
# \w+ matches letters, digits and underscores
m = re.findall(r'\b(?:never|not)\b\s+\w+', x)
print(m)
# output
# ['never felt', 'not her', 'not to', 'not night']
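The two answers differ in how they match the second word: [a-zA-Z]+ accepts ASCII letters only, while \w+ also accepts digits and underscores. On the text in the question both give the same result, but they can diverge on other input. A small sketch on a made-up string (not from the question) shows where:

```python
import re

# Hypothetical sample containing a digit-led word and an underscore.
sample = "not 2nd place, never felt, not night_time"

letters_only = re.findall(r'\b(?:never|not)\s+[a-zA-Z]+', sample)
word_chars = re.findall(r'\b(?:never|not)\b\s+\w+', sample)

print(letters_only)  # ['never felt', 'not night']
print(word_chars)    # ['not 2nd', 'never felt', 'not night_time']
```

Which behaviour is preferable depends on what should count as a word in the bigram.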
