Python regex, conditional searching

I am trying to split this sentence:

"Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot " \
"for it. Did he mind? Adam Jones Jr. thinks he didn't. In any case, this " \
"isn't true... Well, with a probability of .9 it isn't."

into the list below:

Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it.
Did he mind?
Adam Jones Jr. thinks he didn't.
In any case, this isn't true...
Well, with a probability of .9 it isn't.

Code:

print(re.findall(r'([A-Z]+[^.].*?[a-z.][.?!] )[^a-z]', text))

Output:

['Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it. ',
 "Adam Jones Jr. thinks he didn't. "]

OK, good, but it missed some sentences. Since the final [^a-z] is not part of my group, is there a way to tell Python to continue searching from that character?

EDIT:

This was achieved with a lookahead, as suggested by @sputnick.

print(re.findall(r'([A-Z]+[^.].*?[a-z.][.?!] )(?=[^a-z])', text))

Output:

['Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it. ',
 'Did he mind? ',
 "Adam Jones Jr. thinks he didn't. ",
 "In any case, this isn't true... "]

But we still need the last sentence. Any ideas?

Try this:

print(re.findall(r'([A-Z]+[^.].*?[a-z.][.?!] )(?=[^a-z])', text))

This uses a positive lookahead: (?=[^a-z]) checks the next character without consuming it, so the next search starts at that character. See http://www.regular-expressions.info/lookaround.html
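To make the difference concrete, here is a minimal sketch on a toy string (the `sample` text and both toy patterns are illustrative assumptions, not taken from the question). It contrasts a trailing character class, which consumes the first character of the next item, with a lookahead, which only peeks at it.

```python
import re

sample = "one two three four"  # hypothetical toy input

# The trailing \w is part of the match, so it is consumed:
# the next search starts one character too late.
consumed = re.findall(r'(\w+) \w', sample)

# The lookahead only checks the next character; it stays
# available as the start of the next match.
peeked = re.findall(r'(\w+) (?=\w)', sample)

print(consumed)  # ['one', 'wo', 'hree']
print(peeked)    # ['one', 'two', 'three']
```

Neither toy pattern returns the last word, because nothing follows it; this mirrors the missing last sentence in the question.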

(.+?)(?<=(?<![A-Z][a-z])(?<![a-z]\.[a-z])(?:\.|\?)(?=\s|$))

Try this. See the demo, grab the captures (groups), and use re.findall.

https://regex101.com/r/gQ3kS4/45

Finally

print(re.findall(r'[A-Z]+[^.].*?[a-z.][.?!] (?=[^a-z])|.*.$', text))

The above works exactly as needed and includes the last sentence, but I have no idea why |.*.$ worked. Please help me understand.

Output:

['Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it. ',
 'Did he mind? ',
 "Adam Jones Jr. thinks he didn't. ",
 "In any case, this isn't true... ",
 "Well, with a probability of .9 it isn't."]
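As for why |.*.$ helps, one likely reading is this. At each position the regex engine tries the alternatives from left to right. The first alternative requires a space after the closing punctuation, and the last sentence ends the string with no space after it, so that alternative fails there. The second alternative, .*.$, then matches whatever non-empty text remains up to the end of the string. The sketch below compares the pattern with and without that fallback; the variable names are assumptions introduced for illustration.

```python
import re

text = ("Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot "
        "for it. Did he mind? Adam Jones Jr. thinks he didn't. In any case, this "
        "isn't true... Well, with a probability of .9 it isn't.")

# First alternative from the question: needs punctuation + space + non-lowercase.
main = r'[A-Z]+[^.].*?[a-z.][.?!] (?=[^a-z])'

without_fallback = re.findall(main, text)

# Second alternative: any non-empty remainder up to the end of the string
# (.*.$ behaves like .+$ here). It is only tried where the first one fails.
with_fallback = re.findall(main + r'|.*.$', text)

print(len(without_fallback))  # 4
print(with_fallback[-1])      # Well, with a probability of .9 it isn't.
```

Note that the fallback accepts any trailing text, not only a well-formed sentence, so it would also return a trailing fragment.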
