Python: extract capitalized words using regex
I want to extract the capitalized word that occurs within 3 or 4 words before the word "cell" or "cells".
Example:
Briefly, MCF-7 idential cells grown as described above were treated with a range of LTX-diol or iso-LTX-diol.
I would like to extract MCF-7 from the above example.
I tried to use [A-Z0-9-]+cells, but it returns cells instead of MCF-7.
This answer assumes that you want to match a word beginning with a capital letter that is followed by 1 to 4 other words (none of which starts with a capital letter), and then by cell or cells. We can try matching using the following pattern:
([A-Z][^ ]*)(?=\s+(?:[^A-Z]\S*\s+){1,4}cells?)
The positive lookahead at the end of the pattern asserts that 1 to 4 words occur between the capitalized word and cell or cells. Because a lookahead consumes no text, only the capitalized word itself is captured.
import re

# Named "text" rather than "input" to avoid shadowing the built-in input()
text = "Briefly, MCF-7 idential cells grown as described above were treated with a range of LTX-diol or iso-LTX-diol."
r1 = re.findall(r"([A-Z][^ ]*)(?=\s+(?:[^A-Z]\S*\s+){1,4}cells?)", text)
print(r1)
['MCF-7']
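To see where the pattern stops matching, here is a minimal sketch that wraps it in a helper. The function name find_capitalized_before_cells and the min_gap / max_gap parameters are hypothetical, introduced only for illustration; the regex is the one from the answer, with the {1,4} quantifier made adjustable. Note that with the original quantifier, a word directly followed by cells (no word in between) is not matched.

```python
import re

def find_capitalized_before_cells(text, min_gap=1, max_gap=4):
    """Return capitalized words followed by min_gap..max_gap non-capitalized
    words and then "cell" or "cells".

    Hypothetical helper: the name and the gap parameters are assumptions,
    not part of the original answer.
    """
    # Build the {min,max} quantifier separately to keep the regex readable.
    gap = "{%d,%d}" % (min_gap, max_gap)
    pattern = r"([A-Z][^ ]*)(?=\s+(?:[^A-Z]\S*\s+)" + gap + r"cells?)"
    return re.findall(pattern, text)

# One word between the capitalized word and "cells": matched.
print(find_capitalized_before_cells(
    "Briefly, MCF-7 idential cells grown as described above were treated."))

# No word in between: not matched with the default minimum of 1 ...
print(find_capitalized_before_cells("MCF-7 cells were treated."))

# ... but matched once the minimum is lowered to 0.
print(find_capitalized_before_cells("MCF-7 cells were treated.", min_gap=0))

# "HeLa" is skipped because a capitalized word (MCF-7) sits between it and "cells".
print(find_capitalized_before_cells(
    "HeLa and MCF-7 derived tumour cells were used."))
```

If words immediately before cells should also count, lowering the minimum to 0 is one option. Whether that fits depends on how the "3 or 4 words" requirement in the question is meant.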