
Python: extract capitalized words using regex

I want to extract the capitalized word that occurs 3 or 4 words before the word "cell" or "cells".

Example:

Briefly, MCF-7 idential cells grown as described above were treated with a range of LTX-diol or iso-LTX-diol.

I would like to extract MCF-7 from the above example.

I tried to use [A-Z0-9-]+cells, but it returns cells instead of MCF-7.

This answer assumes that you want to match a word beginning with a capital letter, followed by 1 to 4 other words that do not begin with a capital letter, followed by cell or cells. We can try matching using the following pattern:

([A-Z][^ ]*)(?=\s+(?:[^A-Z]\S*\s+){1,4}cells?)

The positive lookahead at the end of the pattern asserts that 1 to 4 words occur between the captured word and cell or cells. Because a lookahead does not consume any text, only the capitalized word itself is returned.

import re

# Named text rather than input, to avoid shadowing the built-in input()
text = "Briefly, MCF-7 idential cells grown as described above were treated with a range of LTX-diol or iso-LTX-diol."

r1 = re.findall(r"([A-Z][^ ]*)(?=\s+(?:[^A-Z]\S*\s+){1,4}cells?)", text)
print(r1)

['MCF-7']
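
If it helps to see how the pieces fit together, here is a rough sketch of the same pattern written with re.VERBOSE so each part can carry a comment. The trailing \b and the helper name capitalized_before_cells are my own additions for illustration, not part of the pattern above. The \b is there on the assumption that you do not want cell to match inside a longer word such as cellular.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE.
# The trailing \b is an added assumption: it stops "cell" from
# matching inside longer words such as "cellular".
PATTERN = re.compile(
    r"""
    ([A-Z][^ ]*)                # capture: a word starting with a capital letter
    (?=                         # lookahead: inspect what follows, consume nothing
        \s+                     # whitespace after the captured word
        (?:[^A-Z]\S*\s+){1,4}   # 1 to 4 words that do not start with a capital
        cells?\b                # then cell or cells as a whole word
    )
    """,
    re.VERBOSE,
)


def capitalized_before_cells(text):
    """Hypothetical helper: return every capitalized word the pattern captures."""
    return PATTERN.findall(text)


sample = (
    "Briefly, MCF-7 idential cells grown as described above were "
    "treated with a range of LTX-diol or iso-LTX-diol."
)
print(capitalized_before_cells(sample))
```

Note that the {1,4} quantifier requires at least one word in between, so a capitalized word directly in front of cells (for example "HeLa cells") is not captured. Changing the quantifier to {0,4} should cover that case if you need it.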
