
Regex matching between two strings in Python

I am trying to match all occurrences between two [Term]s, or between a [Term] and a [Typedef], in a file containing something like this:

remark: Includes Ontology(OntologyID(OntologyIRI(<http://purl.obolibrary.org/obo/go/never_in_taxon.owl>))) [Axioms: 18 Logical Axioms: 0]
ontology: go

[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:11389764]
synonym: "mitochondrial inheritance" EXACT []
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution

[Term]
id: GO:0000002
name: mitochondrial genome maintenance
namespace: biological_process
def: "The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome." [GOC:ai, GOC:vw]
is_a: GO:0007005 ! mitochondrion organization

[Term]
id: GO:0000011
name: vacuole inheritance
namespace: biological_process
def: "The distribution of vacuoles into daughter cells after mitosis or meiosis, mediated by interactions between vacuoles and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:14616069]
is_a: GO:0007033 ! vacuole organization
is_a: GO:0048308 ! organelle inheritance

[Typedef]
id: positively_regulates
name: positively regulates
namespace: external
xref: RO:0002213
holds_over_chain: negatively_regulates negatively_regulates
is_a: regulates ! regulates
transitive_over: part_of ! part of

[Typedef]
id: regulates
name: regulates
namespace: external
xref: RO:0002211
is_transitive: true
transitive_over: part_of ! part of

With (?=\[Term\]\s)[\s\S]*(?=\s\s\[Term\]\s) I only get a single match, running from the first [Term] up to the penultimate one.
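The behaviour described above can be reproduced on a shortened copy of the sample. The data below is a hypothetical, abbreviated excerpt, and the lazy variant is an illustrative tweak (not something from the question): making the quantifier lazy and letting the closing lookahead accept either header yields one match per [Term].

```python
import re

# Hypothetical, abbreviated version of the file from the question.
SAMPLE = """ontology: go

[Term]
id: GO:0000001
name: mitochondrion inheritance

[Term]
id: GO:0000002
name: mitochondrial genome maintenance

[Term]
id: GO:0000011
name: vacuole inheritance

[Typedef]
id: regulates
name: regulates"""

# The question's pattern: the greedy [\s\S]* runs to the end of the text and then
# backtracks only as far as the LAST blank line followed by [Term].
greedy = re.findall(r'(?=\[Term\]\s)[\s\S]*(?=\s\s\[Term\]\s)', SAMPLE)

# Illustrative tweak: lazy quantifier, and the closing lookahead accepts
# either [Term] or [Typedef].
lazy = re.findall(r'(?=\[Term\]\s)[\s\S]*?(?=\s\s\[(?:Term|Typedef)\]\s)', SAMPLE)

print(len(greedy))  # 1: one block covering GO:0000001 and GO:0000002
print(len(lazy))    # 3: one block per [Term]
```

Because the greedy version stops before the last "blank line + [Term]", the final [Term] (GO:0000011) never appears in any match.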

To match the text between the two, you can try this:

import re
# "." does not match newlines, so each record must be on one line, with records separated by a blank line (matched by \n\s)
s = '[Term] id: GO:0000001 name: mitochondrion inheritance namespace: biological_process def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:11389764] synonym: "mitochondrial inheritance" EXACT [] is_a: GO:0048308 ! organelle inheritance is_a: GO:0048311 ! mitochondrion distribution'  # etc
the_data = re.findall(r"\[Term\](.*?)\n\s\[Term\]|\[Term\](.*?)\n\s\[Typedef\]", s)

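Final output (note that the GO:0000002 record is missing: the first alternative consumes the [Term] that closes the GO:0000001 record, so the next search cannot start on it):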

[(' id: GO:0000001 name: mitochondrion inheritance namespace: biological_process def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:11389764] synonym: "mitochondrial inheritance" EXACT [] is_a: GO:0048308 ! organelle inheritance is_a: GO:0048311 ! mitochondrion distribution', ''), ('', ' id: GO:0000011 name: vacuole inheritance namespace: biological_process def: "The distribution of vacuoles into daughter cells after mitosis or meiosis, mediated by interactions between vacuoles and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:14616069] is_a: GO:0007033 ! vacuole organization is_a: GO:0048308 ! organelle inheritance')]
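One way to avoid losing every other record is to put the closing header in a lookahead so it is not consumed, and to pass re.DOTALL so "." can cross the real line breaks. This is a sketch of that adjustment on a hypothetical, abbreviated sample, not the answerer's code:

```python
import re

# Hypothetical, abbreviated version of the file from the question,
# with the real line breaks kept.
SAMPLE = """ontology: go

[Term]
id: GO:0000001
name: mitochondrion inheritance

[Term]
id: GO:0000002
name: mitochondrial genome maintenance

[Term]
id: GO:0000011
name: vacuole inheritance

[Typedef]
id: regulates
name: regulates"""

# The closing [Term]/[Typedef] sits inside a lookahead, so it is not consumed
# and can open the next match. re.DOTALL lets "." cross newlines in a record.
blocks = re.findall(r"\[Term\]\n(.*?)(?=\n\s\[(?:Term|Typedef)\])", SAMPLE, re.DOTALL)

for body in blocks:
    print(body.splitlines()[0])
# id: GO:0000001
# id: GO:0000002
# id: GO:0000011
```

As with the original pattern, a [Term] that is the very last stanza in the file would not be matched, because no header follows it.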

You may use

r'(?m)^\[Term].*(?:\r?\n(?!\[(?:Typedef|Term)]).*)*'

See the regex demo.

Details

  • (?m) - multiline modifier: makes ^ match at the start of every line, not only at the start of the string
  • ^ - start of a line
  • \[Term] - a [Term] substring
  • .* - rest of the current line
  • (?:\r?\n(?!\[(?:Typedef|Term)]).*)* - 0 or more occurrences of:
    • \r?\n(?!\[(?:Typedef|Term)]) - a line break (CRLF or LF) not followed by a [Typedef] or [Term] substring
    • .* - rest of the current line
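The role of (?m) and of \r?\n can be checked directly. This sketch uses a hypothetical, abbreviated sample and compares the pattern with and without the multiline modifier, then reruns it on the same text converted to CRLF line endings:

```python
import re

# Hypothetical, abbreviated sample with LF line endings.
SAMPLE = "ontology: go\n\n[Term]\nid: GO:0000001\n\n[Term]\nid: GO:0000002\n\n[Typedef]\nid: regulates"

WITH_M = r'(?m)^\[Term].*(?:\r?\n(?!\[(?:Typedef|Term)]).*)*'
WITHOUT_M = WITH_M[len('(?m)'):]  # same pattern minus the multiline modifier

# With (?m), ^ matches at every line start, so each [Term] block is found.
with_m = re.findall(WITH_M, SAMPLE)
# Without it, ^ only matches at position 0 ("ontology: go"), so nothing is found.
without_m = re.findall(WITHOUT_M, SAMPLE)
# \r?\n lets the same pattern cope with Windows (CRLF) line endings.
crlf = re.findall(WITH_M, SAMPLE.replace("\n", "\r\n"))

print(len(with_m), len(without_m), len(crlf))  # 2 0 2
```

With CRLF input the matched text still contains the "\r" characters, since "." matches "\r"; strip them afterwards if needed.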

Python code:

import re
s = """remark: Includes Ontology(OntologyID(OntologyIRI(<http://purl.obolibrary.org/obo/go/never_in_taxon.owl>))) [Axioms: 18 Logical Axioms: 0]
ontology: go

[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:11389764]
synonym: "mitochondrial inheritance" EXACT []
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution

[Term]
id: GO:0000002
name: mitochondrial genome maintenance
namespace: biological_process
def: "The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome." [GOC:ai, GOC:vw]
is_a: GO:0007005 ! mitochondrion organization

[Term]
id: GO:0000011
name: vacuole inheritance
namespace: biological_process
def: "The distribution of vacuoles into daughter cells after mitosis or meiosis, mediated by interactions between vacuoles and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:14616069]
is_a: GO:0007033 ! vacuole organization
is_a: GO:0048308 ! organelle inheritance

[Typedef]
id: positively_regulates
name: positively regulates
namespace: external
xref: RO:0002213
holds_over_chain: negatively_regulates negatively_regulates
is_a: regulates ! regulates
transitive_over: part_of ! part of

[Typedef]
id: regulates
name: regulates
namespace: external
xref: RO:0002211
is_transitive: true
transitive_over: part_of ! part of"""
rx = r'(?m)^\[Term].*(?:\r?\n(?!\[(?:Typedef|Term)]).*)*'
matches = re.findall(rx, s)
for m in matches:
    print(m)
    print('-------------- Next match ---------------')

print("Number of matches: {}".format(len(matches)))

Output:

[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:11389764]
synonym: "mitochondrial inheritance" EXACT []
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution

-------------- Next match ---------------
[Term]
id: GO:0000002
name: mitochondrial genome maintenance
namespace: biological_process
def: "The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome." [GOC:ai, GOC:vw]
is_a: GO:0007005 ! mitochondrion organization

-------------- Next match ---------------
[Term]
id: GO:0000011
name: vacuole inheritance
namespace: biological_process
def: "The distribution of vacuoles into daughter cells after mitosis or meiosis, mediated by interactions between vacuoles and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:14616069]
is_a: GO:0007033 ! vacuole organization
is_a: GO:0048308 ! organelle inheritance

-------------- Next match ---------------
Number of matches: 3
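A different technique, used in neither answer, is to split the text at each header line instead of matching the blocks. This is a sketch assuming the file uses the same stanza layout as the (hypothetical, abbreviated) sample below; zero-width splitting requires Python 3.7 or later:

```python
import re

# Hypothetical, abbreviated sample (same layout as the file in the question).
SAMPLE = """ontology: go

[Term]
id: GO:0000001
name: mitochondrion inheritance

[Term]
id: GO:0000002
name: mitochondrial genome maintenance

[Term]
id: GO:0000011
name: vacuole inheritance

[Typedef]
id: regulates
name: regulates"""

# Split at the start of every line that opens a [Term] or [Typedef] stanza.
# The pattern matches an empty string, which re.split supports since 3.7.
stanzas = re.split(r'(?m)^(?=\[(?:Term|Typedef)\])', SAMPLE)

# Keep only the [Term] stanzas, trimming the trailing blank line of each.
terms = [st.rstrip() for st in stanzas if st.startswith('[Term]')]

print(len(terms))  # 3
print(terms[-1])
```

Like the regex above, this also keeps a [Term] that happens to be the last stanza in the file, and the header text before the first stanza ends up in stanzas[0], where the filter discards it.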
