
Extracting portion of the string text with start and end matches by using regular expressions in Python

I am trying to extract only a portion of a string by using a regular expression with two specific matches in Python. Specifically, here is the example text:

example = """
    The forward-looking statements are made as of the date of this report,
    and the Company assumes no obligation to update the forward-looking statements 
    or to update the reasons why actual results could differ from those projected 
    in the forward-looking statements. PART 1. ITEM 1. BUSINESS 
    General Farmers & Merchants Bancorp, Inc. (Company) is a bank holding company 
    incorporated under the laws of Ohio in 1985 and elected to become a financial 
    holding company under the Federal Reserve in 2014. Our primary subsidiary, 
    The Farmers & Merchants State Bank (Bank) is a community bank operating 
    in Northwest Ohio since 1897.ITEM 2. PROPERTIES Our principal office is located in Archbold, Ohio.
    The Bank operates from the facilities at 307 North Defiance Street. 
    In addition, the Bank owns the property from 200 to 208 Ditto Street, 
    Archbold, Ohio, which it uses for Bank parking and a community mini-park area.
    """

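I would like to extract the part of the text "in between", starting at the beginning match "ITEM 1" and ending at the ending match "ITEM 2", so that the final result looks like this: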

final_result = """
    ITEM 1. BUSINESS 
    General Farmers & Merchants Bancorp, Inc. (Company) is a bank holding company 
    incorporated under the laws of Ohio in 1985 and elected to become a financial 
    holding company under the Federal Reserve in 2014. Our primary subsidiary, 
    The Farmers & Merchants State Bank (Bank) is a community bank operating 
    in Northwest Ohio since 1897.
    """

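In fact, the example text above is just one specific case out of a large number of similar texts, so I would like the answer to be fairly general, so that I can adapt it to other strings whose text may differ. Thank you in advance!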

import re

example = """
The forward-looking statements are made as of the date of this report,
and the Company assumes no obligation to update the forward-looking statements 
or to update the reasons why actual results could differ from those projected 
in the forward-looking statements. PART 1. ITEM 1. BUSINESS 
General Farmers & Merchants Bancorp, Inc. (Company) is a bank holding company 
incorporated under the laws of Ohio in 1985 and elected to become a financial 
holding company under the Federal Reserve in 2014. Our primary subsidiary, 
The Farmers & Merchants State Bank (Bank) is a community bank operating 
in Northwest Ohio since 1897.ITEM 2. PROPERTIES Our principal office is located in Archbold, Ohio.
The Bank operates from the facilities at 307 North Defiance Street. 
In addition, the Bank owns the property from 200 to 208 Ditto Street, 
Archbold, Ohio, which it uses for Bank parking and a community mini-park area.
"""


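def get_text_between(text, mark1, mark2):
    # Capture from mark1 up to (but not including) the first mark2.
    regex = '({}.*?){}'.format(mark1, mark2)
    # Search the text that was passed in, not the global `example`;
    # re.DOTALL lets "." match newlines as well.
    match = re.search(regex, text, re.DOTALL)
    if match:
        return match.group(1)
    return None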

if __name__ == '__main__':
    text = get_text_between(example, 'ITEM 1', 'ITEM 2')
    if text:
        print(text)
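
Since the question asks for something that adapts to many similar texts, the same idea can be sketched a little more defensively. This is only a sketch under assumptions: the helper name `extract_section` is hypothetical, the markers are treated as literal text (via `re.escape`), matching is made case-insensitive, and a lookahead is used so the end marker is checked but not consumed.

```python
import re


def extract_section(text, start_marker, end_marker):
    """Return the text from start_marker up to (not including) end_marker, or None."""
    # re.escape keeps characters such as "." in the markers literal.
    pattern = re.compile(
        re.escape(start_marker) + r".*?(?=" + re.escape(end_marker) + r")",
        re.DOTALL | re.IGNORECASE,
    )
    match = pattern.search(text)
    return match.group(0) if match else None


sample = "intro. PART 1. ITEM 1. BUSINESS\nGeneral text.ITEM 2. PROPERTIES\nmore"
section = extract_section(sample, "ITEM 1.", "ITEM 2.")
print(section)
```

Returning `None` when either marker is missing makes it easy to skip documents that do not contain the section when looping over many texts.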
example = """
    The forward-looking statements are made as of the date of this report,
    and the Company assumes no obligation to update the forward-looking statements 
    or to update the reasons why actual results could differ from those projected 
    in the forward-looking statements. PART 1. ITEM 1. BUSINESS 
    General Farmers & Merchants Bancorp, Inc. (Company) is a bank holding company 
    incorporated under the laws of Ohio in 1985 and elected to become a financial 
    holding company under the Federal Reserve in 2014. Our primary subsidiary, 
    The Farmers & Merchants State Bank (Bank) is a community bank operating 
    in Northwest Ohio since 1897.ITEM 2. PROPERTIES Our principal office is located in Archbold, Ohio.
    The Bank operates from the facilities at 307 North Defiance Street. 
    In addition, the Bank owns the property from 200 to 208 Ditto Street, 
    Archbold, Ohio, which it uses for Bank parking and a community mini-park area.
    """
import re

# Join the lines with spaces so that "." can match across the former line breaks.
example2 = " ".join(example.split("\n"))
match = re.search(r"(ITEM 1.*?)ITEM 2", example2)
if match:
    print(match.group(1))

This should work.
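
One detail of flattening the text first: splitting on "\n" only removes the line breaks, so the indentation of each line stays in the result. As a possible variation (an assumption about what is wanted, not part of the answer above), calling `str.split()` with no argument collapses every run of whitespace into a single space. The short `raw` string below is a made-up stand-in for the real text.

```python
import re

# Made-up, shortened stand-in for the indented multi-line example text.
raw = (
    "statements. PART 1. ITEM 1. BUSINESS \n"
    "    General text.ITEM 2. PROPERTIES\n"
    "    Our office."
)

# split() with no argument splits on any whitespace run (spaces, tabs, newlines),
# so joining with " " leaves exactly one space between words.
flattened = " ".join(raw.split())

match = re.search(r"(ITEM 1\..*?)ITEM 2\.", flattened)
result = match.group(1) if match else None
print(result)
```

The trade-off is that the original line structure is lost, which is fine for searching but not if the extracted section must keep its layout.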

This way, you can capture the portion of the string you want to extract into a buffer.

import re
example = """
    The forward-looking statements are made as of the date of this report,
    and the Company assumes no obligation to update the forward-looking statements 
    or to update the reasons why actual results could differ from those projected 
    in the forward-looking statements. PART 1. ITEM 1. BUSINESS 
    General Farmers & Merchants Bancorp, Inc. (Company) is a bank holding company 
    incorporated under the laws of Ohio in 1985 and elected to become a financial 
    holding company under the Federal Reserve in 2014. Our primary subsidiary, 
    The Farmers & Merchants State Bank (Bank) is a community bank operating 
    in Northwest Ohio since 1897.ITEM 2. PROPERTIES Our principal office is located in Archbold, Ohio.
    The Bank operates from the facilities at 307 North Defiance Street. 
    In addition, the Bank owns the property from 200 to 208 Ditto Street, 
    Archbold, Ohio, which it uses for Bank parking and a community mini-park area.
"""
final_result = ""
# A raw string avoids invalid-escape warnings; the lazy *? stops at the first "ITEM 2".
search = re.search(r'(ITEM 1[\s\S]*?)ITEM 2', example)
if search:
    final_result = search.group(1)
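
Whether the quantifier before the end marker is greedy (`*`) or lazy (`*?`) matters as soon as the end marker appears more than once in the text. The following is a small illustration on a made-up string, not on the original example:

```python
import re

# Made-up text in which "ITEM 2" occurs twice.
text = "ITEM 1. BUSINESS aaa ITEM 2. PROPERTIES bbb (see ITEM 2 above) ccc"

# Greedy: [\s\S]* grabs as much as possible, so the match runs to the LAST "ITEM 2".
greedy = re.search(r"(ITEM 1[\s\S]*)ITEM 2", text).group(1)

# Lazy: [\s\S]*? grabs as little as possible, so the match stops at the FIRST "ITEM 2".
lazy = re.search(r"(ITEM 1[\s\S]*?)ITEM 2", text).group(1)

print(repr(greedy))
print(repr(lazy))
```

For filings where later sections refer back to earlier item numbers, the lazy form is usually the safer default.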

