
Extract multiline text between two strings using python

I have a text file that looks like the dummy file below:

Lorem Ipsum is simply dummy text of the printing and typesetting industry.
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, 
when an unknown printer took a galley of type and
some random characters and then start of my data
some characters in between
some characters in between
some characters in between
some characters in between
some characters in between
some characters in between
end of my data
scrambled it to make a type specimen book. 
It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised
in the 1960s with the release of Letraset 
when an unknown printer took a galley of type and
some random characters and then start of my data
some characters in between
some characters in between
some characters in between
some characters in between
some characters in between
some characters in between
end of my data
sheets containing Lorem Ipsum passages,
and more recently with desktop publishing
when an unknown printer took a galley of type and
some random characters and then start of my data
some characters in between
some characters in between
some characters in between
some characters in between
some characters in between
some characters in between
end of my data
software like Aldus PageMaker including
versions of Lorem Ipsum.

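I want to extract the data between "start of my data" and "end of my data" and save it in a list variable. This data occurs multiple times in the text file. I tried the code below: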

import re
import sys
s=[]
with open('mytextfile.txt','r') as file:
    mystring = file.read()
    myre = re.compile(r"start of my data(.*?)end of my data", re.DOTALL)
    parts = myre.findall(mystring)
    s.append(parts)

This code stores all the matched strings together at the first index of the list, but I need each piece of data at its own index. How can I do that?

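With s.append(parts) you append the whole list parts as a single element of the list s, which is why s ends up with only one element (itself a list of 3 elements). If you instead want to add the 3 elements of parts to s individually, you need s.extend(parts).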
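The difference can be seen in a minimal sketch. The `text` string here is a made-up, shortened stand-in for the file contents (two data blocks instead of three), not the asker's actual file:

```python
import re

# Hypothetical stand-in for the file contents, containing two data blocks.
text = (
    "junk start of my data\nA\nB\nend of my data junk\n"
    "more start of my data\nC\nend of my data tail\n"
)

pattern = re.compile(r"start of my data(.*?)end of my data", re.DOTALL)
parts = pattern.findall(text)  # one string per matched block

appended = []
appended.append(parts)  # the whole list is stored as a single element

extended = []
extended.extend(parts)  # each match is stored as its own element

print(len(appended))  # 1
print(len(extended))  # 2
```

Since findall already returns a list with one entry per match, using parts directly (without a separate s) would likely work just as well here.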

To split each captured group into its individual data lines, split on \n:

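import re

s = []
mystring = """
paste your string here
"""
# Non-greedy (.*?) with re.DOTALL captures everything, newlines included,
# between each pair of markers.
myre = re.compile(r"start of my data(.*?)end of my data", re.DOTALL)
parts = myre.findall(mystring)
for part in parts:
    # Add each line of the block to s as a separate element.
    s.extend(part.split("\n"))
print(len(s))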

For the sample data provided, the result is 24.
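Note that this count includes empty strings: each captured group starts and ends with a newline, so split("\n") yields an empty first and last element (8 elements per block, 6 of them actual data). One possible variant, sketched below with a hypothetical helper named `extract_blocks` and made-up sample text, keeps each block as its own sub-list and drops the blank lines:

```python
import re


def extract_blocks(text, start="start of my data", end="end of my data"):
    """Return one list of non-empty lines per start/end block.

    Hypothetical helper for illustration; the marker defaults are taken
    from the question.
    """
    pattern = re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)
    return [
        # splitlines() breaks on line endings; the filter drops blank lines
        [line for line in match.splitlines() if line.strip()]
        for match in pattern.findall(text)
    ]


# Made-up sample text with two blocks.
sample = (
    "intro start of my data\n"
    "row 1\n"
    "row 2\n"
    "end of my data outro\n"
    "again start of my data\n"
    "row 3\n"
    "end of my data\n"
)

blocks = extract_blocks(sample)
print(blocks)  # [['row 1', 'row 2'], ['row 3']]
```

Whether a flat list or a list of lists is more convenient depends on how the data is used afterwards; the nested form preserves which lines belonged to which block.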

