
Python - Extract strings from a text file up to the first two consecutive newlines

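I have an input file, and I need to extract several groups of lines from it, using two consecutive newlines (a blank line) as the delimiter.

For example, the text file looks like this: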

1. Sometext
Sometext 
Sometext

2. Sometext
Sometext
Sometext

3. Sometext
Sometext
Sometext

Sometext which is not needed
Sometext which is not needed
Sometext which is not needed

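I need to extract one substring from "1." up to just before "2.", a second substring from "2." up to just before "3.", and so on, depending on the numbers. The script below produces that output, but it also picks up all of the "Sometext which is not needed" lines, which I do not want. Here is the code: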

file_path = open("filename", "r")
content = file_path.read()
size = len(content)
start = 0
a = 1
b = 2
end = 0
ext = 0

while start < size:
    if end != -1:
        # Slice from "a." up to the newline before "b."
        subString = content[content.find(str(a) + "."):content.find("\n" + str(b) + ".")]
        print(subString)
        end = content.find(str(b) + ".", start)
        print("\n")
        a = int(a) + 1  # increment to find the next start number
        b = int(b) + 1  # increment to find the next end number
        start = end + 1  # continue searching from the next position
    else:
        break

So I decided to use 2 consecutive newlines as the end position and tried the line below, but it did not work.

subString = content[content.find (str(a)+".")+3:content.find("\n\n")]

Please help, and let me know if you have any questions. Thanks in advance.
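A likely reason the one-liner fails: `content.find("\n\n")` is called without a start position, so it always returns the first blank line in the file, which lies before the start of every block after the first one. Below is a minimal sketch of one way around that, assuming each block begins with "N. " at the start of a line. The function name `extract_block` and the sample string are made up for illustration.

```python
def extract_block(content, number):
    """Return the text of block `number` without its "N. " label, or None."""
    marker = f"{number}. "
    if content.startswith(marker):
        start = 0
    else:
        # The label must sit at the start of a line
        pos = content.find("\n" + marker)
        if pos == -1:
            return None
        start = pos + 1
    # Search for the blank line only AFTER the start of this block
    end = content.find("\n\n", start)
    if end == -1:
        end = len(content)
    return content[start + len(marker):end]


sample = (
    "1. A\nB\nC\n\n"
    "2. D\nE\nF\n\n"
    "3. G\nH\nI\n\n"
    "not needed\nnot needed\n"
)

n = 1
while (block := extract_block(sample, n)) is not None:
    print(repr(block))
    n += 1
```

The loop stops at the first missing number, so the trailing unnumbered text is never reached.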

I'm not sure I understood your question correctly, but here is code that will output:

['Sometext', 'Sometext', 'Sometext']
['Sometext', 'Sometext', 'Sometext']
['Sometext', 'Sometext', 'Sometext']

based on the text in your question. If instead you want everything from 1 up to 2 as a single substring, like this:

['1. Sometext\nSometext\nSometext']
['2. Sometext\nSometext\nSometext']
['3. Sometext\nSometext\nSometext']

you should change the if statement to:

if is_number(i[0]):
    substring = []
    substring.append(i)
    print(substring)

Otherwise, you can use the code below as is:

def is_number(string):
    try:
        float(string)
        return True
    except ValueError:
        return False

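with open('testing.txt', 'r') as f:
    # Blocks are separated by a blank line, i.e. two consecutive newlines
    content = f.read().split('\n\n')

for i in content:
    if is_number(i[0]):
        c = i.split('\n')
        # Drop the leading "N. " label from lines that start with a digit
        substring = [line[3:] if is_number(line[0]) else line for line in c]
        print(substring)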
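One caveat with the snippet above: `i[0]` raises `IndexError` if a chunk is empty, which can happen when the file ends with a blank line. A guarded variant is sketched below, assuming labels have the form "N. " and only appear on the first line of a block. The function name `numbered_chunks` is hypothetical, and it uses `str.isdigit` in place of the `is_number` helper.

```python
def numbered_chunks(content):
    """Return a list of line lists, one per block that starts with a digit."""
    result = []
    for chunk in content.split("\n\n"):
        chunk = chunk.strip("\n")
        # Skip empty chunks and chunks that do not start with a number
        if not chunk or not chunk[0].isdigit():
            continue
        lines = chunk.split("\n")
        # Remove the "N. " label from the first line only
        label, _, rest = lines[0].partition(". ")
        if label.isdigit():
            lines[0] = rest
        result.append([line.strip() for line in lines])
    return result


text = (
    "1. Sometext\nSometext \nSometext\n\n"
    "2. Sometext\nSometext\nSometext\n\n"
    "3. Sometext\nSometext\nSometext\n\n"
    "Sometext which is not needed\nSometext which is not needed\n\n"
)

for block in numbered_chunks(text):
    print(block)
```

Using `partition` instead of a fixed `[3:]` slice keeps the label removal working for numbers with more than one digit.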

You will have to filter out the unneeded lines at the end, but this will get you what you want:

from itertools import groupby
with open("in.txt") as f:
    grps = groupby(f, key=lambda x: bool(x.strip()))
    print([list(v) for k,v in grps if k])

Output:

[['1. Sometext\n', 'Sometext\n', 'Sometext\n'], ['2. Sometext\n', 'Sometext\n', 'Sometext\n'], ['3. Sometext\n', 'Sometext\n', 'Sometext\n'], ['Sometext which is not needed\n', 'Sometext which is not needed\n', 'Sometext which is not needed']]

Since all of the sections you want to keep start with a number:

from itertools import groupby, takewhile

with open("in.txt") as f:
    grps = groupby(f, key=lambda x: bool(x.strip()))
    print (list(takewhile(lambda x: x[0][0].isdigit(),(list(v) for k,v in grps if k))))

Output:

[['1. Sometext\n', 'Sometext\n', 'Sometext\n'],
 ['2. Sometext\n', 'Sometext\n', 'Sometext\n'],
['3. Sometext\n', 'Sometext\n', 'Sometext\n']]
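The same `groupby` plus `takewhile` idea can be wrapped in a function that accepts any iterable of lines and returns each block as one string. This is only a sketch: the name `numbered_blocks` is made up, and `io.StringIO` stands in for the opened file so the example runs without `in.txt`.

```python
import io
from itertools import groupby, takewhile


def numbered_blocks(lines):
    """Return leading blocks whose first line starts with a digit, as strings."""
    # Group consecutive non-blank lines; blank runs are dropped
    groups = (
        list(group)
        for has_text, group in groupby(lines, key=lambda line: bool(line.strip()))
        if has_text
    )
    # Stop at the first block that does not start with a digit
    kept = takewhile(lambda grp: grp[0].lstrip()[:1].isdigit(), groups)
    return ["".join(grp).rstrip("\n") for grp in kept]


fake_file = io.StringIO(
    "1. Sometext\nSometext\nSometext\n\n"
    "2. Sometext\nSometext\nSometext\n\n"
    "3. Sometext\nSometext\nSometext\n\n"
    "Sometext which is not needed\n"
    "Sometext which is not needed\n"
)

for block in numbered_blocks(fake_file):
    print(repr(block))
```

Because `takewhile` stops at the first unnumbered block, a numbered block that appears after unwanted text would be ignored too.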

If you know there are n groups, you can slice:

from itertools import groupby, islice
with open("in.txt") as f:
    grps = groupby(f, key=lambda x: bool(x.strip()))
    print (list(islice((list(v) for k,v in grps if k),3)))

Output:

[['1. Sometext\n', 'Sometext\n', 'Sometext\n'],
 ['2. Sometext\n', 'Sometext\n', 'Sometext\n'], 
['3. Sometext\n', 'Sometext\n', 'Sometext\n']]
