Python - Extract string from a text file until the first 2 new line space
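I have an input file from which I need to extract several chunks of lines, using blank lines (two consecutive newlines) as the separators.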
For example, the text file looks like this:
1. Sometext
Sometext
Sometext

2. Sometext
Sometext
Sometext

3. Sometext
Sometext
Sometext

Sometext which is not needed
Sometext which is not needed
Sometext which is not needed
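I need to extract one substring from "1." up to just before "2.", a second substring from "2." up to just before "3.", and so on, depending on the number. I have the script below, which produces output, but it also picks up all the "Sometext which is not needed" lines that I don't want. See the code below: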
file_path = open("filename", "r")
content = file_path.read()
size = len(content)
start = 0
a = 1
b = 2
end = 0
ext = 0
while start < size:
    if end != -1:
        # From "a." up to (not including) "\nb."; for the last section "\nb." is
        # not found, find() returns -1, and the slice runs on into the unwanted text
        subString = content[content.find(str(a) + ".") + 0:content.find("\n" + str(b) + ".")]
        print(subString)
        end = content.find(str(b) + ".", start)
        print("\n")
        a = int(a) + 1  # increment to find the next start number
        b = int(b) + 1  # increment to find the next end number
        start = end + 1  # continuing to search the next
    else:
        break
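The last substring swallows the unwanted lines because `str.find` returns `-1` when the marker is missing, and a slice that ends at `-1` means "up to the last character", not "nothing". A minimal sketch of that behavior, assuming a small hypothetical string `text` in place of the file, with blank lines between sections:

```python
# A tiny stand-in for the file; the last section is not numbered.
text = "1. A\nA\n\n2. B\nB\n\nnot needed\n"

start = text.find("2.")   # start of the last numbered section
stop = text.find("\n3.")  # there is no "3." section, so find() returns -1
print(stop)               # -1
chunk = text[start:stop]  # a -1 stop means "up to the last character", not "nothing"
print(repr(chunk))        # '2. B\nB\n\nnot needed'
```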
So I decided to use two consecutive newlines to find the end position, with the line below, but it did not work:
subString = content[content.find(str(a) + ".") + 3:content.find("\n\n")]
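`content.find("\n\n")` always searches from the beginning of the file, so it returns the position of the first blank line every time; for the second and later sections that position lies before the start index, and the slice comes back empty. `str.find` takes an optional start argument, so one way to make the blank-line idea work is to search for the end marker from the current section's start. A sketch, assuming sections are separated by a single blank line and numbered `1.`, `2.`, `3.`, ...; `extract_numbered` and `SAMPLE` are illustrative names, not from the question:

```python
SAMPLE = (
    "1. Sometext\nSometext\nSometext\n\n"
    "2. Sometext\nSometext\nSometext\n\n"
    "3. Sometext\nSometext\nSometext\n\n"
    "Sometext which is not needed\nSometext which is not needed\n"
)


def extract_numbered(content):
    """Return the text of each numbered section ("1.", "2.", ...) in order."""
    sections = []
    n = 1
    while True:
        start = content.find(str(n) + ".")
        if start == -1:  # no section with this number: stop
            break
        end = content.find("\n\n", start)  # first blank line *after* this section starts
        if end == -1:  # the last section runs to the end of the file
            end = len(content)
        sections.append(content[start:end])
        n += 1
    return sections


for section in extract_numbered(SAMPLE):
    print(section)
    print()
```

Because the search stops at the first missing number, the unnumbered text at the end is never sliced.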
Please help, and let me know if you have any questions. Thanks in advance.
I'm not sure I understood your question correctly, but here is code that will output:
['Sometext', 'Sometext', 'Sometext']
['Sometext', 'Sometext', 'Sometext']
['Sometext', 'Sometext', 'Sometext']
based on the text in your question. If instead you want each section from 1 to 2 to be one whole substring, like this:
['1. Sometext\nSometext\nSometext']
['2. Sometext\nSometext\nSometext']
['3. Sometext\nSometext\nSometext']
you should change the if statement to:
if is_number(i[0]):
    substring = []
    substring.append(i)
    print(substring)
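For reference, a self-contained sketch of that whole-block variant, assuming sections are separated by single blank lines. `SAMPLE` is a hypothetical stand-in for the file contents, and `str.isdigit` is used here in place of the answer's `is_number` helper:

```python
SAMPLE = (
    "1. Sometext\nSometext\nSometext\n\n"
    "2. Sometext\nSometext\nSometext\n\n"
    "3. Sometext\nSometext\nSometext\n\n"
    "Sometext which is not needed\nSometext which is not needed"
)

# Split on blank lines and keep only the blocks whose first character is a digit.
blocks = [block for block in SAMPLE.split("\n\n") if block[:1].isdigit()]
for block in blocks:
    print([block])  # same shape as the output above: a one-element list
```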
Otherwise, you can use the code below as is (it produces the first output above):
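def is_number(string):
    try:
        float(string)
        return True
    except ValueError:
        return False

with open('testing.txt', 'r') as f:
    # sections are separated by a blank line, i.e. two consecutive newlines
    content = f.read().split('\n\n')
    for i in content:
        # i[:1] is '' for an empty block, so this never raises IndexError
        if is_number(i[:1]):
            c = i.split('\n')
            # drop the "N. " prefix (3 characters, single-digit numbers) from numbered lines
            substring = [line[3:] if is_number(line[:1]) else line for line in c]
            print(substring)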
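The same approach can be wrapped in a function that takes a string, which makes it easy to try without a file. A sketch under the same single-blank-line assumption; `split_sections` is a hypothetical name, and `str.isdigit` replaces `is_number`. Unlike `line[3:]`, it removes the number prefix from the first line of each block with `str.partition(". ")`, so two-digit numbers such as `10.` are handled as well:

```python
def split_sections(text):
    """Return the lines of each numbered block, with the "N. " prefix removed."""
    result = []
    for block in text.split("\n\n"):
        if not block[:1].isdigit():  # skip empty or unnumbered blocks
            continue
        lines = block.strip("\n").split("\n")
        number, sep, rest = lines[0].partition(". ")
        if sep and number.isdigit():
            lines[0] = rest  # "1. Sometext" -> "Sometext"
        result.append(lines)
    return result


sample = "1. Sometext\nSometext\n\n10. Other\nMore\n\nnot needed"
for lines in split_sections(sample):
    print(lines)
```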
You will have to filter out the unwanted lines at the end, but this will get you what you want:
from itertools import groupby

with open("in.txt") as f:
    grps = groupby(f, key=lambda x: bool(x.strip()))
    print([list(v) for k, v in grps if k])
Output:
[['1. Sometext\n', 'Sometext\n', 'Sometext\n'], ['2. Sometext\n', 'Sometext\n', 'Sometext\n'], ['3. Sometext\n', 'Sometext\n', 'Sometext\n'], ['Sometext which is not needed\n', 'Sometext which is not needed\n', 'Sometext which is not needed']]
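`groupby` bundles consecutive items that share the same key, and the key `bool(x.strip())` is `False` for blank lines and `True` for everything else, so each run of non-blank lines becomes one group and each blank line becomes a `False` group that the `if k` test discards. A sketch that makes the keys visible, using `io.StringIO` as a stand-in for the file and also dropping the trailing newlines:

```python
import io
from itertools import groupby

f = io.StringIO("1. A\nA\n\n2. B\n\nnot needed\n")  # stands in for open("in.txt")

# Each item is (key, run of consecutive lines that share that key).
runs = [(k, [line.rstrip("\n") for line in v])
        for k, v in groupby(f, key=lambda x: bool(x.strip()))]
for key, lines in runs:
    print(key, lines)

# Keeping only the True runs gives the answer's result without the newlines.
wanted = [lines for key, lines in runs if key]
print(wanted)
```

The unnumbered group still survives this step, which is why a final filter is needed.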
Since all the sections you want to keep start with a digit:
from itertools import groupby, takewhile

with open("in.txt") as f:
    grps = groupby(f, key=lambda x: bool(x.strip()))
    print(list(takewhile(lambda x: x[0][0].isdigit(), (list(v) for k, v in grps if k))))
Output:
[['1. Sometext\n', 'Sometext\n', 'Sometext\n'],
['2. Sometext\n', 'Sometext\n', 'Sometext\n'],
['3. Sometext\n', 'Sometext\n', 'Sometext\n']]
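`takewhile` stops at the first group whose first line fails the test, so a numbered group that appears after unwanted text would be dropped too. If that can happen, filtering every group keeps all numbered ones. A sketch contrasting the two, which swaps in `re.match(r"\d+\.", ...)` (a stricter check than `isdigit()` on the first character) and uses a hypothetical input with `io.StringIO` standing in for the file:

```python
import io
import re
from itertools import groupby, takewhile

# Hypothetical input: an unnumbered block sits *between* two numbered ones.
text = "1. A\nA\n\nnot needed\n\n2. B\nB\n"
numbered = re.compile(r"\d+\.")  # first line must start with digits followed by "."


def groups(f):
    """Yield each run of non-blank lines from f as a list."""
    return (list(v) for k, v in groupby(f, key=lambda x: bool(x.strip())) if k)


# takewhile stops at the first unnumbered group, so "2." is lost here
stopped = list(takewhile(lambda g: numbered.match(g[0]), groups(io.StringIO(text))))
# a filtering comprehension keeps every numbered group
kept = [g for g in groups(io.StringIO(text)) if numbered.match(g[0])]
print(stopped)
print(kept)
```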
If you know there are n groups, you can slice:
from itertools import groupby, islice

with open("in.txt") as f:
    grps = groupby(f, key=lambda x: bool(x.strip()))
    print(list(islice((list(v) for k, v in grps if k), 3)))
Output:
[['1. Sometext\n', 'Sometext\n', 'Sometext\n'],
['2. Sometext\n', 'Sometext\n', 'Sometext\n'],
['3. Sometext\n', 'Sometext\n', 'Sometext\n']]
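`islice(iterable, n)` stops after n items, and since both `groupby` and the generator expression are lazy, the groups after the n-th one are never built. A sketch with the count as a parameter, assuming it is known in advance; `first_groups` is a hypothetical helper name and `io.StringIO` stands in for the file:

```python
import io
from itertools import groupby, islice


def first_groups(f, n):
    """Return the first n runs of non-blank lines from the iterable of lines f."""
    grps = groupby(f, key=lambda x: bool(x.strip()))
    return list(islice((list(v) for k, v in grps if k), n))


sample = io.StringIO("1. A\n\n2. B\n\n3. C\n\nnot needed\n")
print(first_groups(sample, 3))
```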