How to read a specific portion of a txt file in Python?
I need to extract a portion of text from a txt file.
The file looks like this:
STARTINGWORKIN DD / MM / YYYY HH: MM: SS
... text lines ...
... more text lines ...
STARTINGWORKING DD / MM / YYYY HH: MM: SS
... text lines I want ...
... more text lines that I want ...
I tried using three for loops (one to find the start, another to read the lines in between, and the last to find the end):
import os

file = "records.txt"
if file.endswith(".txt"):
    if os.path.exists(file):
        # read every line, dropping the trailing newline
        with open(file) as f:
            lines = [line.rstrip('\n') for line in f]
        for line in lines:
            pass  # extract the portion
Try this:
import os

file = "records.txt"
extracted_text = ""
if file.endswith(".txt"):
    if os.path.exists(file):
        # split on the marker; the last piece follows the final marker
        with open(file) as f:
            lines = f.read().split("STARTINGWORKING")
        extracted_text = lines[-1]  # Here it is
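To see what ends up in the last piece, here is a minimal sketch that applies the same idea to an in-memory string instead of records.txt. The sample text and the helper name text_after_last_marker are made up for illustration. It uses str.rpartition rather than split, so only the final piece is built. One difference worth noting: when the marker is absent, split leaves the whole text in lines[-1], whereas this helper returns an empty string.

```python
def text_after_last_marker(text, marker="STARTINGWORKING"):
    """Return everything after the last occurrence of marker (hypothetical helper)."""
    before, found, after = text.rpartition(marker)
    # rpartition returns ('', '', text) when the marker is absent,
    # so `found` is empty in that case
    return after if found else ""


# Made-up sample data in the shape described in the question
sample = (
    "STARTINGWORKING 01/01/2020 08:00:00\n"
    "old line\n"
    "STARTINGWORKING 02/01/2020 09:00:00\n"
    "line I want\n"
    "another line I want\n"
)

extracted = text_after_last_marker(sample)
print(extracted)
```

The extracted text still begins with the date that followed the marker, just as lines[-1] does in the answer above.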
You can use the file_read_backwards module to read the file from end to beginning. It saves time if the file is big:
from file_read_backwards import FileReadBackwards

with FileReadBackwards("records.txt") as file:
    portion = list()
    # lines arrive last-to-first; stop at the last marker
    for line in file:
        if not line.startswith('STARTINGWORKING'):
            portion.append(line)
        else:
            break
portion.reverse()
portion contains the desired lines.
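If installing a third-party module is not an option and the file fits in memory, the same stop-at-the-marker logic can be sketched with the built-in reversed(). This gives up the memory savings of reading the file backwards in exchange for having no dependencies. The function name last_block and the sample lines are illustrative only.

```python
def last_block(lines, marker="STARTINGWORKING"):
    """Collect the lines after the last marker line by walking backwards."""
    portion = []
    for line in reversed(lines):
        if line.startswith(marker):
            break  # reached the last marker; nothing earlier is needed
        portion.append(line)
    portion.reverse()  # restore the original top-to-bottom order
    return portion


# Made-up sample data; in practice this could come from f.read().splitlines()
sample_lines = [
    "STARTINGWORKING 01/01/2020 08:00:00",
    "old line",
    "STARTINGWORKING 02/01/2020 09:00:00",
    "line I want",
    "another line I want",
]

print(last_block(sample_lines))
```

If no line starts with the marker, the loop never breaks and every line is returned.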
I would take the regex path to tackle this:
>>> import re
>>> input_data = open('path/file').read()
>>> result = re.search(r'.*STARTINGWORKING\s*(.*)$', input_data, re.DOTALL)
>>> print(result.group(1))
#'DD / MM / YYYY HH: MM: SS\n... text lines I want ...\n... more text lines that I want ...'
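The pattern above keeps the date that follows the marker and raises an AttributeError if the marker never appears. As a hedged variation, the sketch below skips the rest of the marker line and guards against a missing match. The pattern and the sample text are assumptions for illustration, not part of the original answer.

```python
import re

# Made-up sample data in the shape described in the question
sample = (
    "STARTINGWORKING 01/01/2020 08:00:00\n"
    "old line\n"
    "STARTINGWORKING 02/01/2020 09:00:00\n"
    "line I want\n"
    "another line I want\n"
)

# The greedy leading .* (with DOTALL) runs to the LAST marker;
# [^\n]*\n then consumes the remainder of the marker line.
pattern = re.compile(r'.*STARTINGWORKING[^\n]*\n(.*)', re.DOTALL)

match = pattern.search(sample)
body = match.group(1) if match else ""  # empty string when no marker is found
print(body)
```

Like the original, this reads the whole file into memory, so it suits files of moderate size.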
The get_final_lines generator tries to avoid malloc'ing more storage than necessary while reading a potentially large file.
def get_final_lines(fin):
    buf = []
    for line in fin:
        if line.startswith('STARTINGWORK'):
            buf = []  # new marker: discard what was buffered so far
        else:
            buf.append(line)
    yield from buf


if __name__ == '__main__':
    with open('some_file.txt') as fin:
        for line in get_final_lines(fin):
            print(line.rstrip())
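Because the generator only iterates over its argument, it can be tried without a file on disk. The sketch below feeds the same buffering logic an io.StringIO object. The name final_lines, the marker parameter, and the sample text are illustrative additions.

```python
import io


def final_lines(lines, marker="STARTINGWORK"):
    """Yield the lines that follow the last line starting with marker."""
    buf = []
    for line in lines:
        if line.startswith(marker):
            buf = []  # a newer marker makes the buffered lines irrelevant
        else:
            buf.append(line)
    yield from buf


# io.StringIO iterates line by line, just like an open text file
fake_file = io.StringIO(
    "STARTINGWORKING 01/01/2020 08:00:00\n"
    "old line\n"
    "STARTINGWORKING 02/01/2020 09:00:00\n"
    "line I want\n"
    "another line I want\n"
)

wanted = [line.rstrip() for line in final_lines(fake_file)]
print(wanted)
```

At most one block of lines is held in memory at a time, which is where the savings over reading the entire file come from.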
You can have a variable that saves all the lines you have read since the last STARTINGWORK.
When you finish processing the file you have just what you need.
You certainly do not need to read all the lines into a list first. You can iterate directly over the open file, which returns one line at a time, i.e.:
file = "records.txt"  # path from the question

result = []
with open(file) as f:
    for line in f:
        if line.startswith("STARTINGWORK"):
            result = []  # Delete what would have accumulated
        result.append(line)  # Add the last line read
print("".join(result))
In result you have everything from the last STARTINGWORK line onward, inclusive. Use result[1:] if you want to drop the initial STARTINGWORK line.
Then in the code:
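file = "records.txt"  # path from the question

# list
result = []

# function
def appendlines(line, result, word):
    if line.startswith(word):
        del result[:]  # empty the list in place when a new marker is seen
    result.append(line)
    return line, result

with open(file, "r") as lines:
    for line in lines:
        appendlines(line, result, "STARTINGWORK")

# drop the marker line and strip trailing newlines
new_result = [line.rstrip("\n") for line in result[1:]]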