python - Read file from and to specific lines of text
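I'm not talking about specific line numbers, because I'm reading multiple files that share the same format but vary in length.
Say I have this text file: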
Something here...
... ... ...
Start #I want this block of text
a b c d e f g
h i j k l m n
End #until this line of the file
something here...
... ... ...
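I hope you see what I mean. I'm thinking of iterating through the file, searching with a regular expression to find the line numbers of "Start" and "End", and then using linecache to read from the Start line to the End line. But how do I get the line numbers? What function can I use?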
If you simply want the block of text between Start and End, you can do something simple like this:
with open('test.txt') as input_data:
    # Skips text before the beginning of the interesting block:
    for line in input_data:
        if line.strip() == 'Start':  # Or whatever test is needed
            break
    # Reads text until the end of the block:
    for line in input_data:  # This keeps reading the file
        if line.strip() == 'End':
            break
        print(line, end='')  # Line is extracted (or block_of_lines.append(line), etc.)
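In fact, you do not need to manipulate line numbers in order to read the data between the Start and End markers.
The logic ("read until...") is repeated in both loops, but it is quite clear and efficient. Other methods typically involve checking some state on every line (before the block / inside the block / end of block reached), which incurs a time penalty.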
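The two-loop idea above can be wrapped in a small reusable function. This is only a sketch: the name `read_block` and its `start`/`end` parameters are made up for illustration, and an in-memory `io.StringIO` stands in for the real file. The key point is that both loops consume the same iterator, so the second loop resumes where the first one stopped.

```python
import io

def read_block(lines, start="Start", end="End"):
    """Return the lines strictly between the start and end marker lines."""
    lines = iter(lines)  # one shared iterator for both loops
    # Skip everything up to and including the start marker.
    for line in lines:
        if line.strip() == start:
            break
    # Collect lines until the end marker (or until the input runs out).
    block = []
    for line in lines:
        if line.strip() == end:
            break
        block.append(line.rstrip("\n"))
    return block

sample = io.StringIO(
    "Something here...\n"
    "Start\n"
    "a b c d e f g\n"
    "h i j k l m n\n"
    "End\n"
    "something here...\n"
)
print(read_block(sample))  # ['a b c d e f g', 'h i j k l m n']
```

Passing an open file object instead of the `StringIO` works the same way, since a file is also an iterator over its lines.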
Here's something that will work:
data_file = open("test.txt")
block = ""
found = False
for line in data_file:
    if found:
        block += line
        if line.strip() == "End":
            break
    else:
        if line.strip() == "Start":
            found = True
            block = line  # keep the marker line, including its newline
data_file.close()
You can do this quite easily with a regular expression. You can make it more robust as needed; below is a simple example.
>>> import re
>>> START = "some"
>>> END = "Hello"
>>> test = "this is some\nsample text\nthat has the\nwords Hello World\n"
>>> m = re.compile(r'%s.*?%s' % (START,END), re.S)
>>> m.search(test).group(0)
'some\nsample text\nthat has the\nwords Hello'
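Applied to the Start/End file from the question, the same regex idea might look like the sketch below. The function name `extract_block` is hypothetical, the markers are assumed to sit alone on their lines, and line endings are assumed to be `\n`. `re.escape` guards against markers that contain regex metacharacters, and `re.MULTILINE` lets `^` and `$` anchor at line boundaries so a line such as "Starting" is not mistaken for the marker.

```python
import re

def extract_block(text, start="Start", end="End"):
    """Return the text between a line equal to `start` and a line equal to `end`, or None."""
    pattern = re.compile(
        r"^%s[ \t]*\n(.*?)^%s[ \t]*$" % (re.escape(start), re.escape(end)),
        re.MULTILINE | re.DOTALL,
    )
    match = pattern.search(text)
    return match.group(1) if match else None

text = (
    "Something here...\n"
    "Start\n"
    "a b c d e f g\n"
    "h i j k l m n\n"
    "End\n"
    "something here...\n"
)
print(repr(extract_block(text)))  # 'a b c d e f g\nh i j k l m n\n'
```

This reads the whole file into memory first (for example with `fp.read()`), so the line-by-line loops may be preferable for very large files.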
This should be a start for you:
started = False
collected_lines = []
# path is assumed to hold the name of the file to read
with open(path, "r") as fp:
    for i, line in enumerate(fp):
        if line.rstrip() == "Start":
            started = True
            print("started at line", i)  # counts from zero!
            continue
        if started and line.rstrip() == "End":
            print("end at line", i)
            break
        # process line (only once the Start marker has been seen)
        if started:
            collected_lines.append(line.rstrip())
enumerate takes an iterable and numbers its items as it iterates over them. For example:
print(list(enumerate("a b c".split())))
prints
[(0, 'a'), (1, 'b'), (2, 'c')]
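Since the question asks specifically for line numbers to feed to linecache, here is a minimal sketch that combines `enumerate` with `linecache.getline`. The helper `find_marker_lines` and its parameter names are made up for illustration, and a temporary file stands in for the real input. `enumerate(fp, start=1)` is used because linecache counts lines from 1.

```python
import linecache
import os
import tempfile

def find_marker_lines(path, start_marker="Start", end_marker="End"):
    """Return the 1-based line numbers of the start and end marker lines (None if missing)."""
    start_no = end_no = None
    with open(path) as fp:
        for number, line in enumerate(fp, start=1):
            if start_no is None:
                if line.strip() == start_marker:
                    start_no = number
            elif line.strip() == end_marker:
                end_no = number
                break
    return start_no, end_no

# Demo on a temporary file that stands in for the real input.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Something here...\nStart\na b c d e f g\nh i j k l m n\nEnd\nsomething here...\n")
    demo_path = tmp.name

start_no, end_no = find_marker_lines(demo_path)
# Read only the lines strictly between the two markers.
block = [linecache.getline(demo_path, n).rstrip("\n") for n in range(start_no + 1, end_no)]
print(start_no, end_no, block)  # 2 5 ['a b c d e f g', 'h i j k l m n']

linecache.clearcache()
os.remove(demo_path)
```

Note that this reads the file twice (once to find the markers, once through linecache), so collecting the lines during the first pass is usually simpler.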
Update:
The poster asked for a regular expression that matches lines like "===" and "======":
import re
print(re.match("^=+$", "===") is not None)     # True
print(re.match("^=+$", "======") is not None)  # True
print(re.match("^=+$", "=") is not None)       # True
print(re.match("^=+$", "=abc") is not None)    # False
print(re.match("^=+$", "abc=") is not None)    # False
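To tie the pattern back to block extraction, the sketch below collects the lines between the first two delimiter lines. It assumes, hypothetically, that the poster's files use runs of "=" as both the opening and the closing marker; the function name `between_delimiters` and the sample data are illustrative only.

```python
import re

DELIMITER = re.compile(r"^=+$")  # a line made up only of "=" characters

def between_delimiters(lines):
    """Collect the lines between the first and second delimiter lines."""
    collected = []
    inside = False
    for line in lines:
        if DELIMITER.match(line.strip()):
            if inside:
                break  # second delimiter: the block is complete
            inside = True  # first delimiter: the block starts on the next line
            continue
        if inside:
            collected.append(line.rstrip("\n"))
    return collected

sample = ["header\n", "===\n", "a b c\n", "h i j\n", "======\n", "footer\n"]
print(between_delimiters(sample))  # ['a b c', 'h i j']
```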