python - Reading a file starting from a specific line of text

Question

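I am not talking about specific line numbers, because I am reading multiple files that share the same format but differ in length.
Say I have this text file: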

Something here...  
... ... ...   
Start                      #I want this block of text 
a b c d e f g  
h i j k l m n  
End                        #until this line of the file
something here...  
... ... ...

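I hope you see what I mean. I was thinking of iterating over the file, searching with a regular expression to find the line numbers of "Start" and "End", and then using linecache to read from the Start line to the End line. But how do I get the line numbers? What function can I use?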

Answer 1

If you only want the block of text between Start and End, you can do something simple like this:

with open('test.txt') as input_data:
    # Skips text before the beginning of the interesting block:
    for line in input_data:
        if line.strip() == 'Start':  # Or whatever test is needed
            break
    # Reads text until the end of the block:
    for line in input_data:  # This keeps reading the file
        if line.strip() == 'End':
            break
        print(line, end='')  # Line is extracted (or block_of_lines.append(line), etc.)

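In fact, you do not need to manipulate line numbers in order to read the data between the Start and End markers.

The logic ("read until...") is repeated in two blocks, but it is quite clear and efficient (other approaches usually involve checking some state [before the block / inside the block / end of block reached], which costs time).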
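The same two-loop idea can be wrapped in a reusable function. This is only a minimal sketch, not part of the original answer: the name read_block and its parameters are hypothetical, and it assumes each marker sits alone on its line.

```python
import io


def read_block(lines, start="Start", end="End"):
    """Return the lines strictly between the start and end marker lines.

    `lines` is any iterable of strings, such as an open file object.
    (Hypothetical helper, introduced here for illustration.)
    """
    iterator = iter(lines)  # One shared iterator, so the second loop resumes
    # Skip everything up to and including the start marker.
    for line in iterator:
        if line.strip() == start:
            break
    block = []
    # Collect lines until the end marker or the end of the input.
    for line in iterator:
        if line.strip() == end:
            break
        block.append(line.rstrip("\n"))
    return block


sample = io.StringIO("junk\nStart\na b c d e f g\nh i j k l m n\nEnd\nmore junk\n")
print(read_block(sample))
```

With a real file, the call would look like read_block(open_file_object). If the start marker never appears, the first loop exhausts the input and the function returns an empty list.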

Answer 2

Here is something that works:

block = ""
found = False

with open("test.txt") as data_file:  # The file is closed automatically
    for line in data_file:
        if found:
            block += line
            if line.strip() == "End":
                break
        elif line.strip() == "Start":
            found = True
            block = line  # Keep the newline so "Start" is not glued to the next line

Answer 3

You can do this quite easily with a regular expression. You can make it as robust as you need; below is a simple example.

>>> import re
>>> START = "some"
>>> END = "Hello"
>>> test = "this is some\nsample text\nthat has the\nwords Hello World\n"
>>> m = re.compile(r'%s.*?%s' % (START,END), re.S)
>>> m.search(test).group(0)
'some\nsample text\nthat has the\nwords Hello'
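The pattern above matches the markers anywhere in the text, even inside other words, and it would misbehave if a marker contained regex metacharacters. One possible hardening is sketched below. It is not part of the original answer: the function name extract_between is hypothetical, and it assumes each marker occupies a whole line, as in the question's sample file.

```python
import re


def extract_between(text, start="Start", end="End"):
    """Return the text between two whole-line markers, or None if not found.

    (Hypothetical helper, introduced here for illustration.)
    """
    pattern = re.compile(
        # re.escape makes the markers literal; ^ and $ anchor them to
        # line boundaries because of re.M; re.S lets . span newlines.
        r"^[ \t]*%s[ \t]*\n(.*?)^[ \t]*%s[ \t]*$"
        % (re.escape(start), re.escape(end)),
        re.S | re.M,
    )
    match = pattern.search(text)
    return match.group(1) if match else None


text = "intro\nStart\na b c\nh i j\nEnd\noutro\n"
print(repr(extract_between(text)))
```

This reads the whole file into memory first (for example with fp.read()), so for very large files the line-by-line approaches in the other answers are likely a better fit.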

Answer 4

This should be a start:

path = "test.txt"  # Assumed file name; the original answer leaves path undefined
started = False
collected_lines = []
with open(path, "r") as fp:
    for i, line in enumerate(fp):
        if not started:
            if line.rstrip() == "Start":
                started = True
                print("started at line", i)  # counts from zero!
            continue  # Ignore everything before (and including) Start
        if line.rstrip() == "End":
            print("end at line", i)
            break
        # process line
        collected_lines.append(line.rstrip())

enumerate takes an iterable and yields (index, item) pairs as it iterates. For example,

  print(list(enumerate("a b c".split())))

prints

   [(0, 'a'), (1, 'b'), (2, 'c')]
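To address the question's literal request for line numbers, enumerate can also report where the markers are. The following is a minimal sketch, not part of the original answer; the function name find_marker_lines is hypothetical. It counts from 1 through the start argument of enumerate, which matches the 1-based line numbers that linecache.getline expects.

```python
def find_marker_lines(lines, start="Start", end="End"):
    """Return 1-based (start_line, end_line); a missing marker yields None.

    (Hypothetical helper, introduced here for illustration.)
    """
    start_no = end_no = None
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if start_no is None:
            # Still looking for the start marker.
            if text == start:
                start_no = number
        elif text == end:
            # The end marker only counts once the start has been seen.
            end_no = number
            break
    return start_no, end_no


sample = ["junk\n", "Start\n", "a b c\n", "End\n", "junk\n"]
print(find_marker_lines(sample))
```

The same function accepts an open file object in place of the list, since both are iterables of lines.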

Update:

The poster asked for a regular expression that matches lines such as "===" and "======":

import re
print(re.match("^=+$", "===") is not None)     # True
print(re.match("^=+$", "======") is not None)  # True
print(re.match("^=+$", "=") is not None)       # True
print(re.match("^=+$", "=abc") is not None)    # False
print(re.match("^=+$", "abc=") is not None)    # False
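If those "===" lines act as block delimiters (an assumption; the update does not say how the poster uses them), the regular expression can be combined with the flag-based loop. The sketch below is illustrative only, and the name between_delimiters is hypothetical.

```python
import io
import re

DELIMITER = re.compile(r"^=+$")  # A line made up only of '=' characters


def between_delimiters(lines):
    """Collect the lines between the first and second delimiter lines.

    (Hypothetical helper, introduced here for illustration.)
    """
    inside = False
    collected = []
    for line in lines:
        if DELIMITER.match(line.strip()):
            if inside:
                break  # Second delimiter: the block is complete
            inside = True  # First delimiter: start collecting
            continue
        if inside:
            collected.append(line.rstrip("\n"))
    return collected


sample = io.StringIO("header\n===\na b c\nh i j\n======\nfooter\n")
print(between_delimiters(sample))
```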

4 answers

Answer 1
33 (accepted) 2011-09-26 18:29:28

Answer 2
5 2011-09-26 18:23:48

Answer 3
3 2011-09-26 20:23:02

Answer 4
1 2011-09-26 18:22:51