Combine every two lines while reading a .txt file in Python

Question

I'm currently working with a large file in Python that looks like this:

junk
junk
junk
--- intermediate:
1489       pi0     111 [686] (1491,1492)   
                             0.534    -0.050    -0.468     0.724     0.135
1499       pi0     111 [690] (1501,1502)   
                            -1.131     0.503    12.751    12.812     0.135
--- final:
 32        e-      11 [7]    
                             9.072    20.492   499.225   499.727     0.001
 33        e+     -11 [6]    
                           -11.317   -17.699  2632.568  2632.652     0.001
 12         s       3 [10] (91)  >43 {+5}
                             2.946     0.315    94.111    94.159     0.500
 14         g      21 [11] (60,61)  34>>16 {+7,-6}
                            -0.728     3.329     5.932     6.907     0.950
------------------------------------------------------------------------------
junk
junk
--- intermediate:
repeat

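I want to merge every two lines after the "--- final:" line, up to the "----------------" line. For example, I would like to end up with an output file like this: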

 32        e-      11 [7]      9.072    20.492   499.225   499.727     0.001
 33        e+     -11 [6]    -11.317   -17.699  2632.568  2632.652     0.001
 12         s       3 [10]     2.946     0.315    94.111    94.159     0.500
 14         g      21 [11]    -0.728     3.329     5.932     6.907     0.950

Note how I omit the extra entries on the lines that are not indented with whitespace. My current approach is:

start = False
for line in myfile:
    line = line.strip()
    fields = line.split()
    if len(fields)==0:
        continue
    if not start:
        if line == "--- final:":
            start = True
        continue

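The len(fields) == 0 check is supposed to end processing at the "---------" line, and the loop should then carry on until it sees another "--- final:" line. What I currently don't know how to do is merge the two lines together while ignoring the extra information on the line that is not indented with whitespace. Any suggestions?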

Answer 1

A quick way to merge every other line:

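# step through the list two lines at a time; stop before an unpaired last line
for i in range(0, len(lines) - 1, 2):
    fields1 = lines[i].strip().split()
    fields2 = lines[i+1].strip().split()
    # keep the first 4 fields of the first line, then all fields of the second
    print("\t".join(fields1[:4] + fields2))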

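Note that I'm assuming here that all the lines to be merged have already been extracted into a list called lines, and that I simply hard-coded the number of elements (4) to keep from each first line.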
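Answer 1 leaves open how the lines end up in that list. As a minimal sketch of that missing step, a small state machine can pick them out while streaming through the file, which also copes with repeated sections. The function names iter_final_lines and merge_pairs are hypothetical, and the sketch assumes the markers look exactly like those in the question: a line starting with "--- final:" opens a section and a line made only of dashes closes it.

```python
def iter_final_lines(lines):
    """Yield only the lines between a '--- final:' marker and the next all-dash line."""
    inside = False
    for raw in lines:
        line = raw.rstrip("\n")
        stripped = line.strip()
        if stripped.startswith("--- final:"):
            inside = True          # section opens; the marker itself is skipped
        elif inside and stripped and set(stripped) == {"-"}:
            inside = False         # a line made only of dashes closes the section
        elif inside and stripped:
            yield line


def merge_pairs(lines, keep=4):
    """Merge each pair of lines, keeping the first `keep` fields of the first line."""
    merged = []
    it = iter(lines)
    for first, second in zip(it, it):   # consume the iterator two items at a time
        merged.append("\t".join(first.split()[:keep] + second.split()))
    return merged


sample = """junk
--- intermediate:
1489       pi0     111 [686] (1491,1492)
                             0.534    -0.050    -0.468     0.724     0.135
--- final:
 32        e-      11 [7]
                             9.072    20.492   499.225   499.727     0.001
 12         s       3 [10] (91)  >43 {+5}
                             2.946     0.315    94.111    94.159     0.500
------------------------------------------------------------------------------
junk
"""

result = merge_pairs(iter_final_lines(sample.splitlines()))
for row in result:
    print(row)
```

Because iter_final_lines accepts any iterable of lines, an open file object can be passed in place of sample.splitlines(), so a large file never has to be held in memory as a whole.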

Answer 2

As long as you know the exact lines surrounding the section you want:

#split the large text into lines
lines = large_text.split('\n')
#get the indexes of the beginning and end of your target section
idx_start = lines.index("--- final:")
idx_finish= lines.index("------------------------------------------------------------------------------")
#iterate through the section in steps of 2, split on spaces, remove empty strings, print them as tab delimited
for idx in range( idx_start+1, idx_finish, 2):
    out = list(filter(None,(lines[idx]+lines[idx+1]).split(" ")))
    print("\t".join(out))

where large_text is the file's contents, imported as one giant string.

Edit: to open the file "large_text.txt" and read it in, try the following:

with open('large_text.txt','r') as f:
    #split the large text into lines (splitlines drops the trailing "\n" so index() below can match)
    lines = f.read().splitlines()
    #get the indexes of the beginning and end of your target section
    idx_start = lines.index("--- final:")
    idx_finish= lines.index("------------------------------------------------------------------------------")
    #iterate through the section in steps of 2, split on spaces, remove empty strings, print them as tab delimited
    for idx in range( idx_start+1, idx_finish, 2):
        out = list(filter(None,(lines[idx]+lines[idx+1]).split(" ")))
        print("\t".join(out))

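Assumptions:

You know the lines that delimit the section of interest (i.e. "--- final:").
Your values are separated by spaces, not tabs. If they are tab-separated, change split(" ") to split("\t").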
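On the second assumption: calling str.split() with no argument splits on any run of whitespace, spaces or tabs alike, and never produces empty strings, so both the filter(None, ...) step and the choice of delimiter become unnecessary. A small illustration using one pair of lines from the question (a tab is inserted into the second line on purpose, as an assumption for the demo):

```python
first = " 14         g      21 [11] (60,61)  34>>16 {+7,-6}"
second = "                            -0.728\t3.329     5.932     6.907     0.950"

# split(" ") leaves an empty string behind for every extra space in a run
with_empties = first.split(" ")

# split() with no argument treats any run of spaces/tabs as one separator
fields = first.split()[:4] + second.split()
print("\t".join(fields))
```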

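This one should be the winner: it adds a formatting fix for each pair of lines (only the first four fields of the first line are kept). The same assumptions hold.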

with open('./large_text.txt','r') as f:
    #split the large text into lines
    lines = f.read().split("\n")
    #get the indexes of the beginning and end of your target section
    idx_start = lines.index("--- final:")
    idx_finish= lines.index("------------------------------------------------------------------------------")
    #iterate through the section in steps of 2, split on spaces, remove empty strings, print them as tab delimited
    for idx in range( idx_start+1, idx_finish, 2):
        line_spaces = list(filter(None,lines[idx].split(" ")))[0:4]
        other_line = list(filter(None,(lines[idx+1]).split(" ")))
        out = line_spaces + other_line
        print("\t".join(out))

Answer 3

You can solve your problem with the newer regex module and a regular expression:

import regex as re

rx = re.compile(r'''(?V1)
        (?:^---\ final:[\n\r])|(?:\G(?!\A))
        ^(\ *\d+.+?)\ *$[\n\r]
        ^\ +(.+)$[\n\r]
        ''', re.MULTILINE | re.VERBOSE)

junky_string = your_string

matches = ["    ".join(match.groups()) 
            for match in rx.finditer(junky_string)
            if match.group(1) is not None]
print(matches)
# [' 32        e-      11 [7]    9.072    20.492   499.225   499.727     0.001', 
#  ' 33        e+     -11 [6]    -11.317   -17.699  2632.568  2632.652     0.001',
#  ' 12         s       3 [10] (91)  >43 {+5}    2.946     0.315    94.111    94.159     0.500', 
#  ' 14         g      21 [11] (60,61)  34>>16 {+7,-6}    -0.728     3.329     5.932     6.907     0.950']

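This looks for --- final: at the start of a line or, immediately after a previous match, for a line of optional spaces followed by digits and then an indented line (for more details, study the explanation on regex101.com).
The two captured groups of each match are then joined with the separator string given to join.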
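The regex module is a third-party package. As a hedged alternative sketch, not the answer's own method, a similar result can be had with the standard-library re module in two passes: one pattern isolates each "--- final:" block, and a second pairs up the lines inside it. The names block_rx and pair_rx and the sample string are assumptions for illustration, and the sketch assumes the closing line consists of at least ten dashes.

```python
import re

# Pass 1: capture everything between '--- final:' and the next line of dashes.
block_rx = re.compile(r"^--- final:[ \t]*\n(.*?)^-{10,}[ \t]*$",
                      re.MULTILINE | re.DOTALL)
# Pass 2: a line starting with an integer ID, followed by an indented line.
pair_rx = re.compile(r"^[ \t]*(\d+[^\n]*?)[ \t]*\n[ \t]+(\S[^\n]*?)[ \t]*$",
                     re.MULTILINE)

sample = (
    "junk\n"
    "--- final:\n"
    " 32        e-      11 [7]    \n"
    "                             9.072    20.492   499.225   499.727     0.001\n"
    " 33        e+     -11 [6]    \n"
    "                           -11.317   -17.699  2632.568  2632.652     0.001\n"
    "------------------------------------------------------------------------------\n"
    "junk\n"
)

rows = []
for block in block_rx.finditer(sample):
    for head, tail in pair_rx.findall(block.group(1)):
        # keep the first 4 fields of the first line, all fields of the second
        rows.append(head.split()[:4] + tail.split())

for row in rows:
    print("\t".join(row))
```

Unlike the \G-based pattern, this version trims the first line to four fields, which matches the output format requested in the question.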

3 answers

Answer 1
0 2016-09-07 20:19:40

Answer 2
0 accepted 2016-09-07 20:24:22

Answer 3
0 2016-09-07 20:37:21
