Combine every two lines while reading a .txt file in Python
I am currently working with large files in Python, such as:
junk
junk
junk
--- intermediate:
1489 pi0 111 [686] (1491,1492)
0.534 -0.050 -0.468 0.724 0.135
1499 pi0 111 [690] (1501,1502)
-1.131 0.503 12.751 12.812 0.135
--- final:
32 e- 11 [7]
9.072 20.492 499.225 499.727 0.001
33 e+ -11 [6]
-11.317 -17.699 2632.568 2632.652 0.001
12 s 3 [10] (91) >43 {+5}
2.946 0.315 94.111 94.159 0.500
14 g 21 [11] (60,61) 34>>16 {+7,-6}
-0.728 3.329 5.932 6.907 0.950
------------------------------------------------------------------------------
junk
junk
--- intermediate:
repeat
I want to combine every two lines after the "--- final:" line, up to the "----------------" line. For example, I would like the output file to read:
32 e- 11 [7] 9.072 20.492 499.225 499.727 0.001
33 e+ -11 [6] -11.317 -17.699 2632.568 2632.652 0.001
12 s 3 [10] 2.946 0.315 94.111 94.159 0.500
14 g 21 [11] -0.728 3.329 5.932 6.907 0.950
Notice how I omit the extra entries in the lines that have no leading whitespace. My current approach is:
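start = False
for line in myfile:
    line = line.strip()
    fields = line.split()
    if len(fields) == 0:
        continue
    if not start:
        # The header is "--- final:" (with a space), so compare the whole
        # stripped line; fields[0] alone would only be "---".
        if line == "--- final:":
            start = True
        continue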
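The `len(fields) == 0` check is supposed to end the script at the "---------" line and then continue until another "--- final:" line is seen. What I currently don't know how to do is merge the two lines together while ignoring the extra information in the lines that have no leading whitespace. Any suggestions?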
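The loop in the question can be finished as a small state machine. The following is a minimal sketch, not the asker's code: it assumes each block starts with a line reading exactly "--- final:", ends at the next line starting with "---", and that the first line of every pair keeps only its first four fields, as in the expected output above. The names `merge_final_sections`, `keep` and `SAMPLE` are hypothetical.

```python
def merge_final_sections(lines, keep=4):
    """Yield one merged row (list of fields) per pair of lines in each
    "--- final:" block, stopping each block at its dashed separator line.

    `keep` is a hypothetical parameter: how many leading fields of the first
    line of each pair to keep (extras such as "(91) >43 {+5}" are dropped).
    """
    in_section = False
    pending = None  # fields kept from the first line of the current pair
    for raw in lines:
        line = raw.strip()
        if line == "--- final:":
            in_section, pending = True, None
            continue
        if not in_section or not line:
            continue  # outside a block, or a blank line inside one
        if line.startswith("---"):
            # The dashed separator (or any other "---" header) ends the block.
            in_section, pending = False, None
            continue
        if pending is None:
            pending = line.split()[:keep]
        else:
            yield pending + line.split()
            pending = None


# Hypothetical sample in the question's layout: two "--- final:" blocks
# separated by junk and an intermediate block that must be ignored.
SAMPLE = """junk
--- final:
 32 e- 11 [7]
   9.072 20.492 499.225 499.727 0.001
 12 s 3 [10] (91) >43 {+5}
   2.946 0.315 94.111 94.159 0.500
------------------------------------------------------------------------------
junk
--- intermediate:
 1489 pi0 111 [686] (1491,1492)
   0.534 -0.050 -0.468 0.724 0.135
--- final:
 14 g 21 [11] (60,61) 34>>16 {+7,-6}
   -0.728 3.329 5.932 6.907 0.950
------------------------------------------------------------------------------
"""

if __name__ == "__main__":
    # An open file object can be passed instead, since iterating it yields lines.
    for row in merge_final_sections(SAMPLE.splitlines()):
        print("\t".join(row))
```

Because it only looks at one line at a time, this works on a file of any size without loading it into memory, and it keeps going after each separator until the next "--- final:" line.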
A quick way to combine every two lines:
for i in range(0, len(lines), 2):
    fields1 = lines[i].strip().split()
    fields2 = lines[i+1].strip().split()
    print("\t".join(fields1[:4] + fields2))
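Note that I am assuming here that all the lines to be merged have already been extracted and put into a list called `lines`, and that I simply hard-coded the number of fields (4) to keep from the first line of each pair.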
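A variation on the same idea, sketched here under the same assumption that `lines` already holds only the lines to merge: passing the same iterator to `zip` twice yields consecutive pairs without index arithmetic. The function name `merge_pairs` and its `keep` parameter are hypothetical.

```python
def merge_pairs(lines, keep=4):
    """Merge lines[0]+lines[1], lines[2]+lines[3], ... into tab-separated rows.

    Both zip() arguments are the *same* iterator, so each tuple takes two
    consecutive lines. An unpaired trailing line is silently dropped instead
    of raising IndexError the way lines[i + 1] would.
    """
    it = iter(lines)
    rows = []
    for first, second in zip(it, it):
        rows.append("\t".join(first.split()[:keep] + second.split()))
    return rows


if __name__ == "__main__":
    for row in merge_pairs([" 32 e- 11 [7]", "   9.072 20.492 499.225"]):
        print(row)
```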
As long as you know the exact lines that surround the section you want:
# Split the large text into lines
lines = large_text.split('\n')
# Get the indexes of the beginning and end of your target section
idx_start = lines.index("--- final:")
idx_finish = lines.index("------------------------------------------------------------------------------")
# Iterate through the section in steps of 2, split on spaces, remove empty strings, print them as tab delimited
for idx in range(idx_start + 1, idx_finish, 2):
    out = list(filter(None, (lines[idx] + lines[idx+1]).split(" ")))
    print("\t".join(out))
where `large_text` is the file read in as one giant string.
Edit: to open the file "large_text.txt" as a string, try this:
with open('large_text.txt', 'r') as f:
    # Split the file into lines; splitlines() drops the trailing "\n" that
    # readlines() would keep (and that would make lines.index() fail)
    lines = f.read().splitlines()
# Get the indexes of the beginning and end of your target section
idx_start = lines.index("--- final:")
idx_finish = lines.index("------------------------------------------------------------------------------")
# Iterate through the section in steps of 2, split on spaces, remove empty strings, print them as tab delimited
for idx in range(idx_start + 1, idx_finish, 2):
    out = list(filter(None, (lines[idx] + lines[idx+1]).split(" ")))
    print("\t".join(out))
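One detail worth checking when reading the file: `f.readlines()` keeps the trailing newline on every line, so an exact lookup such as `lines.index("--- final:")` only succeeds on lines that have had it removed. A quick check, using an in-memory `io.StringIO` as a stand-in for the real file (the sample text is hypothetical):

```python
import io

# An in-memory stand-in for the real file, so the example runs anywhere.
fake_file = io.StringIO("junk\n--- final:\n 32 e- 11 [7]\n")

raw = fake_file.readlines()
print(raw)                 # every element still ends with '\n'
print("--- final:" in raw)  # False: only '--- final:\n' is present

fake_file.seek(0)          # rewind and read again
clean = fake_file.read().splitlines()
print(clean.index("--- final:"))  # 1
```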
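This assumes the fields are separated by spaces; if they are separated by tabs, changing `split(" ")` to `split("\t")` should do the trick.
Edit: added a formatting fix for each pair of lines, keeping only the first four fields of the first line. The same assumptions hold.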
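If the separators might be a mix of spaces and tabs, `str.split()` called with no argument is a simpler option than either `split(" ")` or `split("\t")`: it splits on any run of whitespace and never returns empty strings, so the `filter(None, ...)` step becomes unnecessary. A small comparison on a made-up line:

```python
mixed = "12 s\t3  [10]\t (91)"

# split(" ") splits on single spaces only: a run of spaces yields an empty
# string, and tabs are not treated as separators at all.
print(mixed.split(" "))   # ['12', 's\t3', '', '[10]\t', '(91)']

# split("\t") handles tabs but not spaces.
print(mixed.split("\t"))  # ['12 s', '3  [10]', ' (91)']

# split() with no argument splits on any whitespace run and drops empties.
print(mixed.split())      # ['12', 's', '3', '[10]', '(91)']
```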
with open('./large_text.txt', 'r') as f:
    # Split the large text into lines
    lines = f.read().split("\n")
# Get the indexes of the beginning and end of your target section
idx_start = lines.index("--- final:")
idx_finish = lines.index("------------------------------------------------------------------------------")
# Iterate through the section in steps of 2: keep the first 4 fields of the first line,
# append all fields of the second line, and print them as tab delimited
for idx in range(idx_start + 1, idx_finish, 2):
    line_spaces = list(filter(None, lines[idx].split(" ")))[0:4]
    other_line = list(filter(None, lines[idx+1].split(" ")))
    out = line_spaces + other_line
    print("\t".join(out))
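The snippets print the merged rows to standard output, while the question asks for an output file. One way to write the same tab-delimited rows to a file is the standard-library `csv` module; this is a sketch, and the function name `write_rows` and the file name `merged.txt` are hypothetical.

```python
import csv
import io


def write_rows(rows, out):
    """Write each merged row (a list of fields) as one tab-separated line.

    `out` is any text file object; real files should be opened with
    newline="" as the csv module documentation recommends.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)


if __name__ == "__main__":
    rows = [
        ["32", "e-", "11", "[7]", "9.072", "20.492", "499.225", "499.727", "0.001"],
        ["33", "e+", "-11", "[6]", "-11.317", "-17.699", "2632.568", "2632.652", "0.001"],
    ]
    buf = io.StringIO()  # in-memory stand-in for an output file
    write_rows(rows, buf)
    print(buf.getvalue(), end="")
    # For a real file (hypothetical name):
    # with open("merged.txt", "w", newline="") as f:
    #     write_rows(rows, f)
```

Using `csv.writer` rather than `"\t".join` mainly matters if a field could ever contain a tab or a quote character, which the writer would then quote instead of silently breaking the column layout.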
You can solve your problem with the newer `regex` module and some regular expressions:
import regex as re
rx = re.compile(r'''(?V1)
(?:^---\ final:[\n\r])|(?:\G(?!\A))
^(\ *\d+.+?)\ *$[\n\r]
^\ +(.+)$[\n\r]
''', re.MULTILINE | re.VERBOSE)
junky_string = your_string
matches = [" ".join(match.groups())
for match in rx.finditer(junky_string)
if match.group(1) is not None]
print(matches)
# [' 32 e- 11 [7] 9.072 20.492 499.225 499.727 0.001',
# ' 33 e+ -11 [6] -11.317 -17.699 2632.568 2632.652 0.001',
# ' 12 s 3 [10] (91) >43 {+5} 2.946 0.315 94.111 94.159 0.500',
# ' 14 g 21 [11] (60,61) 34>>16 {+7,-6} -0.728 3.329 5.932 6.907 0.950']
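This looks for "--- final:" at the start of a line, or continues at the position where the previous match ended (`\G`), and then matches a line that starts with (optional spaces and) digits, followed by a line indented with spaces (for more details, study the explanation on regex101.com). The match on the header line itself captures nothing, which is why matches whose `group(1)` is `None` are skipped. The two captured parts of each remaining match are then joined with a space.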
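The `regex` module is a third-party package (`pip install regex`); the standard-library `re` module does not support `\G`. If adding a dependency is not an option, a rough alternative with plain `re` is to capture each whole "--- final:" section first and then pair its lines. This is a different technique from the answer above, sketched under the assumption that every section ends at a line of at least ten dashes; `SECTION`, `merged_rows` and `keep` are hypothetical names.

```python
import re

# Everything between a "--- final:" line and the next line of 10+ dashes.
# re.M makes ^ and $ match at every line; re.S lets "." also match newlines.
SECTION = re.compile(r"^--- final:\n(.*?)^-{10,}$", re.M | re.S)


def merged_rows(text, keep=4):
    """Return one list of fields per pair of lines in every final section.

    `keep` (hypothetical) limits how many fields of the first line survive;
    keep=None keeps them all, like the regex-module answer's output does.
    """
    rows = []
    for body in SECTION.findall(text):
        lines = [ln for ln in body.splitlines() if ln.strip()]
        it = iter(lines)
        for first, second in zip(it, it):  # consecutive pairs of lines
            rows.append(first.split()[:keep] + second.split())
    return rows
```

Unlike the `\G` pattern, this finds every section in the text, not just the lines directly after one header, and the final join (space or tab) is left to the caller.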
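Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.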