How can I rearrange lines in a long text file?
I have a large file containing the following information:
15
dist_0.185_0.245_0.320/metad_3_t3745_0.198_0.260_0.326.xyz
C -1.79467 0.35800 -0.28800
H -1.21467 0.50800 -1.26800
H -2.37467 -0.52200 -0.38800
S -0.71467 0.08800 1.10200
C 1.04533 2.63800 1.08200
H 2.10533 2.84800 0.96200
H 0.47533 3.26800 0.42200
S 1.07533 0.78800 0.63200
C 0.60533 -2.93200 -0.87800
H 1.26533 -3.82200 -0.90800
H -0.02467 -2.96200 0.00200
S 1.50533 -1.33200 -0.80800
H -2.44467 1.20800 -0.08800
H 0.64533 2.91800 2.09200
H -0.15467 -3.05200 -1.66800
15
dist_0.185_0.245_0.335/metad_3_t3664_0.196_0.259_0.342.xyz
C -2.03000 0.44267 0.23400
H -1.36000 0.19267 -0.59600
H -2.63000 -0.37733 0.38400
S -0.84000 0.41267 1.56400
C 1.17000 2.62267 1.11400
H 2.24000 2.79267 1.01400
H 0.70000 3.24267 0.48400
S 0.86000 0.80267 0.66400
C 0.75000 -2.97733 -1.48600
H 1.68000 -3.32733 -1.91600
H 0.48000 -3.59733 -0.64600
S 0.82000 -1.21733 -0.94600
H -2.66000 1.33267 0.21400
H 0.86000 2.93267 2.13400
H
...
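The total length is about 140,000 lines. The atoms here are ordered C,H,H,S,C,H,H,S,C,H,H,S,H,H,H, but I want to arrange them as C,S,H,H,H,C,S,H,H,H,C,S,H,H,H. How can I rearrange the whole file using Python or shell?
You can use a regular expression to find this pattern in the file: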
import re

# One atom line: element symbol (C, H or S) followed by three decimal coordinates.
pattern = r"[CHS]\s+-?\d+\.\d+\s+-?\d+\.\d+\s+-?\d+\.\d+"

def reorder(chunk):
    # reorder to your liking by the first letter of each element in the list
    return chunk

def write_to_file(chunk):
    # write to a new file in the same format
    pass

with open(yourfile) as f:  # yourfile: path to the input file
    data = re.findall(pattern, f.read())

for i in range(len(data) // 15):
    chunk = data[i * 15:(i + 1) * 15]  # the 15 atom lines of frame i
    write_to_file(reorder(chunk))
I skipped the implementation of the reorder and write_to_file functions, since they should not be hard to implement.
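For completeness, here is one way the skipped parts could look, written as a minimal sketch rather than a definitive solution. It reads the file frame by frame instead of using a regular expression, so the atom-count line and the comment line of each frame are preserved. The function name reorder_xyz and the ORDER table are my own, not from the question. ORDER assumes every frame has exactly 15 atoms in the order shown in the question, and that the three trailing H atoms belong to the first, second and third carbon respectively (the distances in the first sample frame are consistent with this, but check it against your own data).

```python
from itertools import islice

# Assumed mapping from new position to old position within one 15-atom frame:
#   old: C,H,H,S, C,H,H,S, C,H,H,S, H,H,H
#   new: C,S,H,H,H, C,S,H,H,H, C,S,H,H,H
# The trailing H atoms (old indices 12, 13, 14) are assumed to belong to
# carbons 1, 2 and 3 in that order.
ORDER = [0, 3, 1, 2, 12, 4, 7, 5, 6, 13, 8, 11, 9, 10, 14]


def reorder_xyz(src, dst, order=ORDER):
    """Copy XYZ frames from src to dst, permuting the atom lines of each frame.

    src and dst are open text file objects. Returns the number of frames written.
    """
    lines = iter(src)
    frames = 0
    for count_line in lines:
        if not count_line.strip():
            continue  # tolerate blank lines between frames
        natoms = int(count_line)
        comment = next(lines, "").rstrip("\n")
        atoms = [ln.rstrip("\n") for ln in islice(lines, natoms)]
        if natoms != len(order) or len(atoms) != natoms:
            raise ValueError(f"frame {frames}: expected {len(order)} atom lines")
        dst.write(f"{natoms}\n{comment}\n")
        dst.writelines(atoms[i] + "\n" for i in order)
        frames += 1
    return frames
```

To use it, open the input and output files (any paths you like) with open(..., "r") and open(..., "w") and pass both file objects to reorder_xyz. Because only one frame is held in memory at a time, a file of 140,000 lines is not a problem.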