How to extract (almost identical) columns between two strings using Python

I have a very large text file with 1339018 lines, and I want to extract three sections from it:

My FILE.txt

.
.
.
-----------------------
first ATOMIC CHARGES
-----------------------
   0 C :   -0.157853
   1 C :   -0.156875
   2 C :   -0.143714
   3 C :   -0.140489
   4 S :    0.058926
   5 H :    0.128758
   6 H :    0.128814
   7 H :    0.142420
   8 H :    0.140013
My charges :   -0.0000000

------------------------
.
..
.
-----------------------
first ATOMIC CHARGES AND SPIN
-----------------------
   0 C :   -0.208137    0.054313
   1 C :   -0.206691    0.053890
   2 C :   -0.266791    0.395830
   3 C :   -0.262729    0.395691
   4 S :   -0.184730    0.179002
   5 H :    0.023341   -0.009535
   6 H :    0.023405   -0.009489
   7 H :    0.042728   -0.029862
   8 H :    0.039605   -0.029841
My charges :   -1.0000000

------------------------
.
.
.
.
-----------------------
first ATOMIC CHARGES AND SPIN
-----------------------
   0 C :   -0.086045    0.075562
   1 C :   -0.085256    0.075871
   2 C :    0.022683    0.483590
   3 C :    0.025286    0.483583
   4 S :    0.246328   -0.079498
   5 H :    0.215005   -0.003936
   6 H :    0.215043   -0.003948
   7 H :    0.224379   -0.015598
   8 H :    0.222578   -0.015627
My charges :    1.0000000

------------------------
.
.
.

I tried the script below to extract the fourth column of each section and convert it into lists, for example:

oX = [-0.157853,-0.156875,-0.143714 ...]

oY = [-0.208137,-0.206691,...]

oZ = [-0.086045,-0.085256,...]

Unfortunately, the third statement does not work.

with open('FILE.txt', 'rb') as f:
     textfile_temp = f.read()
     print textfile_temp.split('first ATOMIC CHARGES')[1].split("My charges :   -0.0000000")[0]
     print textfile_temp.split('first ATOMIC CHARGES AND SPIN')[1].split("My charges :   -1.0000000")[0]
     print textfile_temp.split('first ATOMIC CHARGES AND SPIN')[1].split("My charges :    1.0000000")[0]

Is this possible?

Try one small change on the last line, as shown below. You were close!

with open('FILE.txt', 'rb') as f:
     textfile_temp = f.read()
     print textfile_temp.split('first ATOMIC CHARGES')[1].split("My charges :   -0.0000000")[0]
     print textfile_temp.split('first ATOMIC CHARGES AND SPIN')[1].split("My charges :   -1.0000000")[0]
     print textfile_temp.split('first ATOMIC CHARGES AND SPIN')[2].split("My charges :    1.0000000")[0]
     #                                                          ^ change this
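The reason the index matters: `split(marker)` cuts the text at every occurrence of the marker, so element `[1]` is what follows the first occurrence and `[2]` is what follows the second. The snippets above still print raw text rather than lists, so here is a minimal sketch of turning each piece into floats. The helper name `column_after`, its parameters, and the shortened `sample` string are assumptions made for illustration; it also assumes every line containing a colon inside a piece is a data row.

```python
def column_after(text, header, occurrence, end_marker="My charges"):
    """Return the first numeric column of one section as a list of floats.

    occurrence=1 selects the text after the first `header`, 2 the second, ...
    """
    # Everything between the chosen header and the next "My charges" line.
    piece = text.split(header)[occurrence].split(end_marker)[0]
    values = []
    for line in piece.splitlines():
        fields = line.split(":")
        # Data rows look like "   0 C :   -0.208137    0.054313".
        if len(fields) == 2 and fields[1].split():
            values.append(float(fields[1].split()[0]))
    return values


# Shortened stand-in for FILE.txt (two atoms per section instead of nine).
sample = """\
-----------------------
first ATOMIC CHARGES
-----------------------
   0 C :   -0.157853
   1 C :   -0.156875
My charges :   -0.0000000
-----------------------
first ATOMIC CHARGES AND SPIN
-----------------------
   0 C :   -0.208137    0.054313
   1 C :   -0.206691    0.053890
My charges :   -1.0000000
-----------------------
first ATOMIC CHARGES AND SPIN
-----------------------
   0 C :   -0.086045    0.075562
   1 C :   -0.085256    0.075871
My charges :    1.0000000
"""

oX = column_after(sample, "first ATOMIC CHARGES", 1)
oY = column_after(sample, "first ATOMIC CHARGES AND SPIN", 1)
oZ = column_after(sample, "first ATOMIC CHARGES AND SPIN", 2)
print(oX)
print(oY)
print(oZ)
```

Cutting each piece at the generic `"My charges"` marker, rather than at the exact total such as `-1.0000000`, avoids having to know each section's total charge in advance.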

You can use a regular expression to extract the values you need:

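import re

data = []   # one list of charges per section
block = []  # charges of the section currently being read

with open('input.txt') as f_input:
    for row in f_input:
        # Match rows with a whitespace-preceded atom index and
        # capture the first decimal number that follows it.
        values = re.findall(r'\s+\d+.*?(-?\d+\.\d+)', row)

        if values:
            block.append(float(values[0]))
        elif row.startswith('first ATOMIC') and block:
            # A new section header closes the previous section.
            data.append(block)
            block = []

# Store the last section, which has no header after it.
if block:
    data.append(block)

oX, oY, oZ = data

print(oX)
print(oY)
print(oZ)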

This prints:

[-0.157853, -0.156875, -0.143714, -0.140489, 0.058926, 0.128758, 0.128814, 0.14242, 0.140013]
[-0.208137, -0.206691, -0.266791, -0.262729, -0.18473, 0.023341, 0.023405, 0.042728, 0.039605]
[-0.086045, -0.085256, 0.022683, 0.025286, 0.246328, 0.215005, 0.215043, 0.224379, 0.222578]
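In a file of more than a million lines, rows outside the charge sections might also happen to match the regular expression, and `oX, oY, oZ = data` raises `ValueError` unless exactly three sections are found. A possible variation, sketched below, only collects numbers while it is between a `first ATOMIC CHARGES` header and the following `My charges` line, and returns however many sections it finds. The function name `extract_charge_blocks`, its parameters, and the shortened `sample_lines` are assumptions for illustration, and the sketch assumes every line containing a colon inside a section is a data row.

```python
def extract_charge_blocks(lines, header="first ATOMIC CHARGES", footer="My charges"):
    """Collect the first number after the colon for each header..footer section."""
    blocks = []
    current = None  # None means "not inside a section"
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(header):
            # Also matches "first ATOMIC CHARGES AND SPIN".
            current = []
        elif stripped.startswith(footer):
            if current is not None:
                blocks.append(current)
            current = None
        elif current is not None and ":" in stripped:
            # "0 C :   -0.208137    0.054313" -> -0.208137
            current.append(float(stripped.split(":", 1)[1].split()[0]))
    return blocks


# Shortened stand-in for the real file (one atom per section).
sample_lines = [
    "-----------------------",
    "first ATOMIC CHARGES",
    "-----------------------",
    "   0 C :   -0.157853",
    "My charges :   -0.0000000",
    "   9 X :   9.999999",  # outside any section, so it is ignored
    "first ATOMIC CHARGES AND SPIN",
    "-----------------------",
    "   0 C :   -0.208137    0.054313",
    "My charges :   -1.0000000",
]

blocks = extract_charge_blocks(sample_lines)
print(blocks)
```

Because the function iterates over lines, an open file object can be passed in directly (for example `with open('FILE.txt') as f: blocks = extract_charge_blocks(f)`), which reads the file line by line instead of loading it all at once.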

