I have a fairly simple text file that I want to read using numpy. I need to read the numbers from the rows that have more than 2 columns and do not start with a "#".
12
C 0.000000 0.000000 0.000000
C 0.000000 0.000000 1.400000
C 1.212436 0.000000 2.100000
C 2.424871 0.000000 1.400000
C 2.424871 0.000000 0.000000
C 1.212436 0.000000 -0.700000
H -0.943102 0.000000 1.944500
H 1.212436 0.000000 3.189000
H 3.367973 0.000000 1.944500
H 3.367973 0.000000 -0.544500
H 1.212436 0.000000 -1.789000
H -0.943102 0.000000 -0.544500
I have tried the following code:
import numpy as np

class mol:
    def __init__(self):
        self.masses = {'H': 1, 'D': 2, 'C': 12, 'O': 16}

    def read_xyz(self, filename):
        self.filename = filename
        with open(self.filename) as f:
            for line in f:
                if not line.startswith("#") and len(line.split())>3:
                    print np.loadtxt(line)

if __name__ == "__main__":
    test = mol()
    test.read_xyz('benz.xyz')
But my code crashes, and if I print each line I get an empty line between rows; I don't know why. Any help would be great!
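The crash most likely comes from `np.loadtxt` expecting a filename, an open file, or a sequence of lines, so a single line passed as a string gets treated as a filename. The empty lines when printing appear because each line read from the file still ends with its newline. Here is a minimal sketch that keeps the same filtering and hands all the surviving lines to `loadtxt` in one call. The function name `read_xyz_coords` is made up for illustration, and `usecols=(1, 2, 3)` assumes the layout shown above (one label column followed by three numbers):

```python
import numpy as np


def read_xyz_coords(lines):
    """Return an (n_atoms, 3) float array from an iterable of text lines.

    Hypothetical helper: assumes each data row is a label followed by
    three numeric columns, as in the sample file.
    """
    # Same filter as in the question: skip comment rows and short rows
    # (such as the atom-count line).
    rows = [ln for ln in lines
            if not ln.startswith("#") and len(ln.split()) > 3]
    # loadtxt accepts a list of strings; usecols skips the label column
    # so only the numeric columns are converted to float.
    return np.loadtxt(rows, usecols=(1, 2, 3), ndmin=2)
```

With a file on disk, the open handle can be passed directly, for example `with open('benz.xyz') as f: coords = read_xyz_coords(f)`.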
I would suggest using a regex instead, something like:
import numpy as np

class mol:
    def __init__(self):
        self.masses = {'H': 1, 'D': 2, 'C': 12, 'O': 16}

    def read_xyz(self, filename):
        self.filename = filename
        # One atom row: whitespace, a label (not captured), then three
        # captured numeric columns, up to the end of the line.
        regexp = r'\s+\w+' + r'\s+([-.0-9]+)' * 3 + r'\s*\n'
        data = np.fromregex(self.filename, regexp, dtype='f')
        print(data)

if __name__ == "__main__":
    test = mol()
    test.read_xyz('benz.xyz')
In this case, I obtained:
[[ 0.        0.        0.      ]
 [ 0.        0.        1.4     ]
 [ 1.212436  0.        2.1     ]
 [ 2.424871  0.        1.4     ]
 [ 2.424871  0.        0.      ]
 [ 1.212436  0.       -0.7     ]
 [-0.943102  0.        1.9445  ]
 [ 1.212436  0.        3.189   ]
 [ 3.367973  0.        1.9445  ]
 [ 3.367973  0.       -0.5445  ]
 [ 1.212436  0.       -1.789   ]
 [-0.943102  0.       -0.5445  ]]
You need to modify the regex if you also want to keep the first column, which holds the element symbol.
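As a rough sketch of that modification, the label can be captured as a fourth group and the matches stored in a structured array, which is the form the `np.fromregex` documentation uses. The names `ATOM_LINE`, `ATOM_DTYPE`, and `read_xyz_with_symbols` are my own, and the pattern assumes coordinates are written as plain decimals like those in the sample file (no exponent notation):

```python
import re

import numpy as np

# Hypothetical pattern: an element symbol followed by three decimal numbers.
# It is anchored per line (re.MULTILINE), so the atom-count line and any
# "#" comment lines simply do not match.
ATOM_LINE = re.compile(
    r"^[ \t]*([A-Za-z]+)" + r"[ \t]+(-?\d+\.\d+)" * 3 + r"[ \t]*$",
    re.MULTILINE,
)

# One field per captured group; "U2" keeps at most two characters of the symbol.
ATOM_DTYPE = [("symbol", "U2"), ("x", "f8"), ("y", "f8"), ("z", "f8")]


def read_xyz_with_symbols(file):
    """Return a structured array with fields symbol, x, y, z.

    `file` may be a path or an open text file object.
    """
    return np.fromregex(file, ATOM_LINE, dtype=ATOM_DTYPE)
```

The symbols are then available as `atoms["symbol"]`, and a plain coordinate matrix can be rebuilt with `np.column_stack([atoms["x"], atoms["y"], atoms["z"]])`.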