I am trying to load large text data files into NumPy arrays. NumPy's loadtxt and genfromtxt don't work for my files because they contain comment lines starting with any of the characters '#', '!', or 'C', and data tokens of the form

n*value

where n is an integer repeat count and value is the float data. My plan is therefore to read the text file with readlines(), expand the repeated values, and then use NumPy's loadtxt to convert the cleaned text to NumPy arrays.

For the find-and-replace step I tried regular expressions (the re module) but couldn't get them working. The following Python code does work, however. My question is: what is the most efficient and Pythonic way of doing this? If regex is the answer, what is the correct pattern for this find-and-replace on the readlines() list?
lines = ['1 2 3*2.5 3 6 1*.3 8 \n', '! comment here\n', '1*1 2.0 2*2.1 3 6 0 8 \n']

# drop blank lines and comment lines (deleting entries while iterating with
# enumerate skips elements, so build a filtered copy instead)
lines = [line for line in lines
         if line.strip() != '' and line.strip()[0] not in ('#', '!', 'C')]

for l, line in enumerate(lines):
    # find tokens of the form 'n*value'
    repls = [word for word in line.strip().split() if '*' in word]
    for repl in repls:
        count, value = repl.split('*')
        line = line.replace(repl, ' '.join([value] * int(count)))
    lines[l] = line
print(lines)
The output is:
['1 2 2.5 2.5 2.5 3 6 .3 8 \n', '1 2.0 2.1 2.1 3 6 0 8 \n']
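Since the original goal was NumPy arrays, each cleaned line can then be converted directly (in this example the two rows have different numbers of fields, so one array per row); a minimal sketch, assuming NumPy is available:

```python
import numpy as np

# cleaned lines, as produced by the loop above
lines = ['1 2 2.5 2.5 2.5 3 6 .3 8 \n', '1 2.0 2.1 2.1 3 6 0 8 \n']

# one array per row, since the rows have different lengths
arrays = [np.array(line.split(), dtype=float) for line in lines]
print(arrays[0])
```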
Following the comments, I edited my code as follows:
in_lines = ['1 2 3*2.5 3 6 1*.3 8 \n', '! comment here\n', '1*1 2.0 2*2.1 3 6 0 8 \n']
lines = []
for line in in_lines:
    if line.strip() == '' or line.strip()[0] in ('#', '!', 'C'):
        continue
    repls = [word for word in line.strip().split() if '*' in word]
    for repl in repls:
        count, value = repl.split('*')
        # keep the replacement value as a string; joining floats raises TypeError
        line = line.replace(repl, ' '.join([value] * int(count)))
    lines.append(line)
print(lines)
Use Python's functional features and list comprehensions instead:
#!/usr/bin/env python
lines = ['1 2 3*2.5 3 6 1*.3 8 \n', '! comment here\n', '1*1 2.0 2*2.1 3 6 0 8 \n']

# filter out blank lines and comments
lines = [line for line in lines
         if line.strip() != '' and line.strip()[0] not in ['#', '!', 'C']]

# turn each line into a list of tokens
lines = [line.split() for line in lines]

# turn a list of tokens into a number generator, expanding 'n*value' properly
def generate_numbers(tokens):
    for token in tokens:
        if '*' in token:
            n, m = token.split('*')
            for i in range(int(n)):
                yield float(m)
        else:
            yield float(token)

# use the generator to clean up the lines
lines = [list(generate_numbers(tokens)) for tokens in lines]
print(lines)
Outputs:
➤ ./try.py
[[1.0, 2.0, 2.5, 2.5, 2.5, 3.0, 6.0, 0.3, 8.0], [1.0, 2.0, 2.1, 2.1, 3.0, 6.0, 0.0, 8.0]]
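Since the end goal here was NumPy arrays, the generator's output can also feed np.fromiter directly, which builds a float array without an intermediate list; a sketch, assuming NumPy is available:

```python
import numpy as np

def generate_numbers(tokens):
    # expand 'n*value' tokens into n copies of float(value)
    for token in tokens:
        if '*' in token:
            n, m = token.split('*')
            for i in range(int(n)):
                yield float(m)
        else:
            yield float(token)

line = '1 2 3*2.5 3 6 1*.3 8 \n'
row = np.fromiter(generate_numbers(line.split()), dtype=float)
print(row)  # 1., 2., 2.5, 2.5, 2.5, 3., 6., 0.3, 8.
```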
This next version uses generators instead of lists so that you don't have to load the entire file into memory. Note the use of two idioms:

with open("name") as file

This closes the file handle automatically when you exit the block.

for line in file

This iterates over the lines of the file lazily, again without loading the whole file into memory.
This gives us:
#!/usr/bin/env python

# turn a list of tokens into a number generator, expanding 'n*value' properly
def generate_numbers(tokens):
    for token in tokens:
        if '*' in token:
            n, m = token.split('*')
            for i in range(int(n)):
                yield float(m)
        else:
            yield float(token)

# pulled out to make the generator expression more readable
def not_comment(line):
    return line.strip() != '' and line.strip()[0] not in ['#', '!', 'C']

with open("try.dat") as file:
    lines = (
        list(generate_numbers(line.split()))
        for line in file if not_comment(line)
    )  # lines is a lazy generator
    for line in lines:
        print(line)
Output:
➤ ./try.py
[1.0, 2.0, 2.5, 2.5, 2.5, 3.0, 6.0, 0.3, 8.0]
[1.0, 2.0, 2.1, 2.1, 3.0, 6.0, 0.0, 8.0]
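As for the regex part of the question: re.sub accepts a function as the replacement, which makes the 'n*value' expansion a one-liner per line; a sketch of one possible pattern:

```python
import re

def expand(line):
    # replace each 'n*value' token with value repeated n times
    return re.sub(r'(\d+)\*(\S+)',
                  lambda m: ' '.join([m.group(2)] * int(m.group(1))),
                  line)

print(expand('1 2 3*2.5 3 6 1*.3 8'))  # 1 2 2.5 2.5 2.5 3 6 .3 8
```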