
Reading floats inside strings from a file using numpy

I have a text file like this:

"-3.588920831680E-02","1.601887196302E-01","1.302309112549E+02"
"3.739478886127E-01","1.782759875059E-01","6.490543365479E+01"
"3.298096954823E-01","6.939357519150E-02","2.112392578125E+02"
"-2.319437451661E-02","1.149862855673E-01","2.712340698242E+02"
"-1.015115305781E-01","-1.082316488028E-01","6.532022094727E+01"
"-5.374089814723E-03","1.031072884798E-01","5.510117187500E+02"
"6.748274713755E-02","1.679160743952E-01","4.033969116211E+02"
"1.027429699898E-01","1.379162818193E-02","2.374352874756E+02"
"-1.371455192566E-01","1.483036130667E-01","2.703260498047E+02"
"NULL","NULL","NULL"
"3.968210220337E-01","1.893606968224E-02","2.803018188477E+01"

I tried to read this text file with NumPy:

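import numpy as np

dat = np.genfromtxt('data.txt', delimiter=',', dtype='str')
print("dat = {}".format(dat))

# now when I try to convert to float, it fails
# (np.float was also removed in NumPy 1.24; plain float fails the same way here)
dat = dat.astype(np.float)

# try to make it float by stripping the quotes first
dat = np.char.strip(dat, '"').astype(float)  # this still fails on the "NULL" row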
  File "test.py", line 25, in <module>
    dat = dat.astype(np.float) # it fails
ValueError: could not convert string to float: '"-3.588920831680E-02"'

How can I fix this error?

Related links:

https://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html#numpy.genfromtxt

You can read the file directly with the csv module, which treats the double quotes as field quoting and removes them:

Code:

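import csv
import numpy as np

# csv.reader strips the surrounding double quotes from each field;
# the with-block makes sure the file is closed afterwards
with open('data.txt', newline='') as f:
    reader = csv.reader(f, delimiter=",")
    # convert each field to float, mapping "NULL" to NaN
    data = np.array([[float(i) if i != 'NULL' else np.nan for i in row]
                     for row in reader])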

print(data)

Results:

[[ -3.58892083e-02   1.60188720e-01   1.30230911e+02]
 [  3.73947889e-01   1.78275988e-01   6.49054337e+01]
 [  3.29809695e-01   6.93935752e-02   2.11239258e+02]
 [ -2.31943745e-02   1.14986286e-01   2.71234070e+02]
 [ -1.01511531e-01  -1.08231649e-01   6.53202209e+01]
 [ -5.37408981e-03   1.03107288e-01   5.51011719e+02]
 [  6.74827471e-02   1.67916074e-01   4.03396912e+02]
 [  1.02742970e-01   1.37916282e-02   2.37435287e+02]
 [ -1.37145519e-01   1.48303613e-01   2.70326050e+02]
 [             nan              nan              nan]
 [  3.96821022e-01   1.89360697e-02   2.80301819e+01]]
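To try the csv approach without a file on disk, here is a minimal sketch that feeds two sample lines (copied from the question) through io.StringIO instead of open(). The final step, dropping rows that contain NaN, is an optional addition for the case where you would rather discard the NULL rows than keep them as NaN.

```python
import csv
import io

import numpy as np

# Two sample lines in the same format as data.txt: quoted fields and one NULL row
sample = io.StringIO(
    '"-3.588920831680E-02","1.601887196302E-01","1.302309112549E+02"\n'
    '"NULL","NULL","NULL"\n'
)

# csv.reader treats the double quotes as field quoting and drops them
reader = csv.reader(sample, delimiter=',')
data = np.array([[float(i) if i != 'NULL' else np.nan for i in row]
                 for row in reader])

# Optional: keep only the rows that contain no NaN (i.e. drop the NULL rows)
clean = data[~np.isnan(data).any(axis=1)]

print(data)
print(clean)
```

Here data has shape (2, 3) with the second row all NaN, and clean has shape (1, 3).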

The problem is that each element of your string array still contains the literal double-quote characters from the file. NumPy can convert strings like

'1.45E-02'

Instead you have something like

'"1.45E-02"' (note the extra double quotes at the beginning and end).
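A quick way to see the difference is to call Python's float() on a single field with and without the quote characters. This is a minimal illustration; the value 1.45E-02 is the example string used above, not data from the file.

```python
# A field as np.genfromtxt(..., dtype='str') returns it: the quotes are part of the string
quoted = '"1.45E-02"'
print(len(quoted))  # 10: the 8 characters of the number plus 2 quote characters

try:
    float(quoted)
except ValueError as err:
    print(err)  # could not convert string to float: '"1.45E-02"'

# With the quote characters removed, the conversion works
print(float(quoted.strip('"')))  # 0.0145
```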

So the fix is simply to remove those extra double quotes, which np.char.replace does easily:

dat_new = np.char.replace(dat, '"', '')
# You also need to do something with NULL. Here I am just replacing it with 0
# (replacing it with 'nan' instead would keep the row marked as missing).
dat_new = np.char.replace(dat_new, 'NULL', '0')
dat_new = dat_new.astype(float)

np.char.replace(a, old, new) works like "Find and Replace": it returns a copy of the array in which every occurrence of old in each element is replaced with new.
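Below is a self-contained sketch of the same replace-then-cast pipeline. The small dat array is a hypothetical stand-in for what np.genfromtxt(..., dtype='str') returns on this file, so the example runs without data.txt. One deliberate change from the answer: NULL is mapped to 'nan' rather than '0', an assumption that keeps missing rows distinguishable from real zeros.

```python
import numpy as np

# Hypothetical stand-in for the genfromtxt output:
# every element still carries its surrounding double quotes
dat = np.array([['"-3.588920831680E-02"', '"1.601887196302E-01"', '"1.302309112549E+02"'],
                ['"NULL"', '"NULL"', '"NULL"']])

# Step 1: find-and-replace the quote character with nothing in every element
dat_new = np.char.replace(dat, '"', '')

# Step 2: map NULL to 'nan' (assumption; the answer above uses '0')
dat_new = np.char.replace(dat_new, 'NULL', 'nan')

# Step 3: every element is now a plain numeric string, so the cast succeeds
values = dat_new.astype(float)

print(values)
```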
