Reading floats inside strings from a file using numpy

I have a text file like this:

"-3.588920831680E-02","1.601887196302E-01","1.302309112549E+02"
"3.739478886127E-01","1.782759875059E-01","6.490543365479E+01"
"3.298096954823E-01","6.939357519150E-02","2.112392578125E+02"
"-2.319437451661E-02","1.149862855673E-01","2.712340698242E+02"
"-1.015115305781E-01","-1.082316488028E-01","6.532022094727E+01"
"-5.374089814723E-03","1.031072884798E-01","5.510117187500E+02"
"6.748274713755E-02","1.679160743952E-01","4.033969116211E+02"
"1.027429699898E-01","1.379162818193E-02","2.374352874756E+02"
"-1.371455192566E-01","1.483036130667E-01","2.703260498047E+02"
"NULL","NULL","NULL"
"3.968210220337E-01","1.893606968224E-02","2.803018188477E+01"

I tried to read this text file with numpy:

import numpy as np

dat = np.genfromtxt('data.txt', delimiter=',', dtype='str')
print("dat = {}".format(dat))

# now when I try to convert to float
dat = dat.astype(np.float)  # it fails

# try to make it float
dat = np.char.strip(dat, '"').astype(float)

The conversion fails with:

  File "test.py", line 25, in <module>
    dat = dat.astype(np.float) # it fails
ValueError: could not convert string to float: '"-3.588920831680E-02"'

How can I fix this error?

Related links:

https://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html#numpy.genfromtxt

You can read that file directly with the csv module, which strips the surrounding double quotes from each field for you:

Code:

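import csv
import numpy as np

# csv.reader removes the double quotes around each field while parsing
with open('file1', newline='') as f:
    reader = csv.reader(f, delimiter=",")
    # map the literal string NULL to NaN, everything else to float
    data = np.array([[float(i) if i != 'NULL' else np.nan for i in row]
                     for row in reader])

print(data)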

Results:

[[ -3.58892083e-02   1.60188720e-01   1.30230911e+02]
 [  3.73947889e-01   1.78275988e-01   6.49054337e+01]
 [  3.29809695e-01   6.93935752e-02   2.11239258e+02]
 [ -2.31943745e-02   1.14986286e-01   2.71234070e+02]
 [ -1.01511531e-01  -1.08231649e-01   6.53202209e+01]
 [ -5.37408981e-03   1.03107288e-01   5.51011719e+02]
 [  6.74827471e-02   1.67916074e-01   4.03396912e+02]
 [  1.02742970e-01   1.37916282e-02   2.37435287e+02]
 [ -1.37145519e-01   1.48303613e-01   2.70326050e+02]
 [             nan              nan              nan]
 [  3.96821022e-01   1.89360697e-02   2.80301819e+01]]
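
If you would rather stay with np.genfromtxt, one possible approach is to remove the double quotes from the text before NumPy parses it and let genfromtxt treat NULL as a missing value. The following is only a sketch under assumptions: the three-line sample string is a hypothetical stand-in for the real file, and it assumes the file is small enough to read into memory.

```python
import io
import numpy as np

# Hypothetical three-line sample in the same format as the question's file.
sample = (
    '"-3.588920831680E-02","1.601887196302E-01","1.302309112549E+02"\n'
    '"NULL","NULL","NULL"\n'
    '"3.968210220337E-01","1.893606968224E-02","2.803018188477E+01"\n'
)

# Drop the literal double quotes before NumPy parses the text.
cleaned = sample.replace('"', '')

# Treat NULL as a missing value and fill it with NaN.
data = np.genfromtxt(io.StringIO(cleaned), delimiter=',',
                     missing_values='NULL', filling_values=np.nan)
print(data)
```

For the real file, the sample string would be replaced by the file contents, for example the result of reading 'data.txt' inside a with open(...) block.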

The problem is that each floating point number is wrapped in two sets of quotes instead of one: the double quotes from the file are part of the string itself. NumPy can only convert strings like

'1.45E-02'

Instead, you have something like

'"1.45E-02"' (note the extra double quotes at the beginning and end).
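
A quick check in plain Python shows the difference. This is just an illustrative sketch; the variable names are made up for the example.

```python
# A plain numeric string converts without trouble.
good = float('1.45E-02')

# The inner double quotes are part of the string, so float() rejects it.
try:
    float('"1.45E-02"')
    quoted_ok = True
except ValueError:
    quoted_ok = False

# Removing the literal quote characters makes the string parseable again.
fixed = float('"1.45E-02"'.strip('"'))
```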

So the solution is simply to remove those extra double quotes, which can be done as follows:

dat_new = np.char.replace(dat, '"', '')
# You also need to do something with NULL. Here I am just replacing it with 0.
dat_new = np.char.replace(dat_new, 'NULL', '0')
dat_new = dat_new.astype(float)

np.char.replace(np_array, string_to_replace, replacement) essentially works as 'Find and Replace': in every element of the array, it replaces each occurrence of the second argument with the third.
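
If a 0 would be misleading in your data, a variant of the same idea is to turn NULL into NaN instead. The sketch below assumes a small hypothetical array shaped like the output of genfromtxt(..., dtype='str'); it uses np.char.strip to remove the quotes and np.where to substitute the string 'nan', which converts to a float NaN.

```python
import numpy as np

# Hypothetical array, shaped like the result of genfromtxt(..., dtype='str').
dat = np.array([['"1.5E+00"', '"-2.0E-01"'],
                ['"NULL"', '"3.0E+02"']])

# Strip the literal double quotes from both ends of every element.
stripped = np.char.strip(dat, '"')

# Replace NULL with the string 'nan', then convert the whole array to float.
dat_new = np.where(stripped == 'NULL', 'nan', stripped).astype(float)
print(dat_new)
```

Keeping the missing entries as NaN lets functions such as np.nanmean skip them, whereas a 0 would silently enter any later statistics.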
