
Delimiter error from np.loadtxt in Python

I'm trying to sum some values in a list, so I loaded the .dat file that contains the values, but it seems the only way to get Python to sum the data is to separate the values with ','. Now, this is what I get:

    altura = np.loadtxt("bio.dat",delimiter=',',usecols=(5,),dtype='float')
  File "/usr/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 846, in loadtxt
    vals = [vals[i] for i in usecols]
IndexError: list index out of range

This is my code

import numpy as np

altura = np.loadtxt("bio.dat",delimiter=',',usecols=(5,),dtype='str')
print altura

And this is the file 'bio.dat'

1 Katherine Oquendo M 18    1.50    50  
2 Pablo Restrepo    H 20    1.83    79  
3 Ana Agudelo   M 18    1.58    45  
4 Kevin Vargas  H 20    1.74    80  
5 Pablo madrid  H 20    1.70    55  

What I intend to do is

x=sum(altura)

What should I do about the separator?

In my case, some lines included a # character. By default NumPy ignores everything after it on that line, because '#' marks a comment.
So try again with the comments parameter set to None, which disables comment handling:

altura = np.loadtxt("bio.dat",delimiter=',',usecols=(5,),dtype='str',comments=None)

I also recommend not using np.loadtxt for large files (more than about 1M lines): it can be very slow, at least in older NumPy versions.
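A minimal Python 3 sketch of the comment behavior described above. The data is hypothetical (the `Ana#1` and `Jose#2` names are invented so that every line contains a `#`), and it assumes a NumPy version whose `loadtxt` accepts `comments=None`:

```python
import io

import numpy as np

# Hypothetical data: the name fields contain '#', which loadtxt treats
# as the start of a comment by default (comments='#').
text = "1 Ana#1 1.50\n2 Jose#2 1.83\n"

# Default: each line is cut at '#', leaving only "1 Ana" and "2 Jose".
default_names = np.loadtxt(io.StringIO(text), dtype=str, usecols=(1,))

# comments=None disables comment handling, so the full lines are parsed.
full_names = np.loadtxt(io.StringIO(text), dtype=str, usecols=(1,),
                        comments=None)
heights = np.loadtxt(io.StringIO(text), dtype=float, usecols=(2,),
                     comments=None)

print(default_names.tolist())  # ['Ana', 'Jose']
print(full_names.tolist())     # ['Ana#1', 'Jose#2']
print(heights.sum())
```

With the default `comments='#'`, the height column is gone from both rows, so reading `usecols=(2,)` without `comments=None` would fail.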

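The file doesn't need to be comma separated. By default np.loadtxt splits each line on whitespace, so column 5 (counting from 0) is already a separate field. With delimiter=',' each line is read as a single field, which is why usecols=(5,) raises the IndexError. Here's my sample run, using StringIO to simulate a file. I assume you want to sum the numbers that look like a person's height (in meters).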

In [17]: from StringIO import StringIO
In [18]: s="""\
1 Katherine Oquendo M 18    1.50    50  
2 Pablo Restrepo    H 20    1.83    79  
3 Ana Agudelo   M 18    1.58    45  
4 Kevin Vargas  H 20    1.74    80  
5 Pablo madrid  H 20    1.70    55  
"""
In [19]: S=StringIO(s)
In [20]: data=np.loadtxt(S,dtype=float,usecols=(5,))
In [21]: data
Out[21]: array([ 1.5 ,  1.83,  1.58,  1.74,  1.7 ])
In [22]: np.sum(data)
Out[22]: 8.3499999999999996
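The session above is Python 2 (`from StringIO import StringIO`, `print` statements). A rough Python 3 equivalent, assuming a recent NumPy, swaps in `io.StringIO` and keeps the same `usecols=(5,)` call with no `delimiter` argument:

```python
import io

import numpy as np

# The sample rows from bio.dat, held in memory instead of a file
# (io.StringIO replaces Python 2's StringIO module).
s = """\
1 Katherine Oquendo M 18    1.50    50
2 Pablo Restrepo    H 20    1.83    79
3 Ana Agudelo   M 18    1.58    45
4 Kevin Vargas  H 20    1.74    80
5 Pablo madrid  H 20    1.70    55
"""

# No delimiter argument: loadtxt splits on runs of whitespace,
# so column index 5 (0-based) holds the height in meters.
data = np.loadtxt(io.StringIO(s), dtype=float, usecols=(5,))
total = data.sum()

print(data)
print(total)
```

The total should come out at about 8.35, as in the run above; the exact digits printed depend on the Python and NumPy versions.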

As a script (with the data in a .txt file):

import numpy as np
fname = 'stack25828405.txt'
data=np.loadtxt(fname,dtype=float,usecols=(5,))
print data
print np.sum(data)

2119:~/mypy$ python2.7 stack25828405.py
[ 1.5   1.83  1.58  1.74  1.7 ]
8.35
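To see why `delimiter=','` triggers the `IndexError` in the question, here is a plain-Python sketch of how one line of bio.dat gets split. It only mimics loadtxt's field splitting with `str.split`, as an illustration rather than NumPy's actual parser:

```python
# One line of bio.dat, as in the question.
line = "1 Katherine Oquendo M 18    1.50    50"

# Splitting on ',' (what delimiter=',' asks for) yields a single field,
# so usecols=(5,) refers to a column that does not exist.
comma_fields = line.split(",")

# Splitting on runs of whitespace (loadtxt's default when no delimiter
# is given) yields seven fields, and index 5 is the height.
ws_fields = line.split()

print(len(comma_fields), len(ws_fields), ws_fields[5])  # 1 7 1.50
```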

Alternatively, you can convert the tab-delimited file to CSV first.

The csv module supports tab-delimited files; supply the delimiter argument to csv.reader:

import csv

txt_file = r"mytxt.txt"
csv_file = r"mycsv.csv"

# 'with' closes both files when the block ends, so none are left open.
# The 'b' (Python 2) is necessary on Windows:
# it prevents \x1a, Ctrl-Z, from ending the stream prematurely
# and also stops Python converting to / from different line terminators.
# On other platforms, it has no effect.
with open(txt_file, "rb") as f_in, open(csv_file, "wb") as f_out:
    in_txt = csv.reader(f_in, delimiter='\t')
    out_csv = csv.writer(f_out)
    out_csv.writerows(in_txt)
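The snippet above uses Python 2 file modes (`'rb'`/`'wb'`). Below is a Python 3 sketch of the same conversion, followed by loading the result with `delimiter=','`. It assumes a hypothetical, genuinely tab-delimited layout with the full name in one field. The sample bio.dat mixes spaces and tabs, so it would need cleaning first. In-memory `io.StringIO` objects stand in for files; with real files, the csv documentation says to open them in text mode with `newline=''`:

```python
import csv
import io

import numpy as np

# Hypothetical tab-delimited input: fields separated by '\t',
# with the full name kept in a single field.
tab_text = (
    "1\tKatherine Oquendo\tM\t18\t1.50\t50\n"
    "2\tPablo Restrepo\tH\t20\t1.83\t79\n"
)

# In-memory stand-ins for the input and output files.
src = io.StringIO(tab_text)
dst = io.StringIO()

reader = csv.reader(src, delimiter="\t")
# lineterminator='\n' keeps the in-memory output simple (the default is '\r\n').
writer = csv.writer(dst, lineterminator="\n")
writer.writerows(reader)

csv_text = dst.getvalue()
print(csv_text)

# The data is now really comma separated, so delimiter=',' works.
# Column index 4 is the height in this hypothetical layout.
heights = np.loadtxt(io.StringIO(csv_text), delimiter=",", usecols=(4,))
print(heights.sum())
```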

This answer is not my work; it is the work of agf found at https://stackoverflow.com/a/10220428/3767980 .

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.
