My data file looks like this:
#weight, height and gender
45 145 f
89 154 m
56 163 m
-1 165 f
65 175 m
-1 125 m
65 169 f
As you can see, two entries have a weight of -1. These are outliers, and I want to remove the whole row for each of them. I read the file with NumPy's np.loadtxt, like this:
data = np.loadtxt('whData.dat',dtype=np.object,comments='#',delimiter=None)
X = data[:,0:2].astype(np.float)
y = data[:,2]
X = X.T
...
To remove the outliers, I defined a function that iterates over the data and returns a new list containing only the rows that are not outliers:
def remove_outlier2(data):
    non_outlier = []
    for x in data:
        if x[0] != '-1':
            non_outlier.append(x)
    return non_outlier
I call it right after loading the data from the file:
data = np.loadtxt('whData.dat',dtype=np.object,comments='#',delimiter=None)
data = remove_outlier2(data)
np.asarray(data)
X = data[:,0:2].astype(np.float)
y = data[:,2]
X = X.T
...
But now I get this error, which I am not able to resolve:
Traceback (most recent call last):
  File "<ipython-input-2-2aec95447a79>", line 1, in <module>
    runfile('C:/Users/xxx/py_workspace/pattern/whExample.py', wdir='C:/Users/xxx/py_workspace/pattern')
  File "C:\Users\xxx\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
    execfile(filename, namespace)
  File "C:\Users\xxx\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)
  File "C:/Users/xxx/py_workspace/pattern/whExample.py", line 79, in <module>
    X = data[:,0:2].astype(np.float)
TypeError: list indices must be integers, not tuple
I also printed the data right after reading it from the file; in Spyder it looks like this:
[['45' '145' 'f']
['89' '154' 'm']
['56' '163' 'm']
['-1' '165' 'f']
['65' '175' 'm']
['-1' '125' 'm']
['65' '169' 'f']]
I searched online to find out what I am doing wrong but couldn't figure it out. How can I resolve this?
Thanks
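Finally, following the suggestions in the comments, all I had to do was assign the output of np.asarray() back to data. np.asarray() returns a new array and does not convert the list in place, so without the assignment data was still a plain Python list, which cannot be indexed with data[:,0:2]: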
data = np.loadtxt('whData.dat',dtype=np.object,comments='#',delimiter=None)
# remove outliers
data = remove_outlier2(data)
data = np.asarray(data)
With that change, things worked fine.
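For what it's worth, the same filtering can probably be done without a loop by using a boolean mask, which keeps data as a NumPy array the whole time. The following is only a sketch under a few assumptions: it reads the sample rows from an in-memory io.StringIO stand-in instead of whData.dat, the names raw, keep and clean are made up for illustration, and it uses the built-in str and float because the np.object and np.float aliases were removed in NumPy 1.24.

```python
import io
import numpy as np

# Stand-in for whData.dat so the sketch runs without the file (assumption).
raw = io.StringIO(
    "#weight, height and gender\n"
    "45 145 f\n"
    "89 154 m\n"
    "56 163 m\n"
    "-1 165 f\n"
    "65 175 m\n"
    "-1 125 m\n"
    "65 169 f\n"
)

# Load every field as a string, as in the original code.
data = np.loadtxt(raw, dtype=str, comments='#')

# Boolean mask: True for rows whose weight is not the '-1' placeholder.
keep = data[:, 0] != '-1'

# Indexing with the mask returns a new array, so 2-D slicing still works.
clean = data[keep]

X = clean[:, 0:2].astype(float).T  # weights in X[0], heights in X[1]
y = clean[:, 2]                    # gender labels
```

Since clean is still an ndarray, the later data[:,0:2] style slicing works unchanged, and no np.asarray() call is needed.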