Given a sample of data such as this:
3,12.2,3.03,2.32,19,96,1.25,.49,.4,.73,5.5,.66,1.83,510
3,12.77,2.39,2.28,19.5,86,1.39,.51,.48,.64,9.899999,.57,1.63,470
3,14.16,2.51,2.48,20,91,1.68,.7,.44,1.24,9.7,.62,1.71,660
3,13.71,5.65,2.45,20.5,95,1.68,.61,.52,1.06,7.7,.64,1.74,740
3,13.4,3.91,2.48,23,102,1.8,.75,.43,1.41,7.3,.7,1.56,750
3,13.27,4.28,2.26,20,120,1.59,.69,.43,1.35,10.2,.59,1.56,835
3,13.17,2.59,2.37,20,120,1.65,.68,.53,1.46,9.3,.6,1.62,840
3,14.13,4.1,2.74,24.5,96,2.05,.76,.56,1.35,9.2,.61,1.6,560
and my code:
import numpy as np
with open("wine.txt","r") as f:
    stuff=f.readlines()
#np.genfromtxt("wine.txt", delimiter=",")
z=np.empty((0,14),float)
for hello in stuff:
    firstbook=hello.strip().split(",")
    x=[float(i) for i in firstbook]
    y=np.array(x)
    b=np.append(b,y)
print b[1:2]
I'm having trouble building a numpy array from the entire data set: I only get the last row of the set as the array. I want the print in the last line of the code to give me an entire column of elements, but I only get [14.13] when I reach that line.
Why not use np.loadtxt, passing a comma as the delimiter? From its documentation:
"Load data from a text file. Each row in the text file must have the same number of values."
Your data meets that requirement:
import numpy as np
with open("wine.txt","r") as f:
    b = np.loadtxt(f, delimiter=',')
print(b[1:2])
# [[3,12.77,2.39,2.28,19.5,86,1.39,.51,.48,.64,9.899999,.57,1.63,470]]
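Note that b[1:2] slices the first axis, so it returns the second row rather than a column. A minimal sketch of the difference, assuming two rows of the sample held in an in-memory io.StringIO object (a stand-in for wine.txt, not part of the original code):

```python
import io
import numpy as np

# Two rows copied from the sample data; io.StringIO stands in for wine.txt.
sample = io.StringIO(
    "3,12.2,3.03,2.32,19,96,1.25,.49,.4,.73,5.5,.66,1.83,510\n"
    "3,12.77,2.39,2.28,19.5,86,1.39,.51,.48,.64,9.899999,.57,1.63,470\n"
)
b = np.loadtxt(sample, delimiter=",")

second_row = b[1:2]   # slicing the first axis selects rows: shape (1, 14)
second_col = b[:, 1]  # indexing the second axis selects a column: shape (2,)
print(second_row.shape, second_col)
```

With the full file, b[:, 1] would hold one value per line of the data set.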
You can use np.vstack():
import numpy as np
data = '''3,12.2,3.03,2.32,19,96,1.25,.49,.4,.73,5.5,.66,1.83,510
3,12.77,2.39,2.28,19.5,86,1.39,.51,.48,.64,9.899999,.57,1.63,470
3,14.16,2.51,2.48,20,91,1.68,.7,.44,1.24,9.7,.62,1.71,660
3,13.71,5.65,2.45,20.5,95,1.68,.61,.52,1.06,7.7,.64,1.74,740
3,13.4,3.91,2.48,23,102,1.8,.75,.43,1.41,7.3,.7,1.56,750
3,13.27,4.28,2.26,20,120,1.59,.69,.43,1.35,10.2,.59,1.56,835
3,13.17,2.59,2.37,20,120,1.65,.68,.53,1.46,9.3,.6,1.62,840
3,14.13,4.1,2.74,24.5,96,2.05,.76,.56,1.35,9.2,.61,1.6,560'''
stuff = data.split('\n')
z = np.empty((0,14), float)
for hello in stuff:
    firstbook = hello.strip().split(",")
    x = [float(i) for i in firstbook]
    z = np.vstack([z, x])
print(z[1:2])
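As for why the loop in the question produces a flat result: np.append flattens both inputs when no axis is given, so appending a row to the (0, 14) array yields a 1-D array of 14 values, and [1:2] then picks out a single element. A minimal sketch, assuming the original loop effectively called np.append on the empty array z each time (the name row and its stand-in values are illustrative, not from the question):

```python
import numpy as np

z = np.empty((0, 14), float)      # empty 2-D array with 14 columns
row = np.arange(14, dtype=float)  # stand-in for one parsed line

flat = np.append(z, row)               # no axis: both inputs are flattened first
stacked = np.append(z, [row], axis=0)  # axis=0 with a 2-D row keeps the table shape

print(flat.shape)     # (14,)
print(stacked.shape)  # (1, 14)
print(flat[1:2])      # a single element, not a row
```

Passing axis=0 with a 2-D row behaves like the vstack call above, but it still copies the whole array on every iteration.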
It is better to accumulate line values in a list, and make an array once.
alist = []
for hello in stuff:
    firstbook=hello.strip().split(",")
    x=[float(i) for i in firstbook]
    alist.append(x)
b = np.array(alist)
Assuming x has the same number of terms for each line, alist will be a list of equal length lists. np.array turns that into a 2d array, just as it does in the prototypical array construction expression:
np.array([[1,2],[3,4]])
Repeated list append is much faster than repeated array stacks/appends.
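A rough way to check this on your own machine is sketched below. The synthetic rows and the helper names build_with_list and build_with_vstack are made up for illustration, and the timings will vary with array size and hardware:

```python
import timeit
import numpy as np

# Synthetic stand-in for parsed lines: 500 rows of 14 floats each.
rows = [[float(i + j) for j in range(14)] for i in range(500)]

def build_with_list(rows):
    """Accumulate rows in a list, then build the array once."""
    acc = []
    for r in rows:
        acc.append(r)
    return np.array(acc)

def build_with_vstack(rows):
    """Grow the array one row at a time; each vstack copies everything so far."""
    z = np.empty((0, 14), float)
    for r in rows:
        z = np.vstack([z, r])
    return z

a = build_with_list(rows)
v = build_with_vstack(rows)
print(a.shape, np.array_equal(a, v))  # both approaches build the same array

t_list = timeit.timeit(lambda: build_with_list(rows), number=5)
t_vstack = timeit.timeit(lambda: build_with_vstack(rows), number=5)
print("list: %.4fs  vstack: %.4fs" % (t_list, t_vstack))
```

The list version appends in place, while every vstack call allocates a new array and copies all rows accumulated so far.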
With your file sample (as a list of lines, txt):
In [1826]: data=np.genfromtxt(txt, dtype=float, delimiter=',')
In [1827]: data
Out[1827]:
array([[ 3.00000000e+00, 1.22000000e+01, 3.03000000e+00,
2.32000000e+00, 1.90000000e+01, 9.60000000e+01,
1.25000000e+00, 4.90000000e-01, 4.00000000e-01,
7.30000000e-01, 5.50000000e+00, 6.60000000e-01,
1.83000000e+00, 5.10000000e+02],
[ 3.00000000e+00, 1.27700000e+01, 2.39000000e+00,
...
1.35000000e+00, 9.20000000e+00, 6.10000000e-01,
1.60000000e+00, 5.60000000e+02]])
In [1828]: data.shape
Out[1828]: (8, 14)
2nd column, first as a 1d array and then (with a slice index) as a 2d column:
In [1829]: data[:,1]
Out[1829]: array([ 12.2 , 12.77, 14.16, 13.71, 13.4 , 13.27, 13.17, 14.13])
In [1830]: data[:,1:2]
Out[1830]:
array([[ 12.2 ],
[ 12.77],
[ 14.16],
[ 13.71],
[ 13.4 ],
[ 13.27],
[ 13.17],
[ 14.13]])