I'm trying to save a numpy array to a csv file, but I've run into a problem.
I tried two different solutions and neither of them worked.
My numpy array looks like this:
In[39]: arr[0]
Out[39]:
array([ array([[ 30, 29, 198, ..., 149, 149, 149],
[ 29, 29, 197, ..., 149, 149, 149],
[ 29, 29, 197, ..., 149, 149, 149],
...,
[ 63, 63, 96, ..., 105, 104, 104],
[ 63, 63, 96, ..., 106, 105, 105],
[ 77, 77, 217, ..., 217, 217, 217]], dtype=uint8),
list([0, 0, 0, 0, 0, 0, 0, 0, 0])], dtype=object)
It is a numpy array of shape (1200, 2) and I want to save it to a csv file.
With the np.savetxt function:
In[40]: np.savetxt("numpy_array.csv", arr, delimiter=',')
Traceback (most recent call last):
File "D:\Program files\Anaconda3\lib\site-packages\numpy\lib\npyio.py", line 1254, in savetxt
fh.write(asbytes(format % tuple(row) + newline))
TypeError: only length-1 arrays can be converted to Python scalars
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Program files\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2862, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-41-673bcc1d77a6>", line 1, in <module>
np.savetxt("numpy_array.csv", arr, delimiter=',')
File "D:\Program files\Anaconda3\lib\site-packages\numpy\lib\npyio.py", line 1258, in savetxt
% (str(X.dtype), format))
TypeError: Mismatch between array dtype ('object') and format specifier ('%.18e,%.18e')
With pandas:
In[42]: df = pd.DataFrame(arr)
In[43]: df[:5]
Out[43]:
0 \
0 [[30, 29, 198, 198, 197, 197, 197, 197, 197, 1...
1 [[29, 29, 197, 197, 196, 196, 197, 197, 197, 1...
2 [[29, 29, 196, 196, 196, 196, 196, 196, 196, 1...
3 [[29, 29, 196, 196, 196, 196, 196, 196, 196, 1...
4 [[29, 29, 196, 196, 196, 196, 196, 196, 197, 1...
1
0 [0, 0, 0, 0, 0, 0, 0, 0, 0]
1 [1, 0, 0, 0, 0, 0, 0, 0, 0]
2 [1, 0, 0, 0, 0, 0, 0, 0, 0]
3 [1, 0, 0, 0, 0, 0, 0, 0, 0]
4 [1, 0, 0, 0, 0, 0, 0, 0, 0]
In[44]: df.to_csv("h.csv", index=False)
In[45]: a = pd.read_csv("h.csv", header=None,names =['input', 'output'])
In[46]: a[:5]
Out[46]:
input \
0 0
1 [[ 30 29 198 ..., 149 149 149]\r\n [ 29 29 1...
2 [[ 29 29 197 ..., 149 149 149]\r\n [ 29 29 1...
3 [[ 29 29 196 ..., 149 149 149]\r\n [ 29 29 1...
4 [[ 29 29 196 ..., 149 149 149]\r\n [ 29 29 1...
output
0 1
1 [0, 0, 0, 0, 0, 0, 0, 0, 0]
2 [1, 0, 0, 0, 0, 0, 0, 0, 0]
3 [1, 0, 0, 0, 0, 0, 0, 0, 0]
4 [1, 0, 0, 0, 0, 0, 0, 0, 0]
When I print df[:5], everything looks fine, but after I save it to csv and read it back, it looks awful: there are no commas between the numbers and there are '\r\n' sequences between the rows.
After reading the csv file I want to see the same output as df[:5]. How can I do that, and what is the problem?
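NumPy's own np.savetxt writes csv only for plain 1d or 2d arrays of numbers or strings. An array like yours is normally saved through another package (like pandas or pickle).
What you describe as 'it looks awful' is just how pandas shows the data it read back. Add arr = np.array(a)
and you have a NumPy array again, although its elements are now strings rather than the original arrays and lists.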
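If the file does not have to be human-readable, the pickle route mentioned above keeps the nested arrays and lists intact. The following is only a minimal sketch: `img`, `barr` and the file name are made-up stand-ins for the real (1200, 2) array, and it uses np.save / np.load with allow_pickle=True, which is one of several ways to pickle an object array.

```python
import os
import tempfile

import numpy as np

# Small stand-in for one row's image (hypothetical data).
img = np.array([[30, 29, 198], [29, 29, 197]], dtype=np.uint8)

# Object array with the same layout as in the question:
# column 0 holds a 2d array, column 1 holds a list.
barr = np.empty((3, 2), dtype=object)
for i in range(3):
    barr[i, 0] = img
    barr[i, 1] = [0, 0, 0]

# Hypothetical output location; any writable path ending in .npy works.
path = os.path.join(tempfile.mkdtemp(), "barr.npy")

# Object arrays are stored via pickle inside the .npy file.
np.save(path, barr, allow_pickle=True)

# Loading pickled data must be enabled explicitly.
loaded = np.load(path, allow_pickle=True)

print(loaded.shape)
print(type(loaded[0, 0]), type(loaded[0, 1]))
```

The result is a binary file, not a csv, and pickled files should only be loaded when they come from a trusted source.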
Your array is 2d, (1200, 2) with object dtype. Evidently the first column contains 2d arrays, and the 2nd column lists.
arr[0,0]
is a 2d array
array([[ 30, 29, 198, ..., 149, 149, 149],
[ 29, 29, 197, ..., 149, 149, 149],
[ 29, 29, 197, ..., 149, 149, 149],
...,
[ 63, 63, 96, ..., 105, 104, 104],
[ 63, 63, 96, ..., 106, 105, 105],
[ 77, 77, 217, ..., 217, 217, 217]], dtype=uint8)
You could easily write one such array in csv format. For example:
In [342]: arr = np.array([[ 30, 29, 198, 149, 149, 149],
...: [ 29, 29, 197, 149, 149, 149],
...: [ 29, 29, 197, 149, 149, 149],
...: [ 63, 63, 96, 105, 104, 104],
...: [ 63, 63, 96, 106, 105, 105],
...: [ 77, 77, 217, 217, 217, 217]], dtype=np.uint8)
...:
...:
In [343]: np.savetxt('arr.txt', arr, delimiter=',', fmt='%4d')
produces a file that looks like:
In [344]: cat arr.txt
30, 29, 198, 149, 149, 149
29, 29, 197, 149, 149, 149
29, 29, 197, 149, 149, 149
63, 63, 96, 105, 104, 104
63, 63, 96, 106, 105, 105
77, 77, 217, 217, 217, 217
Read the savetxt documentation for more details on fmt.
But the full array is not compatible with the simple 2d layout of a csv file. Sure, you could write something more complicated, but you couldn't load it with a csv reader like np.genfromtxt or np.loadtxt. Those expect a neat row and column layout with a well-defined delimiter.
In [346]: data = np.genfromtxt('arr.txt',delimiter=',',dtype=None)
In [347]: data
Out[347]:
array([[ 30, 29, 198, 149, 149, 149],
[ 29, 29, 197, 149, 149, 149],
[ 29, 29, 197, 149, 149, 149],
[ 63, 63, 96, 105, 104, 104],
[ 63, 63, 96, 106, 105, 105],
[ 77, 77, 217, 217, 217, 217]])
The pandas df shows two columns, one with the arrays, the other with the lists. But in a (the frame read back from the csv), column 0 appears to contain string representations of the 2d arrays, as indicated by the newline characters. Its first row (0 and 1) is the header line that to_csv wrote, read back as data because of header=None. Did you look at the h.csv file? Part of the reason for using csv is so that people can read it, and other programs (like Excel) can read it.
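To make the point about string representations concrete, here is a small sketch of the same round trip. The array `img`, the column names and the in-memory buffer are assumptions chosen for illustration; the real data would be the (1200, 2) object array.

```python
import ast
import io

import numpy as np
import pandas as pd

# Small stand-in for one image (hypothetical data).
img = np.array([[30, 29, 198], [29, 29, 197]], dtype=np.uint8)

# Object array laid out like the one in the question.
barr = np.empty((2, 2), dtype=object)
for i in range(2):
    barr[i, 0] = img
    barr[i, 1] = [i, 0, 0]

df = pd.DataFrame(barr, columns=["input", "output"])

# Round trip through csv text, using a buffer instead of a file.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
back = pd.read_csv(buf)

# Each cell is now plain text, not an array or a list.
cell = back.loc[0, "input"]
print(type(cell))

# The list column can be parsed back, because its text is a Python literal.
labels = back["output"].apply(ast.literal_eval)
print(labels.tolist())

# The array text has no commas, and for large arrays numpy abbreviates it
# with '...', so the pixel values are not all written out.
big = np.zeros((40, 50), dtype=np.uint8)
print("..." in str(big))
```

The abbreviation is the more serious problem: once the text contains '...', as it does in the question's output, the original pixel values cannot be recovered from the file at all.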
Make an array like your big one:
In [349]: barr = np.empty((3,2), object)
In [350]: barr[:,0]=[arr,arr,arr]
In [351]: barr[:,1]=[[0,0,0] for _ in range(3)]
In [352]: barr
Out[352]:
array([[array([[ 30, 29, 198, 149, 149, 149],
[ 29, 29, 197, 149, 149, 149],
[ 29, 29, 197, 149, 149, 149],
[ 63, 63, 96, 105, 104, 104],
[ 63, 63, 96, 106, 105, 105],
[ 77, 77, 217, 217, 217, 217]], dtype=uint8),
list([0, 0, 0])],
[array([[ 30, 29, 198, 149, 149, 149],
...
[ 77, 77, 217, 217, 217, 217]], dtype=uint8),
list([0, 0, 0])]], dtype=object)
Write it with the %s format, the only one that will work with objects like this:
In [354]: np.savetxt('barr.txt',barr, delimiter=',',fmt='%s')
In [355]: cat barr.txt
[[ 30 29 198 149 149 149]
[ 29 29 197 149 149 149]
[ 29 29 197 149 149 149]
[ 63 63 96 105 104 104]
[ 63 63 96 106 105 105]
[ 77 77 217 217 217 217]],[0, 0, 0]
[[ 30 29 198 149 149 149]
[ 29 29 197 149 149 149]
[ 29 29 197 149 149 149]
[ 63 63 96 105 104 104]
[ 63 63 96 106 105 105]
[ 77 77 217 217 217 217]],[0, 0, 0]
[[ 30 29 198 149 149 149]
[ 29 29 197 149 149 149]
[ 29 29 197 149 149 149]
[ 63 63 96 105 104 104]
[ 63 63 96 106 105 105]
[ 77 77 217 217 217 217]],[0, 0, 0]
That is not a valid csv file. It is text, but with [] and varying line lengths, none of the standard csv file readers can handle it.
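One way to get a file that the standard readers can handle is to flatten each row into a single line of numbers: the image pixels followed by the label values. This is only a sketch under the assumption that every image has the same shape and every list has the same length; `img`, `barr` and the file name are hypothetical stand-ins.

```python
import os
import tempfile

import numpy as np

# Small stand-in for one image (hypothetical data).
img = np.array([[30, 29, 198], [29, 29, 197]], dtype=np.uint8)

barr = np.empty((3, 2), dtype=object)
for i in range(3):
    barr[i, 0] = img
    barr[i, 1] = [0, 0, 0]

# One numeric row per sample: flattened pixels, then the label values.
flat = np.array([np.concatenate([np.ravel(x), np.asarray(y)]) for x, y in barr])

# Hypothetical output location.
path = os.path.join(tempfile.mkdtemp(), "flat.csv")
np.savetxt(path, flat, delimiter=",", fmt="%d")

# Reading it back needs the image shape, which the csv does not store.
n_pixels = img.size
loaded = np.loadtxt(path, delimiter=",", dtype=int)
images = loaded[:, :n_pixels].reshape(-1, *img.shape).astype(np.uint8)
labels = loaded[:, n_pixels:]

print(images.shape, labels.shape)
```

The image shape has to be known when loading, since the file itself only records a flat row of numbers per sample.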
Saving that array as you did with pandas, I get:
In [364]: cat pdbarr.txt
0,1
"[[ 30 29 198 149 149 149]
[ 29 29 197 149 149 149]
[ 29 29 197 149 149 149]
[ 63 63 96 105 104 104]
[ 63 63 96 106 105 105]
[ 77 77 217 217 217 217]]","[0, 0, 0]"
"[[ 30 29 198 149 149 149]
[ 29 29 197 149 149 149]
[ 29 29 197 149 149 149]
[ 63 63 96 105 104 104]
[ 63 63 96 106 105 105]
[ 77 77 217 217 217 217]]","[0, 0, 0]"
"[[ 30 29 198 149 149 149]
[ 29 29 197 149 149 149]
[ 29 29 197 149 149 149]
[ 63 63 96 105 104 104]
[ 63 63 96 106 105 105]
[ 77 77 217 217 217 217]]","[0, 0, 0]"
Notice all the quotes: it's writing those component arrays and lists as strings. Again, not a csv that a reader can turn back into arrays.