
Saving a numpy array to csv

I'm trying to save a numpy array to a csv file, but there is a problem.

I tried two different solutions, but neither worked.

My numpy array looks like this:

In[39]: arr[0]
Out[39]: 
array([ array([[ 30,  29, 198, ..., 149, 149, 149],
   [ 29,  29, 197, ..., 149, 149, 149],
   [ 29,  29, 197, ..., 149, 149, 149],
   ..., 
   [ 63,  63,  96, ..., 105, 104, 104],
   [ 63,  63,  96, ..., 106, 105, 105],
   [ 77,  77, 217, ..., 217, 217, 217]], dtype=uint8),
   list([0, 0, 0, 0, 0, 0, 0, 0, 0])], dtype=object)

It is a numpy array of shape (1200, 2), and I want to save it to a csv file.

With the np.savetxt function:

In[40]: np.savetxt("numpy_array.csv", arr, delimiter=',')
Traceback (most recent call last):
  File "D:\Program files\Anaconda3\lib\site-packages\numpy\lib\npyio.py", line 1254, in savetxt
    fh.write(asbytes(format % tuple(row) + newline))
TypeError: only length-1 arrays can be converted to Python scalars
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "D:\Program files\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-41-673bcc1d77a6>", line 1, in <module>
    np.savetxt("numpy_array.csv", arr, delimiter=',')
  File "D:\Program files\Anaconda3\lib\site-packages\numpy\lib\npyio.py", line 1258, in savetxt
    % (str(X.dtype), format))
TypeError: Mismatch between array dtype ('object') and format specifier ('%.18e,%.18e')

With pandas:

In[42]: df = pd.DataFrame(arr)
In[43]: df[:5]
Out[43]: 
                                                   0  \
0  [[30, 29, 198, 198, 197, 197, 197, 197, 197, 1...   
1  [[29, 29, 197, 197, 196, 196, 197, 197, 197, 1...   
2  [[29, 29, 196, 196, 196, 196, 196, 196, 196, 1...   
3  [[29, 29, 196, 196, 196, 196, 196, 196, 196, 1...   
4  [[29, 29, 196, 196, 196, 196, 196, 196, 197, 1...   
                             1  
0  [0, 0, 0, 0, 0, 0, 0, 0, 0]  
1  [1, 0, 0, 0, 0, 0, 0, 0, 0]  
2  [1, 0, 0, 0, 0, 0, 0, 0, 0]  
3  [1, 0, 0, 0, 0, 0, 0, 0, 0]  
4  [1, 0, 0, 0, 0, 0, 0, 0, 0]  
In[44]: df.to_csv("h.csv", index=False)
In[45]: a = pd.read_csv("h.csv", header=None,names =['input', 'output'])
In[46]: a[:5]
Out[46]: 
                                               input  \
0                                                  0   
1  [[ 30  29 198 ..., 149 149 149]\r\n [ 29  29 1...   
2  [[ 29  29 197 ..., 149 149 149]\r\n [ 29  29 1...   
3  [[ 29  29 196 ..., 149 149 149]\r\n [ 29  29 1...   
4  [[ 29  29 196 ..., 149 149 149]\r\n [ 29  29 1...   
                        output  
0                            1  
1  [0, 0, 0, 0, 0, 0, 0, 0, 0]  
2  [1, 0, 0, 0, 0, 0, 0, 0, 0]  
3  [1, 0, 0, 0, 0, 0, 0, 0, 0]  
4  [1, 0, 0, 0, 0, 0, 0, 0, 0]  

When I print df[:5], everything looks great, but after I save it to csv and read it back, it looks awful: there are no commas between the numbers, and there are '\r\n' sequences between the rows of each array.

I want the data read back from the csv file to look like the df[:5] output. How can I do that, and what is the problem?

NumPy has no real 'save as csv' function for data like this: np.savetxt writes delimited text, but only for a simple 1d or 2d table. Normally you save such data through another package (like pandas or pickle).

What you see as 'looking awful' is the pandas format. Add arr = np.array(a) and you have a numpy array again (although its cells are still the strings read from the file, not the original arrays).
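A minimal sketch of the pickle route mentioned above, using a small made-up array shaped like the question's (the sizes and values are assumptions, not the real data). It writes a binary file rather than a csv, but the nested arrays and lists come back unchanged:

```python
import io
import pickle

import numpy as np

# Hypothetical stand-in for the (1200, 2) object array in the question:
# column 0 holds 2d uint8 images, column 1 holds label lists.
n = 3
arr = np.empty((n, 2), dtype=object)
for i in range(n):
    arr[i, 0] = np.full((4, 5), i, dtype=np.uint8)  # made-up image data
    arr[i, 1] = [i, 0, 0]                            # made-up label list

# Pickle the whole object array; io.BytesIO stands in for open("arr.pkl", "wb").
buf = io.BytesIO()
pickle.dump(arr, buf)

# Load it back; the nested arrays and lists keep their types and values.
buf.seek(0)
restored = pickle.load(buf)
print(restored.shape, restored.dtype)
print(type(restored[0, 0]), restored[0, 0].dtype)
print(restored[2, 1])
```

Keep in mind that a pickle file is only readable from Python, and you should only unpickle files you trust.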

Your array is 2d, (1200, 2), with object dtype. Evidently the first column contains 2d arrays, and the second column contains lists.

arr[0,0] is a 2d array:

array([[ 30,  29, 198, ..., 149, 149, 149],
   [ 29,  29, 197, ..., 149, 149, 149],
   [ 29,  29, 197, ..., 149, 149, 149],
   ..., 
   [ 63,  63,  96, ..., 105, 104, 104],
   [ 63,  63,  96, ..., 106, 105, 105],
   [ 77,  77, 217, ..., 217, 217, 217]], dtype=uint8)

You could easily write that in csv format. For example:

In [342]: arr = np.array([[ 30,  29, 198, 149, 149, 149],
     ...:    [ 29,  29, 197, 149, 149, 149],
     ...:    [ 29,  29, 197, 149, 149, 149],
     ...:    [ 63,  63,  96, 105, 104, 104],
     ...:    [ 63,  63,  96, 106, 105, 105],
     ...:    [ 77,  77, 217, 217, 217, 217]], dtype=np.uint8)
     ...:    
     ...:    
In [343]: np.savetxt('arr.txt', arr, delimiter=',', fmt='%4d')

produces a file that looks like:

In [344]: cat arr.txt
  30,  29, 198, 149, 149, 149
  29,  29, 197, 149, 149, 149
  29,  29, 197, 149, 149, 149
  63,  63,  96, 105, 104, 104
  63,  63,  96, 106, 105, 105
  77,  77, 217, 217, 217, 217

Read the savetxt documentation for more details on fmt.
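As a rough sketch of the full round trip for one such 2d array (a few rows of the example above; `io.StringIO` stands in for a real file), `fmt='%d'` writes plain integers and `np.loadtxt` reads them back:

```python
import io

import numpy as np

# A few rows of the 2d uint8 example above.
img = np.array([[30, 29, 198, 149, 149, 149],
                [29, 29, 197, 149, 149, 149],
                [63, 63, 96, 105, 104, 104],
                [77, 77, 217, 217, 217, 217]], dtype=np.uint8)

# Write it as csv; fmt='%d' writes plain integers instead of the default '%.18e'.
buf = io.StringIO()  # stands in for a real file such as 'arr.csv'
np.savetxt(buf, img, delimiter=',', fmt='%d')
text = buf.getvalue()
print(text)

# Read it back; without dtype, loadtxt would return float64.
buf.seek(0)
back = np.loadtxt(buf, delimiter=',', dtype=np.uint8)
print(back.dtype, np.array_equal(back, img))
```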

But the full array is not compatible with the simple 2d layout of a csv file. Sure, you could write something more complicated, but you couldn't load it with a csv reader like np.genfromtxt or np.loadtxt. Those expect a neat row-and-column layout with a well-defined delimiter, like the one arr.txt has:

In [346]: data = np.genfromtxt('arr.txt',delimiter=',',dtype=None)
In [347]: data
Out[347]: 
array([[ 30,  29, 198, 149, 149, 149],
       [ 29,  29, 197, 149, 149, 149],
       [ 29,  29, 197, 149, 149, 149],
       [ 63,  63,  96, 105, 104, 104],
       [ 63,  63,  96, 106, 105, 105],
       [ 77,  77, 217, 217, 217, 217]])
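For comparison, a minimal sketch of passing an object array like the question's straight to np.savetxt with the default fmt (the small array here is made up). Each cell is a whole array or list, which cannot be formatted as a single float, so savetxt raises the same kind of TypeError as in the question:

```python
import io

import numpy as np

# Made-up object array shaped like the question's: (n, 2), dtype=object.
n = 3
obj = np.empty((n, 2), dtype=object)
for i in range(n):
    obj[i, 0] = np.zeros((4, 5), dtype=np.uint8)  # column 0: a 2d image
    obj[i, 1] = [1, 0, 0]                          # column 1: a label list

# With the default fmt ('%.18e'), savetxt tries to format each cell as a float.
# A whole array or list cannot be converted to one float, so this raises TypeError.
error = None
try:
    np.savetxt(io.StringIO(), obj, delimiter=',')
except TypeError as exc:
    error = exc
print(type(error).__name__, error)
```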

The pandas df shows two columns, one with the arrays, the other with the lists. But after the round trip, column 0 appears to contain string representations of the 2d arrays, as indicated by the newline characters. Did you look at the h.csv file? Part of the reason for using csv is that people can read it, and so can other programs (like Excel).
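The missing commas, the embedded newlines and the '...' in the question's h.csv all follow from how numpy turns an array into a string. A small sketch with made-up arrays; that to_csv writes the str() of each object cell is inferred from the output above, not taken from the pandas docs:

```python
import numpy as np

# Object cells are written as str(cell); for an ndarray that is numpy's print format.
small = np.array([[30, 29, 198], [29, 29, 197]], dtype=np.uint8)
s = str(small)
print(repr(s))  # space-separated values, with a newline between rows

# A Python list's str() does contain commas, which is why the label column looked fine.
print(str([1, 0, 0]))

# Arrays with more elements than the print threshold (1000 by default) are
# summarized with '...', so most of a large image never reaches the file.
big = np.zeros((100, 100), dtype=np.uint8)
print(np.get_printoptions()['threshold'], '...' in str(big))
```

The last point matters most: with the default print options, a large image is written in summarized form, so the middle of each image is lost, not just reformatted.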


Make an array like your big one:

In [349]: barr = np.empty((3,2), object)
In [350]: barr[:,0]=[arr,arr,arr]
In [351]: barr[:,1]=[[0,0,0] for _ in range(3)]
In [352]: barr
Out[352]: 
array([[array([[ 30,  29, 198, 149, 149, 149],
       [ 29,  29, 197, 149, 149, 149],
       [ 29,  29, 197, 149, 149, 149],
       [ 63,  63,  96, 105, 104, 104],
       [ 63,  63,  96, 106, 105, 105],
       [ 77,  77, 217, 217, 217, 217]], dtype=uint8),
        list([0, 0, 0])],
       [array([[ 30,  29, 198, 149, 149, 149],
   ...
       [ 77,  77, 217, 217, 217, 217]], dtype=uint8),
        list([0, 0, 0])]], dtype=object)

Write it with the %s format, the only one that works with objects like this:

In [354]: np.savetxt('barr.txt',barr, delimiter=',',fmt='%s')
In [355]: cat barr.txt
[[ 30  29 198 149 149 149]
 [ 29  29 197 149 149 149]
 [ 29  29 197 149 149 149]
 [ 63  63  96 105 104 104]
 [ 63  63  96 106 105 105]
 [ 77  77 217 217 217 217]],[0, 0, 0]
[[ 30  29 198 149 149 149]
 [ 29  29 197 149 149 149]
 [ 29  29 197 149 149 149]
 [ 63  63  96 105 104 104]
 [ 63  63  96 106 105 105]
 [ 77  77 217 217 217 217]],[0, 0, 0]
[[ 30  29 198 149 149 149]
 [ 29  29 197 149 149 149]
 [ 29  29 197 149 149 149]
 [ 63  63  96 105 104 104]
 [ 63  63  96 106 105 105]
 [ 77  77 217 217 217 217]],[0, 0, 0]

That is not a valid csv file. It is text, but with [] brackets and varying line lengths, so none of the standard csv readers can handle it.
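If a plain csv is really required, one possible layout (an assumption, not something the answer proposes) is one row per sample: the flattened image pixels followed by the label values. This only works if every image has the same shape, which is assumed here; the shape below is hypothetical and much smaller than the real images:

```python
import io

import numpy as np

# Assumptions: all images share one shape and all label lists one length.
img_shape = (4, 6)  # hypothetical; the real images are much larger
n_pix = img_shape[0] * img_shape[1]
images = [(np.arange(n_pix) + 10 * i).astype(np.uint8).reshape(img_shape) for i in range(3)]
labels = [[1, 0, 0, 0, 0, 0, 0, 0, 0] for _ in range(3)]  # 9 labels, as in the question

# One csv row per sample: flattened pixels followed by the label values.
rows = np.array([np.concatenate([im.ravel(), lab]) for im, lab in zip(images, labels)])

buf = io.StringIO()  # stands in for a real .csv file
np.savetxt(buf, rows, delimiter=',', fmt='%d')

# Read back (ndmin=2 keeps a 2d result even for a single row), then split each row.
buf.seek(0)
loaded = np.loadtxt(buf, delimiter=',', dtype=np.int64, ndmin=2)
images_back = [r[:n_pix].reshape(img_shape).astype(np.uint8) for r in loaded]
labels_back = [r[n_pix:].tolist() for r in loaded]
print(loaded.shape)
```

The image shape is not stored in the file, so whoever reads it has to know the shape in advance.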


Saving that array as you did with pandas, I get:

In [364]: cat pdbarr.txt
0,1
"[[ 30  29 198 149 149 149]
 [ 29  29 197 149 149 149]
 [ 29  29 197 149 149 149]
 [ 63  63  96 105 104 104]
 [ 63  63  96 106 105 105]
 [ 77  77 217 217 217 217]]","[0, 0, 0]"
"[[ 30  29 198 149 149 149]
 [ 29  29 197 149 149 149]
 [ 29  29 197 149 149 149]
 [ 63  63  96 105 104 104]
 [ 63  63  96 106 105 105]
 [ 77  77 217 217 217 217]]","[0, 0, 0]"
"[[ 30  29 198 149 149 149]
 [ 29  29 197 149 149 149]
 [ 29  29 197 149 149 149]
 [ 63  63  96 105 104 104]
 [ 63  63  96 106 105 105]
 [ 77  77 217 217 217 217]]","[0, 0, 0]"

Notice all the quotes: it writes those component arrays and lists as strings. Again, not a usable csv, since each cell comes back as one long string.
