I have a CSV file with one column of image data. Before saving to CSV, each image was a 3D numpy array, so each cell of this column held a 3D array. After saving to CSV and reading the file back with pandas, the cells were converted to strings. Now I want to recreate the arrays from them. Below is a sample of the kind of string that I want to convert to a 3D numpy array.
import numpy as np
my_string_array = str(np.random.randint(0, high=255, size=(51, 52, 3)))
I tried the approaches described in "how to read numpy 2D array from string?", but it seems that I need something different, since I have a 3D array.
I know that if the arrays had been converted to lists before saving to CSV, then
import ast
my_array = np.array(ast.literal_eval(my_string_array))
would work, but unfortunately this is not the case. After running this I got an error:
Traceback (most recent call last):
File "/opt/lyp-venv/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3319, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-25-3e5a6dae7682>", line 2, in <module>
my_array = np.array(ast.literal_eval(my_string_array))
File "/usr/lib/python3.7/ast.py", line 46, in literal_eval
node_or_string = parse(node_or_string, mode='eval')
File "/usr/lib/python3.7/ast.py", line 35, in parse
return compile(source, filename, mode, PyCF_ONLY_AST)
File "<unknown>", line 1
[[[205 60 145]
^
SyntaxError: invalid syntax
Regarding the error that you added:
ast.literal_eval(my_string_array)
....
[[[205 60 145]
^
SyntaxError: invalid syntax
literal_eval works on a limited subset of Python syntax. For example, it will work on a valid list input, e.g. "[[205, 60, 145]]". But the string in the error message does not match that; it's missing the commas. str(an_array) omits the commas; str(an_array.tolist()) does not.
Most of the answers that deal with loading csv files like this stress that you will need to replace the spaces (or blank delimiters) with commas.
So in this case the error has nothing to do with the array being 3d.
Let me illustrate. Make a 3d array:
In [720]: arr = np.arange(24).reshape(2,3,4)
In [722]: arr
Out[722]:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
Its str representation, which is probably what pandas writes to the csv:
In [723]: str(arr)
Out[723]: '[[[ 0  1  2  3]\n  [ 4  5  6  7]\n  [ 8  9 10 11]]\n\n [[12 13 14 15]\n  [16 17 18 19]\n  [20 21 22 23]]]'
Compare that with what a list str looks like:
In [724]: arr.tolist()
Out[724]:
[[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]],
[[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]]]
In [725]: str(arr.tolist())
Out[725]: '[[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]], [[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]]]'
literal_eval has no problem with this triple nested list string:
In [726]: ast.literal_eval(_)
Out[726]:
[[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]],
[[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]]]
literal_eval applied to the array string produces your error:
In [727]: ast.literal_eval(Out[721])
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 3319, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-727-700e3f960e29>", line 1, in <module>
ast.literal_eval(Out[721])
File "/usr/lib/python3.6/ast.py", line 48, in literal_eval
node_or_string = parse(node_or_string, mode='eval')
File "/usr/lib/python3.6/ast.py", line 35, in parse
return compile(source, filename, mode, PyCF_ONLY_AST)
File "<unknown>", line 1
[[[ 0 1 2 3]
^
SyntaxError: invalid syntax
I might be able to fix that with a couple of string substitutions, effectively converting Out[721] to Out[725].
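A minimal sketch of that substitution idea, assuming the string came from str() of an integer array and was not summarized with '...'. The helper name array_from_str and the regular expression are illustrative choices, not something taken from the linked answers; floats, nan and inf would need a different pattern:

```python
import ast
import re

import numpy as np


def array_from_str(text):
    """Rebuild an integer array from the output of str(arr) (hypothetical helper)."""
    if "..." in text:
        # numpy summarized the array; the omitted values cannot be recovered
        raise ValueError("array string was summarized with '...'")
    # Replace whitespace that follows a digit or ']' and precedes a digit,
    # a minus sign or '[' with a comma, giving a valid nested-list literal.
    with_commas = re.sub(r"(?<=[\d\]])\s+(?=[-\d\[])", ",", text)
    return np.array(ast.literal_eval(with_commas))


arr = np.arange(24).reshape(2, 3, 4)
restored = array_from_str(str(arr))
print(restored.shape)
print(np.array_equal(restored, arr))
```

The number of dimensions does not matter here; the nesting of the brackets carries the shape, and literal_eval rebuilds it once the commas are in place.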
@Mad pointed out that if the array is large enough (over 1000 elements), str will produce a condensed version, replacing a lot of the values with '...'. You can verify that yourself. If that is the case, no amount of string editing will fix the problem: the omitted values are simply not in the string, so it is useless for recovering the array.
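One way to check this, sketched under the assumption that numpy's print options are at their defaults (threshold of 1000 elements). The array shapes are arbitrary examples. The last part only helps when the strings can be regenerated from the original arrays; it does nothing for a CSV that already contains summarized strings:

```python
import sys

import numpy as np

small = np.arange(24).reshape(2, 3, 4)     # 24 elements, below the threshold
big = np.arange(1500).reshape(10, 50, 3)   # 1500 elements, above the threshold

summarized = str(big)
print("..." in str(small))   # small arrays are printed in full
print("..." in summarized)   # large arrays are summarized

# When writing, the summary can be switched off temporarily
with np.printoptions(threshold=sys.maxsize):
    full = str(big)
print("..." in full)
```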
In "how to read numpy 2D array from string?", my answer is of limited value here, since you already have a string. https://stackoverflow.com/a/44323021/901925 is better. There are also SO questions that deal specifically with the strings that appear in pandas csv files. In any case you need to pay attention to the details of the string, especially delimiters and special characters.
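As one illustration of paying attention to the delimiters: in the str output the brackets only mark nesting and whitespace separates the values. So if the original shape is known (an assumption; the CSV itself does not store it), a rough alternative is to drop the brackets, split on whitespace and reshape. The function name is hypothetical, and the sketch assumes integer values in a string that was not summarized:

```python
import numpy as np


def array_from_str_with_shape(text, shape):
    """Parse str(arr) output for an integer array whose shape is known."""
    # Brackets carry no data; after removing them only
    # whitespace-delimited values remain.
    tokens = text.replace("[", " ").replace("]", " ").split()
    # int() raises ValueError on '...', so a summarized string fails loudly
    values = [int(tok) for tok in tokens]
    return np.array(values).reshape(shape)


arr = np.arange(24).reshape(2, 3, 4)
parsed = array_from_str_with_shape(str(arr), (2, 3, 4))
print(np.array_equal(parsed, arr))
```

For image data you may also want to cast the result back to the original dtype (for example uint8), since that information is not part of the string either.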