I am currently writing an array of dictionaries like the one below to a CSV file:
tmp_res = [{"val1": 1.0, "val2": 2, "ar_1": [[65.41156005859375, 53.709598541259766], [251.97698974609375, 153.14926147460938]] },....]
ar_1 represents an *ndarray* of shape [-1, 2], where -1 is of arbitrary length and not constant across the dicts. After reading the file back I get the single values of val1 and val2 as expected; however, the array is not easily readable:
"[[65.41156005859375, 53.709598541259766], [251.97698974609375, 153.14926147460938]]"
I know I could work through that string and separate it by certain characters. However, it feels like there should be a better and more elegant way to solve this problem.
What is the best way to save such data to a file and restore it?
EDIT: To clarify my saving and reading of the file: I am saving my file via a csv.DictWriter in the following way:
# Exemplary data:
results = [{'mean_iou': 0.3319194248978337, 'num_boxes': 1, 'centroids': [[101.21826171875, 72.79462432861328]]}, {'mean_iou': 0.4617333142965009, 'num_boxes': 2, 'centroids': [[65.41156005859375, 53.709598541259766], [251.97698974609375, 153.14926147460938]]}, {'mean_iou': 0.537150158582514, 'num_boxes': 3, 'centroids': [[50.82071304321289, 42.616580963134766], [304.91583251953125, 176.09994506835938], [140.43699645996094, 104.00206756591797]]}]
# The given results data is basically tmp_res after the for loop.
tmp_res = []
for i in range(0, len(results)):
    res_dict = {}
    res_dict["centroids"] = results[i]["centroids"]
    res_dict["mean_iou"] = results[i]["mean_iou"]
    res_dict["num_boxes"] = results[i]["num_boxes"]
    tmp_res.append(res_dict)
# Writing to file
keys = tmp_res[0].keys()
with open('anchor.csv', 'w+') as output_file:
    dict_writer = csv.DictWriter(output_file, keys)
    dict_writer.writeheader()
    dict_writer.writerows(tmp_res)
# Reading from file
num_centroids = []
mean_ious = []
centroids = []
with open('anchor.csv') as csvfile:
    reader = csv.DictReader(csvfile,
                            fieldnames=["mean_iou",
                                        "num_boxes",
                                        "centroids"])
    # Skipping the header line
    next(reader, None)
    for row in reader:
        centroids.append(row["centroids"])
        num_centroids.append(row["num_boxes"])
        mean_ious.append(row["mean_iou"])
An excerpt from the file looks as follows:
mean_iou,num_boxes,centroids
0.3319194248978337,1,"[[101.21826171875, 72.79462432861328]]"
0.4617333142965009,2,"[[65.41156005859375, 53.709598541259766], [251.97698974609375, 153.14926147460938]]"
0.537150158582514,3,"[[50.82071304321289, 42.616580963134766], [304.91583251953125, 176.09994506835938], [140.43699645996094, 104.00206756591797]]"
0.5602804262309611,4,"[[49.9361572265625, 41.09553146362305], [306.10711669921875, 177.09762573242188], [88.86656188964844, 167.8087921142578], [151.82627868652344, 81.80717468261719]]"
I suspect that the csv.DictWriter doesn't know how to handle an array of multiple values: since it contains commas, it would break the comma-separated format, so the writer wraps the data into a quoted string to avoid the conflict in structure.
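This behaviour can be verified with a small sketch (writing to an in-memory io.StringIO instead of a real file, with made-up sample values): any field whose string representation contains a comma is quoted by the csv module, and it always comes back as a plain string.

```python
import csv
import io

# Write one row whose "centroids" field contains commas.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["mean_iou", "num_boxes", "centroids"])
writer.writeheader()
writer.writerow({"mean_iou": 0.33, "num_boxes": 1,
                 "centroids": [[101.2, 72.8]]})

print(buf.getvalue())
# mean_iou,num_boxes,centroids
# 0.33,1,"[[101.2, 72.8]]"

# Reading it back yields one string, not a nested list.
buf.seek(0)
row = next(csv.DictReader(buf))
print(type(row["centroids"]))  # <class 'str'>
```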
While reading through Serge's answer and your comments, I think that using a JSON structure instead of CSV is better suited to what I am looking for; it supports the structures I need quite easily. However, I thought the csv.DictWriter would be able to handle some sort of unwrapping of its own "to-string-wrapped" data. Also, sorry for the delay.
Solution: Serge's solution applied in the code:
# Added JSON
import csv
import json

# Reading from file
num_centroids = []
mean_ious = []
centroids = []
with open('anchor.csv') as csvfile:
    reader = csv.DictReader(csvfile,
                            fieldnames=["mean_iou",
                                        "num_boxes",
                                        "centroids"])
    # Skipping the header line
    next(reader, None)
    for row in reader:
        # json.loads turns the quoted string back into a nested list
        centroids.append(json.loads(row["centroids"]))
        num_centroids.append(row["num_boxes"])
        mean_ious.append(row["mean_iou"])
Your file is not in CSV format; it is just a Python dictionary. Just read the file into a string and use eval (dangerous but easy), or write a custom parser: say, break the string into an array, remove commas and brackets, apply np.fromstring, then reshape.
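A minimal sketch of that route, substituting the standard library's ast.literal_eval for eval since it only accepts Python literals and cannot execute arbitrary code hidden in the file:

```python
import ast

import numpy as np

# One cell from the CSV, exactly as it comes back from the reader.
cell = ("[[65.41156005859375, 53.709598541259766], "
        "[251.97698974609375, 153.14926147460938]]")

# ast.literal_eval parses the literal safely; np.array builds the ndarray.
arr = np.array(ast.literal_eval(cell))
print(arr.shape)  # (2, 2)
```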
Curiously, "[[65.41156005859375, 53.709598541259766], ..." seems like valid JSON, so np.array( json.loads ( "[[65.41156005859375, 53.709598541259766], [251.97698974609375, 153.14926147460938]]" )) should result in an ndarray. Mind that tmp_res = is not valid JSON, so json.load will fail on such a file.
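A full round trip through a pure-JSON file could then look like this (a sketch; "anchor.json" and the sample values are made up):

```python
import json

import numpy as np

results = [
    {"mean_iou": 0.33, "num_boxes": 1,
     "centroids": [[101.218, 72.794]]},
    {"mean_iou": 0.46, "num_boxes": 2,
     "centroids": [[65.411, 53.709], [251.976, 153.149]]},
]

# Save: one JSON document holding the whole list of dicts.
with open("anchor.json", "w") as f:
    json.dump(results, f)

# Restore: the nested lists come back intact, no string parsing needed.
with open("anchor.json") as f:
    restored = json.load(f)

centroids = [np.array(r["centroids"]) for r in restored]
print(centroids[1].shape)  # (2, 2)
```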
PS. CSV is intended for tabular data only, not multidimensional data. If you must, you can do a double CSV pass with the standard csv module and split:
s = "[[76 ... "
lines = s.split(']], [[')
reader = csv.reader(lines, delimiter=',')
(note that the csv delimiter must be a single character), or tinker with pandas read_csv, which also supports a custom line terminator in its C engine.
I guess a better solution is storing the data in valid JSON (without any assignments). Or you can try the designated numpy.save / numpy.load to store binary data for greater scalability.
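Since the centroid arrays have different lengths, they cannot be stacked into one regular array, but numpy.savez can store several named arrays in a single .npz file (a sketch; the file name and key names are made up):

```python
import numpy as np

centroids = [
    np.array([[101.218, 72.794]]),
    np.array([[65.411, 53.709], [251.976, 153.149]]),
]

# Store each ragged array under its own key in one binary file.
np.savez("centroids.npz", **{f"run_{i}": a for i, a in enumerate(centroids)})

# Restore: dtype and shape survive the round trip exactly.
with np.load("centroids.npz") as data:
    restored = [data[f"run_{i}"] for i in range(len(data.files))]

print(restored[1].shape)  # (2, 2)
```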
For other viable alternatives read
How can I serialize a numpy array while preserving matrix dimensions?