
Read/write CSV array of dicts containing a list of arbitrary length

I am currently writing an array of dictionaries like the one below to a CSV file:

tmp_res = [{"val1": 1.0, "val2": 2, "ar_1": [[65.41156005859375, 53.709598541259766], [251.97698974609375, 153.14926147460938]] },....]

ar_1 represents an *ndarray* of arbitrary shape [-1, 2], and -1 is not constant across the dicts.

After reading the file back, I get the single values val1 and val2 as expected; however, the array comes back as a string that is not easily usable:

"[[65.41156005859375, 53.709598541259766], [251.97698974609375, 153.14926147460938]]"

I know I could work through that string and separate it by certain characters. However, it feels like there should be a better and more elegant way to solve this problem.
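For reference, such a stringified list can be parsed back without manual splitting; a minimal sketch using the standard library's ast.literal_eval (the variable names are illustrative):

```python
import ast

s = "[[65.41156005859375, 53.709598541259766], [251.97698974609375, 153.14926147460938]]"
centroids = ast.literal_eval(s)  # safely evaluates the list literal, unlike eval()

print(len(centroids), len(centroids[0]))  # 2 2
```

The result is a plain nested list, which can then be passed to np.array() if an ndarray is needed.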

What is the best way to save such data to a file and restore it?

EDIT: To clarify how I save and read the file: I am writing it via a csv.DictWriter in the following way:


# Exemplary Data:
results = [{'mean_iou': 0.3319194248978337, 'num_boxes': 1, 'centroids': [[101.21826171875, 72.79462432861328]]}, {'mean_iou': 0.4617333142965009, 'num_boxes': 2, 'centroids': [[65.41156005859375, 53.709598541259766], [251.97698974609375, 153.14926147460938]]}, {'mean_iou': 0.537150158582514, 'num_boxes': 3, 'centroids': [[50.82071304321289, 42.616580963134766], [304.91583251953125, 176.09994506835938], [140.43699645996094, 104.00206756591797]]}]

# The given results data is basically tmp_res after the for loop.
tmp_res = []
for i in range(len(results)):
    res_dict = {}
    res_dict["centroids"] = results[i]["centroids"]
    res_dict["mean_iou"] = results[i]["mean_iou"]
    res_dict["num_boxes"] = results[i]["num_boxes"]
    tmp_res.append(res_dict)

# Writing to File
keys = tmp_res[0].keys()
with open('anchor.csv','w+') as output_file:
    dict_writer = csv.DictWriter(output_file, keys)
    dict_writer.writeheader()
    dict_writer.writerows(tmp_res)

# Reading from File
num_centroids = []
mean_ious = []
centroids = []
with open('anchor.csv') as csvfile:
    reader = csv.DictReader(csvfile,
                            fieldnames=["mean_iou",
                                        "num_boxes",
                                        "centroids"])
    # Skip the header line
    next(reader, None)
    for row in reader:
        centroids.append(row["centroids"])
        num_centroids.append(row["num_boxes"])
        mean_ious.append(row["mean_iou"])

An excerpt from the file looks as follows:

mean_iou,num_boxes,centroids
0.3319194248978337,1,"[[101.21826171875, 72.79462432861328]]"
0.4617333142965009,2,"[[65.41156005859375, 53.709598541259766], [251.97698974609375, 153.14926147460938]]"
0.537150158582514,3,"[[50.82071304321289, 42.616580963134766], [304.91583251953125, 176.09994506835938], [140.43699645996094, 104.00206756591797]]"
0.5602804262309611,4,"[[49.9361572265625, 41.09553146362305], [306.10711669921875, 177.09762573242188], [88.86656188964844, 167.8087921142578], [151.82627868652344, 81.80717468261719]]"

I suspect that csv.DictWriter doesn't know how to handle an array of multiple values: since the array contains commas, writing it as-is would break the comma-separated format, so the writer wraps the data in a quoted string to avoid the structural conflict.
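That suspicion can be checked with a small sketch: the csv module stringifies non-string fields via str() and quotes any field whose text contains the delimiter (the values below are made up for brevity):

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
# The nested list is converted with str(); because that string contains
# commas, the csv module wraps the whole field in double quotes.
writer.writerow([0.46, 2, [[65.4, 53.7], [251.9, 153.1]]])

print(buf.getvalue().strip())  # 0.46,2,"[[65.4, 53.7], [251.9, 153.1]]"
```

So the quoting keeps the row parseable, but reading it back yields the string form of the list, not the list itself.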


While reading through Serge's answer and your comments, I think that using a JSON structure instead of CSV is more suitable for what I am looking for. It supports the structures I need quite easily.
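For example, the whole results list can be dumped as one JSON document and restored in a single call, nested arrays included (a sketch; the filename anchor.json is made up):

```python
import json

results = [{"mean_iou": 0.3319194248978337, "num_boxes": 1,
            "centroids": [[101.21826171875, 72.79462432861328]]}]

# Write the entire structure in one call
with open("anchor.json", "w") as f:
    json.dump(results, f)

# Read it back; nested lists survive the round trip intact
with open("anchor.json") as f:
    restored = json.load(f)

print(restored[0]["num_boxes"])  # 1
```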

However, I thought csv.DictWriter would be able to do some sort of unwrapping of its own to-string-wrapped data.

Also sorry for the delay.


Solution: Serge's suggestion applied in the code:

# Added json
import json

# Reading from File
num_centroids = []
mean_ious = []
centroids = []
with open('anchor.csv') as csvfile:
    reader = csv.DictReader(csvfile,
                            fieldnames=["mean_iou",
                                        "num_boxes",
                                        "centroids"])
    # Skip the header line
    next(reader, None)
    for row in reader:
        centroids.append(json.loads(row["centroids"]))
        num_centroids.append(row["num_boxes"])
        mean_ious.append(row["mean_iou"])

Your file is not really in CSV format; it is essentially a Python dictionary dump. You could read the file into a string and use eval (dangerous but easy), or write a custom parser: break the string into an array, remove the commas and brackets, apply np.fromstring, then reshape.

Curiously, "[[65.41156005859375, 53.709598541259766], ..." looks like valid JSON, so np.array(json.loads("[[65.41156005859375, 53.709598541259766], [251.97698974609375, 153.14926147460938]]")) should yield an ndarray. Mind that a line like tmp_res = ... is not valid JSON, so json.load('myfile') will fail on it.

PS. CSV is intended for tabular data only, not multidimensional data. If you must, you can do a double CSV pass with the standard csv module and split:

s = "[[76 ... "                # the raw string read from the file
lines = s.split(']], [[')      # split on the inner-array boundary

reader = csv.reader(lines, delimiter=',')  # delimiter must be a single character

Or use pandas' CSV reader; you can try defining ]], [[ as the line separator in C mode.

I guess a better solution is storing the data as valid JSON (without any assignments). Or you can use the designated numpy.save / numpy.load to store binary data, for greater scalability.
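A sketch of the numpy.save / numpy.load route for a single centroids array (the filename is made up):

```python
import numpy as np

arr = np.array([[65.41156005859375, 53.709598541259766],
                [251.97698974609375, 153.14926147460938]])

np.save("centroids.npy", arr)       # binary .npy file, preserves dtype and shape
restored = np.load("centroids.npy")

print(restored.shape)  # (2, 2)
```

Unlike the CSV detour, no string parsing is needed on the way back, though one file per array (or np.savez for several) is the natural granularity here.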

For other viable alternatives, read:

How can I serialize a numpy array while preserving matrix dimensions?

