
Read/write CSV Array of Dicts containing a list of arbitrary length

I am currently writing an array of dictionaries like the one below to a CSV file:

tmp_res = [{"val1": 1.0, "val2": 2, "ar_1": [[65.41156005859375, 53.709598541259766], [251.97698974609375, 153.14926147460938]] },....]

ar_1 represents an *ndarray* of shape [-1, 2], where -1 is not constant across the dicts.

After reading the file back I get the single values of val1 and val2 as expected; however, the array is not easily usable:

"[[65.41156005859375, 53.709598541259766], [251.97698974609375, 153.14926147460938]]"

I know I could work through that string and separate it by some characters. However, it feels like there should be a better, more elegant way to solve this problem.
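For example, if you do end up with the stringified list, `ast.literal_eval` from the standard library parses Python literals without the risks of a raw `eval`; a minimal sketch:

```python
import ast

# Hypothetical field value as it comes back from csv.DictReader
cell = "[[65.41156005859375, 53.709598541259766], [251.97698974609375, 153.14926147460938]]"

# ast.literal_eval only evaluates literals (lists, numbers, strings, ...),
# so it cannot execute arbitrary code the way eval can
centroids = ast.literal_eval(cell)
print(centroids[1][0])  # 251.97698974609375
```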

What is the best way to save such data to a file and restore it?

EDIT: To clarify how I save and read the file. I am saving my file via a csv.DictWriter in the following way:


# Exemplary Data:
results = [
    {'mean_iou': 0.3319194248978337, 'num_boxes': 1,
     'centroids': [[101.21826171875, 72.79462432861328]]},
    {'mean_iou': 0.4617333142965009, 'num_boxes': 2,
     'centroids': [[65.41156005859375, 53.709598541259766],
                   [251.97698974609375, 153.14926147460938]]},
    {'mean_iou': 0.537150158582514, 'num_boxes': 3,
     'centroids': [[50.82071304321289, 42.616580963134766],
                   [304.91583251953125, 176.09994506835938],
                   [140.43699645996094, 104.00206756591797]]},
]

# The given results data is basically tmp_res after the for loop.
tmp_res = []
for i in range(len(results)):
    res_dict = {}
    res_dict["centroids"] = results[i]["centroids"]
    res_dict["mean_iou"] = results[i]["mean_iou"]
    res_dict["num_boxes"] = results[i]["num_boxes"]
    tmp_res.append(res_dict)

# Writing to File
import csv

keys = tmp_res[0].keys()
with open('anchor.csv', 'w+', newline='') as output_file:
    dict_writer = csv.DictWriter(output_file, keys)
    dict_writer.writeheader()
    dict_writer.writerows(tmp_res)

# Reading from File
num_centroids = []
mean_ious = []
centroids = []
with open('anchor.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile,
                            fieldnames=["mean_iou",
                                        "num_boxes",
                                        "centroids"])
    # Skipping line of the header
    next(reader, None)
    for row in reader:
        centroids.append(row["centroids"])
        num_centroids.append(row["num_boxes"])
        mean_ious.append(row["mean_iou"])

An excerpt from the file looks as follows:

mean_iou,num_boxes,centroids
0.3319194248978337,1,"[[101.21826171875, 72.79462432861328]]"
0.4617333142965009,2,"[[65.41156005859375, 53.709598541259766], [251.97698974609375, 153.14926147460938]]"
0.537150158582514,3,"[[50.82071304321289, 42.616580963134766], [304.91583251953125, 176.09994506835938], [140.43699645996094, 104.00206756591797]]"
0.5602804262309611,4,"[[49.9361572265625, 41.09553146362305], [306.10711669921875, 177.09762573242188], [88.86656188964844, 167.8087921142578], [151.82627868652344, 81.80717468261719]]"

I suspect that csv.DictWriter doesn't know how to handle an array of multiple values: since the array contains commas, writing it raw would break the comma-separated format. Therefore it wraps the data in a quoted string to avoid the structural conflict.
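That quoting behavior can be reproduced with the csv module alone; a minimal in-memory sketch with hypothetical values:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
# A field that itself contains commas gets wrapped in quotes by the
# default dialect, so the row structure stays intact
writer.writerow([0.46, 2, "[[65.4, 53.7], [251.9, 153.1]]"])
print(buf.getvalue())
# 0.46,2,"[[65.4, 53.7], [251.9, 153.1]]"
```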


While reading through Serge's answer and your comments, I think that using a JSON structure instead of CSV is more suitable for what I am looking for. It supports the structures I need quite easily.

However, I thought csv.DictWriter would be able to handle some sort of unwrapping of its own "to-string-wrapped" data.

Also, sorry for the delay.


Solution: Serge's solution applied in the code:

# Added json
import csv
import json

# Reading from File
num_centroids = []
mean_ious = []
centroids = []
with open('anchor.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile, fieldnames=["mean_iou",
                                                 "num_boxes",
                                                 "centroids"])
    # Skipping line of the header
    next(reader, None)
    for row in reader:
        centroids.append(json.loads(row["centroids"]))
        num_centroids.append(row["num_boxes"])
        mean_ious.append(row["mean_iou"])
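For symmetry, the write side can serialize the list explicitly with json.dumps, so the read side can always rely on json.loads; an in-memory sketch with hypothetical values:

```python
import csv
import io
import json

rows = [{"mean_iou": 0.33, "num_boxes": 1,
         "centroids": [[101.2, 72.8]]}]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["mean_iou", "num_boxes", "centroids"])
writer.writeheader()
for row in rows:
    # Serialize the nested list explicitly on write
    writer.writerow({**row, "centroids": json.dumps(row["centroids"])})

buf.seek(0)
back = [json.loads(r["centroids"]) for r in csv.DictReader(buf)]
print(back)  # [[[101.2, 72.8]]]
```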

Your file is not in csv format; it is just a python dictionary. Just read the file into a string and use an eval statement (dangerous but easy), or write a custom parser: say, break the string into an array, remove the commas and brackets, apply np.fromstring, then reshape.

Curiously, "[[65.41156005859375, 53.709598541259766], ..." seems like valid JSON, so np.array( json.loads ( "[[65.41156005859375, 53.709598541259766], [251.97698974609375, 153.14926147460938]]" )) should result in an ndarray. Mind that tmp_res = is not valid JSON, so json.load('myfile') will fail.
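That suggestion can be checked directly (numpy assumed available):

```python
import json
import numpy as np

cell = "[[65.41156005859375, 53.709598541259766], [251.97698974609375, 153.14926147460938]]"
# A JSON array of arrays parses into nested lists, which numpy
# stacks into a 2-D ndarray
arr = np.array(json.loads(cell))
print(arr.shape)  # (2, 2)
```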

PS. CSV is intended for tabular data only, not multidimensional data. If you must, you can do a double csv pass with the standard csv module and split:

s = "[[76 ... "
lines = s.split(']], [[')
# csv delimiters must be a single character
reader = csv.reader(lines, delimiter=',')

or use pandas read_csv, which lets you define a custom line terminator in C mode.

I guess a better solution is storing the data as valid JSON (without any assignments). Or you can try the designated numpy.save / numpy.load to store binary data, for greater scalability.
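As a sketch of the JSON route (in memory here; json.dump and json.load work the same way with a file object, and the values are hypothetical):

```python
import json

results = [{"mean_iou": 0.33, "num_boxes": 1,
            "centroids": [[101.2, 72.8]]},
           {"mean_iou": 0.46, "num_boxes": 2,
            "centroids": [[65.4, 53.7], [251.9, 153.1]]}]

# Round-trip the whole structure through a JSON string; nested lists
# of arbitrary length survive unchanged
text = json.dumps(results)
restored = json.loads(text)
print(restored[1]["centroids"][0])  # [65.4, 53.7]
```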

For other viable alternatives, read:

How can I serialize a numpy array while preserving matrix dimensions?

