
Convert a dictionary of lists to two-column csv

I have a dictionary of lists as follows:

{'banana': [1,2],
 'monkey': [5],
 'cow': [1,5,0],
 ...}

I want to write a csv in which each row contains one number and its word, as follows:

1 | banana
2 | banana
5 | monkey
1 | cow
5 | cow
0 | cow
...

with | as the delimiter.

I tried to convert it to a list of tuples, and write it as follows:

for k, v in dic.items():
    for ID in v: 
        rv.append((ID, k))

with open(index_filename,'wb') as out:
    csv_out=csv.writer(out, delimiter='|')
    csv_out.writerow(['identifier','descriptor'])
    for row in rv:
        csv_out.writerow(row)

but ran into this error:

a bytes-like object is required, not 'str'

Is there a more efficient way of doing this than converting to a list of tuples, and if not, what's wrong with my code?

Thanks.

You are opening the file in binary/bytes mode, which is specified by the "b" in "wb". This is something many people did in the Python 2 days, when "str" and "bytes" were the same thing, so many older books still teach it this way.

If you open a file in bytes mode, you must write bytes to it, not strings. A str can be converted to bytes with the str.encode() method:

f.write(some_str_variable.encode())
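To make the difference concrete, here is a minimal sketch that uses an in-memory io.BytesIO buffer as a stand-in for a file opened with 'wb' (the buffer and the sample row are assumptions for illustration, not part of the original code). Writing a str fails with a TypeError, while writing the encoded bytes succeeds:

```python
import io

# In-memory stand-in for a file opened in binary mode ('wb').
buf = io.BytesIO()

# Writing a str to a binary stream raises TypeError.
try:
    buf.write("1|banana\n")
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# Encoding the str to bytes first makes the write succeed.
buf.write("1|banana\n".encode("utf-8"))
written = buf.getvalue()
print(written)
```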

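However, what you probably want instead is to not open the file in bytes mode. Open it in text mode; for files handed to the csv module, the Python documentation also recommends passing newline='' so that no extra \r is added to the line endings on platforms such as Windows.

with open(index_filename, 'w', newline='') as out:
    ...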
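Putting the pieces together, a minimal sketch of the text-mode version could look like this. The sample dictionary is taken from the question; the temporary file path is a hypothetical stand-in for index_filename, and newline='' follows the recommendation in the csv module documentation:

```python
import csv
import os
import tempfile

dic = {'banana': [1, 2], 'monkey': [5], 'cow': [1, 5, 0]}

# Hypothetical output path, used only so the sketch is self-contained.
index_filename = os.path.join(tempfile.mkdtemp(), 'index.csv')

# Text mode ('w'), not binary mode ('wb'); newline='' lets csv control line endings.
with open(index_filename, 'w', newline='') as out:
    csv_out = csv.writer(out, delimiter='|')
    csv_out.writerow(['identifier', 'descriptor'])
    for k, v in dic.items():
        for ID in v:
            csv_out.writerow((ID, k))  # number first, then word

# Read the file back to check what was written.
with open(index_filename, newline='') as f:
    lines = f.read().splitlines()
print(lines)
```

Note that the delimiter is written without surrounding spaces, so a row comes out as 1|banana rather than 1 | banana.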

If you want to make your code more efficient, it is important to state in what respect you want it to be more efficient. Leaving aside terrible solutions, there is often a trade-off between space (memory) and time (cycles, function calls) among the reasonable ones.

Aside from efficiency, you should also take readability and maintainability into account before doing any kind of optimization.

Tuples, like dicts, are very efficient in Python, because they are used internally all over the place. Most function calls in Python involve tuple creation (for positional arguments) under the hood.

As to your concrete example, you can use a generator expression to avoid the temporary list:

entries = ((ID, k) for k, v in dic.items() for ID in v)

You still have the intermediate tuples, but they are computed on the fly while you iterate over the dictionary items. This solution is more memory efficient than an explicit list, especially if you have lots of entries.
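As a small illustration of "computed on the fly", the sketch below builds the generator in number-then-word order (the order the question asks for) and pulls tuples from it one at a time. The sample data comes from the question; the variable names first, rest and leftover are made up for this example. One consequence worth knowing is that a generator can be consumed only once:

```python
import types

dic = {'banana': [1, 2], 'monkey': [5], 'cow': [1, 5, 0]}

# No tuples exist yet; the expression only describes how to produce them.
entries = ((ID, k) for k, v in dic.items() for ID in v)
is_lazy = isinstance(entries, types.GeneratorType)

first = next(entries)     # computes exactly one tuple
rest = list(entries)      # computes the remaining tuples
leftover = list(entries)  # the generator is now exhausted

print(first)
print(rest)
print(leftover)
```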

You could also just put the nested loop directly into the with body:

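# text mode ('w'), since csv.writer writes str in Python 3
with open(index_filename, 'w', newline='') as out:
    csv_out = csv.writer(out, delimiter='|')
    csv_out.writerow(['identifier', 'descriptor'])
    for k, v in dic.items():
        for ID in v:
            csv_out.writerow((ID, k))  # number first, then word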

To avoid the repeated function calls to writerow, you could also resort to writerows, which might be faster.

# text mode ('w'), since csv.writer writes str in Python 3
with open(index_filename, 'w', newline='') as out:
    csv_out = csv.writer(out, delimiter='|')
    csv_out.writerow(['identifier', 'descriptor'])
    csv_out.writerows((ID, k) for k, v in dic.items() for ID in v)

If you are really interested in which method is the fastest, you can use Python's timeit module to make measurements.
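A rough sketch of such a measurement is shown below. It writes to an in-memory io.StringIO buffer instead of a real file so that disk speed does not dominate, and it uses synthetic data; the dictionary size, the function names and the repeat count are all arbitrary assumptions, so the numbers only hint at the relative cost on your machine:

```python
import csv
import io
import timeit

# Synthetic data: 1000 words with 10 numbers each (arbitrary sizes).
dic = {'word%d' % i: list(range(10)) for i in range(1000)}

def with_writerow():
    """Nested loop with one writerow call per row."""
    out = io.StringIO()
    csv_out = csv.writer(out, delimiter='|')
    for k, v in dic.items():
        for ID in v:
            csv_out.writerow((ID, k))
    return out.getvalue()

def with_writerows():
    """Single writerows call fed by a generator expression."""
    out = io.StringIO()
    csv_out = csv.writer(out, delimiter='|')
    csv_out.writerows((ID, k) for k, v in dic.items() for ID in v)
    return out.getvalue()

# Time 20 runs of each variant; absolute values depend on the machine.
t_row = timeit.timeit(with_writerow, number=20)
t_rows = timeit.timeit(with_writerows, number=20)
print('writerow loop: %.4f s' % t_row)
print('writerows:     %.4f s' % t_rows)
```

Both functions produce identical output, so any difference in timing comes from how the rows are handed to the writer.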
