
Convert a dictionary of lists to two-column csv

I have a dictionary of lists as follows:

{'banana': [1,2],
 'monkey': [5],
 'cow': [1,5,0],
 ...}

I want to write a CSV file that contains one number and one word per row, as follows:

1 | banana
2 | banana
5 | monkey
1 | cow
5 | cow
0 | cow
...

with | as the delimiter.

I tried to convert it to a list of tuples, and write it as follows:

import csv

rv = []
for k, v in dic.items():
    for ID in v:
        rv.append((ID, k))

with open(index_filename, 'wb') as out:
    csv_out = csv.writer(out, delimiter='|')
    csv_out.writerow(['identifier', 'descriptor'])
    for row in rv:
        csv_out.writerow(row)

but got this error:

a bytes-like object is required, not 'str'

Is there a more efficient way of doing this than converting to a list of tuples, and if not, what's wrong with my code?

Thanks.

You are opening the file in binary/bytes mode, which is specified by the "b" in "wb". Many people did this in the Python 2 days, when "str" and "bytes" were the same thing, so many older books still teach it this way.

If you open a file in bytes mode, you must write bytes to it, not strings. A str can be converted to bytes with the str.encode() method:

f.write(some_str_variable.encode())
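To see the error and the fix side by side, here is a minimal sketch that uses io.BytesIO as an in-memory stand-in for a file opened with 'wb' (an assumption made only so the example runs without touching disk). csv.writer always produces str, so writing through it into a bytes stream reproduces the error, while an explicitly encoded string is accepted:

```python
import csv
import io

# io.BytesIO stands in for a file opened with mode 'wb'
binary_out = io.BytesIO()
csv_out = csv.writer(binary_out, delimiter='|')

error_message = None
try:
    csv_out.writerow(['identifier', 'descriptor'])  # csv.writer always writes str
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # a bytes-like object is required, not 'str'

# The same binary stream accepts the row once it is encoded to bytes (UTF-8 by default)
binary_out.write('identifier|descriptor\r\n'.encode())
print(binary_out.getvalue())  # b'identifier|descriptor\r\n'
```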

However, what you probably want instead is to not open the file in bytes mode.

with open(index_filename, 'w', newline='') as out:
    ...
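Putting the pieces together, a minimal sketch of the corrected script, assuming the dictionary is passed in as dic like in the question. write_index is a hypothetical helper name, and newline='' follows the csv module documentation's recommendation for files handed to csv.writer (it prevents extra blank lines on Windows):

```python
import csv


def write_index(dic, index_filename):
    """Write one 'number|word' row for every number in every list of dic."""
    # Text mode ('w'): csv.writer produces str, not bytes.
    # newline='' lets the csv module control the line endings itself.
    with open(index_filename, 'w', newline='') as out:
        csv_out = csv.writer(out, delimiter='|')
        csv_out.writerow(['identifier', 'descriptor'])
        for word, ids in dic.items():
            for ID in ids:
                csv_out.writerow((ID, word))  # number first, then word
```

Usage: write_index({'banana': [1, 2], 'monkey': [5], 'cow': [1, 5, 0]}, 'index.csv') writes the header followed by the six rows shown in the question.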

If you want to make your code more efficient, it is important to state what you want it to be more efficient with respect to. Setting aside clearly bad solutions, there is often a trade-off between space (memory) and time (CPU cycles, function calls) among the reasonable ones.

Aside from efficiency, you should also take readability and maintainability into account before doing any kind of optimization.

Tuples, like dicts, are very efficient in Python because they are used internally all over the place. Most function calls in Python involve tuple creation (for positional arguments) under the hood.

As to your concrete example, you can use a generator expression to avoid the temporary list:

entries = ((ID, k) for k, ids in dic.items() for ID in ids)

You still have the intermediate tuples, but they are computed on the fly, while you iterate over the dictionary items. This solution would be more memory efficient than an explicit list, especially if you have lots of entries.
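A small sketch of that memory difference, using sys.getsizeof, which reports the size of the container object itself rather than the tuples it refers to. make_entries is a hypothetical helper that wraps the generator expression, and the "large" dictionary is made-up data:

```python
import sys


def make_entries(dic):
    # Lazy: yields (number, word) tuples one at a time while iterating
    return ((ID, word) for word, ids in dic.items() for ID in ids)


small = {'banana': [1, 2], 'monkey': [5], 'cow': [1, 5, 0]}
large = {f'word{i}': list(range(100)) for i in range(1000)}  # made-up data

# A generator object has the same size no matter how much data is behind it
gen_small, gen_large = make_entries(small), make_entries(large)
print(sys.getsizeof(gen_small) == sys.getsizeof(gen_large))  # True

# A list materialises every tuple up front, so its size grows with the data
list_small, list_large = list(make_entries(small)), list(make_entries(large))
print(sys.getsizeof(list_small) < sys.getsizeof(list_large))  # True

print(list_small)
# [(1, 'banana'), (2, 'banana'), (5, 'monkey'), (1, 'cow'), (5, 'cow'), (0, 'cow')]
```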

You could also just put the nested loop directly into the with body:

with open(index_filename, 'w', newline='') as out:
    csv_out = csv.writer(out, delimiter='|')
    csv_out.writerow(['identifier', 'descriptor'])
    for k, v in dic.items():
        for ID in v:
            csv_out.writerow((ID, k))  # number first, then word

To avoid repeated calls to writerow, you could also use writerows, which might be faster.

with open(index_filename, 'w', newline='') as out:
    csv_out = csv.writer(out, delimiter='|')
    csv_out.writerow(['identifier', 'descriptor'])
    csv_out.writerows((ID, k) for k, ids in dic.items() for ID in ids)
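To see exactly what writerows produces, here is a sketch that writes to an in-memory io.StringIO instead of a file (an assumption made only so the result can be printed). Note that csv.writer requires a single-character delimiter, so the rows come out as 1|banana rather than 1 | banana, and each row ends with the writer's default \r\n line terminator:

```python
import csv
import io

dic = {'banana': [1, 2], 'monkey': [5], 'cow': [1, 5, 0]}

out = io.StringIO()  # in-memory stand-in for open(index_filename, 'w', newline='')
csv_out = csv.writer(out, delimiter='|')
csv_out.writerow(['identifier', 'descriptor'])
# A single writerows call consumes the whole generator of (number, word) tuples
csv_out.writerows((ID, word) for word, ids in dic.items() for ID in ids)

print(out.getvalue().splitlines())
# ['identifier|descriptor', '1|banana', '2|banana', '5|monkey', '1|cow', '5|cow', '0|cow']
```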

If you really want to know which method is fastest, you can use Python's timeit module to measure them.
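A minimal timeit sketch comparing the per-row loop with a single writerows call. It writes to io.StringIO so disk I/O does not dominate, and the data size is made up; the timings will vary by machine and Python version, so no particular winner is implied:

```python
import csv
import io
import timeit

# Hypothetical test data: 500 words with 20 numbers each
dic = {f'word{i}': list(range(20)) for i in range(500)}


def with_writerow():
    out = io.StringIO()
    csv_out = csv.writer(out, delimiter='|')
    for word, ids in dic.items():
        for ID in ids:
            csv_out.writerow((ID, word))  # one call per row
    return out.getvalue()


def with_writerows():
    out = io.StringIO()
    csv_out = csv.writer(out, delimiter='|')
    # one call for all rows
    csv_out.writerows((ID, word) for word, ids in dic.items() for ID in ids)
    return out.getvalue()


# Both must produce identical output before their speed is worth comparing
assert with_writerow() == with_writerows()

for func in (with_writerow, with_writerows):
    seconds = timeit.timeit(func, number=20)  # timeit accepts a callable
    print(f'{func.__name__}: {seconds:.3f} s for 20 runs')
```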
