简体   繁体   中英

python - writing hex digits to csv

I am having a the following string:

>>> line = '\x00\t\x007\x00\t\x00C\x00a\x00r\x00d\x00i\x00o\x00 \x00M\x00e\x00t\x00a\x00b\x00o\x00l\x00i\x00c\x00 \x00C\x00a\x00r\x00e\x00\t\x00\t\x00\t\x00\t\x00 \x001\x002\x00,\x007\x008\x008\x00,\x005\x002\x008\x00.\x000\x004\x00\r\x00\n'

When I type the variable line in the python terminal it showing the following:

>>> line
'\x00\t\x007\x00\t\x00C\x00a\x00r\x00d\x00i\x00o\x00 \x00M\x00e\x00t\x00a\x00b\x00o\x00l\x00i\x00c\x00 \x00C\x00a\x00r\x00e\x00\t\x00\t\x00\t\x00\t\x00 \x001\x002\x00,\x007\x008\x008\x00,\x005\x002\x008\x00.\x000\x004\x00\r\x00\n'

When I am printing it, its showing the following:

>>> print line
        7    Cardio Metabolic Care               12,788,528.04

In the variable line each word is separated using \\t and I wanted to save it to a csv file. So I tried using the following code:

import csv
with open('test.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',')
    spamwriter.writerow(line.split('\t'))

When I look into the test.csv file, I am getting only the following

,,,,,,

Is there any to get the words into the csv file. Kindly help.

Your input text is not corrupted, it's encoded - as UTF-16 (Big Endian in this case). And it's CSV itself, just with tab as the delimiter.

You must decode it into a string, after that you can use it normally.

Ideally you declare the proper byte encoding when you read it from a source. For example, when you open a file you can state the encoding the file uses so that the file reader will decode the contents for you.

If you have that byte string from a source where you can't declare an encoding while reading it, you can decode manually:

line = '\x00\t\x007\x00\t\x00C\x00a\x00r\x00d\x00i\x00o\x00 \x00M\x00e\x00t\x00a\x00b\x00o\x00l\x00i\x00c\x00 \x00C\x00a\x00r\x00e\x00\t\x00\t\x00\t\x00\t\x00 \x001\x002\x00,\x007\x008\x008\x00,\x005\x002\x008\x00.\x000\x004\x00\r\x00\n'
decoded = line.decode('utf_16_be')

print decoded
#   7   Cardio Metabolic Care                12,788,528.04

But since I suppose that you are actually reading it from a file:

import csv
import codecs

with codecs.open('input.txt', 'r', encoding='utf16') as in_file, codecs.open('output.csv', 'w', encoding='utf8') as out_file:
    reader = csv.reader(in_file, delimiter='\t')
    writer = csv.writer(out_file, delimiter=',', quotechar='"')

    writer.writerows(reader)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM