Output CSV is binary (Python 3.10)

Question

I am trying to create a CSV file from an OSM file. However, every time I run my code I get a "b" prefix on the values in the output, like this:

b'id',b'lat',b'lon',b'user',b'uid',b'version',b'changeset',b'timestamp'

I cannot figure out what I'm doing wrong.

The code is below.

def get_element(osm_file, tags=('node', 'way', 'relation')):
    """Yield element if it is the right type of tag"""

    context = ET.iterparse(osm_file, events=('start', 'end'))
    _, root = next(context)
    for event, elem in context:
        if event == 'end' and elem.tag in tags:
            yield elem
            root.clear()


def validate_element(element, validator, schema=SCHEMA):
    """Raise ValidationError if element does not match schema"""
    if validator.validate(element, schema) is not True:
        field, errors = next(validator.errors.iteritems())
        message_string = "\nElement of type '{0}' has the following errors:\n{1}"
        error_string = pprint.pformat(errors)

        raise Exception(message_string.format(field, error_string))


class UnicodeDictWriter(csv.DictWriter, object):
    """Extend csv.DictWriter to handle Unicode input"""

    def writerow(self, row):
        super(UnicodeDictWriter, self).writerow({
            k: (v.encode('utf-8') if isinstance(v, str) else v) for k, v in row.items()
        })

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)


# ================================================== #
#               Main Function                        #
# ================================================== #
def process_map(file_in, validate):
    """Iteratively process each XML element and write to csv(s)"""

    with codecs.open(NODES_PATH, 'w') as nodes_file, \
            codecs.open(NODE_TAGS_PATH, 'w') as nodes_tags_file, \
            codecs.open(WAYS_PATH, 'w') as ways_file, \
            codecs.open(WAY_NODES_PATH, 'w') as way_nodes_file, \
            codecs.open(WAY_TAGS_PATH, 'w') as way_tags_file:

        nodes_writer = UnicodeDictWriter(nodes_file, NODE_FIELDS)
        node_tags_writer = UnicodeDictWriter(nodes_tags_file, NODE_TAGS_FIELDS)
        ways_writer = UnicodeDictWriter(ways_file, WAY_FIELDS)
        way_nodes_writer = UnicodeDictWriter(way_nodes_file, WAY_NODES_FIELDS)
        way_tags_writer = UnicodeDictWriter(way_tags_file, WAY_TAGS_FIELDS)

        nodes_writer.writeheader()
        node_tags_writer.writeheader()
        ways_writer.writeheader()
        way_nodes_writer.writeheader()
        way_tags_writer.writeheader()

        validator = cerberus.Validator()

        for element in get_element(file_in, tags=('node', 'way')):
            el = shape_element(element)
            if el:
                if validate is True:
                    validate_element(el, validator)

                if element.tag == 'node':
                    nodes_writer.writerow(el['node'])
                    node_tags_writer.writerows(el['node_tags'])
                elif element.tag == 'way':
                    ways_writer.writerow(el['way'])
                    way_nodes_writer.writerows(el['way_nodes'])
                    way_tags_writer.writerows(el['way_tags'])

Answer 1

Because of this line in UnicodeDictWriter.writerow():

k: (v.encode('utf-8') if isinstance(v, str) else v) for k, v in row.items()
#   ^^^^^^^^^^^^^^^^^ specifically, this

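You are encoding the values with the UTF-8 codec, which produces bytes objects. The csv module converts any non-string value with str() before writing it, and str() of a bytes object is its repr, b'...' prefix and quotes included. That text is what ends up in your CSV file.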
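A minimal sketch of the effect, using an in-memory buffer instead of the OSM data. The field names and values here are made up for illustration and are not from the question:

```python
import csv
import io

# In-memory text buffer standing in for the output file (illustrative only).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "user"])

# Encoding a str yields bytes; csv stringifies each value with str(),
# so the bytes repr, including the b prefix, is written out.
writer.writerow({"id": "1".encode("utf-8"), "user": "alice".encode("utf-8")})

broken_line = buffer.getvalue()
print(broken_line)
```

The same thing happens to the header row, because writeheader() goes through the overridden writerow().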

As Mark Tolonen pointed out, strings in Python 3 are already Unicode, so there's no need to subclass csv.DictWriter for this purpose.
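A sketch of what that could look like, assuming Python 3. The path and field list below are hypothetical stand-ins for NODES_PATH and NODE_FIELDS from the question. The file is opened in text mode with an explicit encoding, and csv.DictWriter is used directly:

```python
import csv
import os
import tempfile

# Hypothetical stand-ins for NODE_FIELDS and NODES_PATH from the question.
fields = ["id", "user"]
path = os.path.join(tempfile.mkdtemp(), "nodes.csv")

# Text mode with an explicit encoding; newline="" lets csv control line endings.
with open(path, "w", newline="", encoding="utf-8") as nodes_file:
    writer = csv.DictWriter(nodes_file, fieldnames=fields)
    writer.writeheader()
    writer.writerow({"id": "1", "user": "Zoë"})  # plain str, no .encode()

# Read the file back to check that no b'...' prefixes were written.
with open(path, newline="", encoding="utf-8") as nodes_file:
    rows = list(csv.reader(nodes_file))
print(rows)
```

With this approach the encoding happens once, when the file object writes text to disk, so the writer only ever sees str values.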
