Parsing yml to csv format in Python

I've got a yml file of about a million rows. It has the following structure:

    categories:
    - id: "927"
      depth: "1"
      active: "Y"
      name:
        ru: Строительство и ремонт
      is_merged: "1"
      sort: "21"
      properties:
      - id: "77"
        filter: 1
        top: 1
        filter_sort: "500"
        display_type: F
      - id: "79"
        filter: 1
        filter_sort: "500"
        display_type: F
      - id: "8013"
        display_type: F
      - id: "13694"
        filter: 1
        filter_sort: "1"
        expanded: 1
        display_type: F
        values:
        - key: "95574"
          xmlid: ep
          rank: "500"
          value:
            ru: Эп
        - key: "95576"
          xmlid: other
          rank: "1000"
          value:
            ru: Другой

Please help me convert it to CSV format and write it to a file with a column for each level, like:
categories | id | depth | active | name | ru | is_merged | sort | properties | id | filter | top | filter_sort | display_type | values | key | xmlid | rank | value | ru
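One reading of the layout above is one CSV row per property value, with the category, property and value fields side by side. A minimal sketch of that reading, assuming the data has already been loaded with `yaml.safe_load` (here replaced by a trimmed, hand-written dict); the helper `flatten_rows` and the prefixed column names are hypothetical choices, not something the question specifies:

```python
import csv
import io

# Hypothetical stand-in for yaml.safe_load() on the file above (trimmed):
# every "- " item in the YAML becomes an element of a Python list.
data = {
    "categories": [
        {
            "id": "927", "depth": "1", "active": "Y",
            "name": {"ru": "Строительство и ремонт"},
            "is_merged": "1", "sort": "21",
            "properties": [
                {"id": "77", "filter": 1, "top": 1,
                 "filter_sort": "500", "display_type": "F"},
                {"id": "13694", "filter": 1, "filter_sort": "1",
                 "expanded": 1, "display_type": "F",
                 "values": [
                     {"key": "95574", "xmlid": "ep", "rank": "500",
                      "value": {"ru": "Эп"}},
                     {"key": "95576", "xmlid": "other", "rank": "1000",
                      "value": {"ru": "Другой"}},
                 ]},
            ],
        }
    ]
}

# Assumed column layout: one column per field, prefixed by its nesting level
CATEGORY_FIELDS = ["id", "depth", "active", "is_merged", "sort"]
PROPERTY_FIELDS = ["id", "filter", "top", "filter_sort", "expanded", "display_type"]
VALUE_FIELDS = ["key", "xmlid", "rank"]

FIELDNAMES = (
    [f"category_{f}" for f in CATEGORY_FIELDS] + ["category_name_ru"]
    + [f"property_{f}" for f in PROPERTY_FIELDS]
    + [f"value_{f}" for f in VALUE_FIELDS] + ["value_ru"]
)


def flatten_rows(data):
    """Yield one flat dict per category/property/value combination."""
    for cat in data.get("categories", []):
        cat_part = {f"category_{f}": cat.get(f, "") for f in CATEGORY_FIELDS}
        cat_part["category_name_ru"] = cat.get("name", {}).get("ru", "")
        # A category without properties still produces one row
        for prop in cat.get("properties", []) or [{}]:
            prop_part = {f"property_{f}": prop.get(f, "") for f in PROPERTY_FIELDS}
            # A property without "values" still produces one row
            for val in prop.get("values", []) or [{}]:
                val_part = {f"value_{f}": val.get(f, "") for f in VALUE_FIELDS}
                val_part["value_ru"] = val.get("value", {}).get("ru", "")
                yield {**cat_part, **prop_part, **val_part}


# io.StringIO keeps the sketch self-contained; for a real file use
# open("output.csv", "w", newline="", encoding="utf-8") instead
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDNAMES)
writer.writeheader()
writer.writerows(flatten_rows(data))
print(buf.getvalue())
```

Because `flatten_rows` is a generator, rows are written as they are produced; the YAML itself is still read into memory in one piece by `yaml.safe_load`.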

Alright, this was tough.

Here is what I have so far.

First of all, to get rid of the lists inside your yaml file, I deleted the dashes ("-") by hand, because the lists made it annoying to get at the key-value pairs inside the file.
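Why the lists get in the way: `yaml.safe_load` turns each `- ` item into an element of a Python list, and a walk that only recurses into dicts (as `recursive_items` in the code below does) stops at the first list, so nothing under `categories` is reached. A minimal sketch with a trimmed, hand-written stand-in for the loaded data; the helper name `dict_only_keys` is hypothetical:

```python
# Trimmed, hand-written stand-in (an assumption) for what yaml.safe_load()
# returns for the file above: "- " items become elements of a Python list
data = {"categories": [{"id": "927", "properties": [{"id": "77"}, {"id": "79"}]}]}


def dict_only_keys(d):
    # Same idea as recursive_items in the code below: recurse only into dicts
    for key, value in d.items():
        yield key
        if isinstance(value, dict):
            yield from dict_only_keys(value)


print(type(data["categories"]).__name__)  # list
print(list(dict_only_keys(data)))         # ['categories']
```

Deleting the dashes turns each list into a single mapping, but it can leave the same key (for example several `id` entries under `properties`) repeated inside that mapping. PyYAML does not reject duplicate keys; it keeps the last one, so earlier items may silently disappear.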

I'm not proud of doing this manually. If anyone can write a function to do that, I will happily add it to my answer.

After you delete the dashes inside the yaml file, you can try the code below:

    import csv
    import yaml

    key_list = []
    value_list = []
    final_value_list = []

    # Walk a nested dictionary, yielding every (key, value) pair and
    # recording each key in key_list in the order it is visited.
    # Lists are not descended into, hence the manual dash removal above.
    def recursive_items(dictionary):
        for key, value in dictionary.items():
            key_list.append(key)
            yield (key, value)
            if isinstance(value, dict):
                yield from recursive_items(value)

    # Load the data, assuming the file name is "file.yml"
    # (utf-8 so the Russian text is read correctly on every platform)
    with open('file.yml', encoding='utf-8') as f_input:
        data = yaml.safe_load(f_input)

    # Get the pairs, keeping the values in the same order as key_list
    for key, value in recursive_items(data):
        print(key, value)
        value_list.append(value)

    # Keep scalar values; nested dicts become empty cells
    for elem in value_list:
        if isinstance(elem, (str, int)):
            final_value_list.append(elem)
        else:
            final_value_list.append("")

    # Write the results to a csv file: one row of keys, one row of values
    with open('output.csv', 'w', newline='', encoding='utf-8') as f_output:
        writer = csv.writer(f_output)
        writer.writerow(key_list)
        writer.writerow(final_value_list)

The result so far looks like this:

[Image: screenshot of the resulting output.csv]
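As for the function asked for above, so that the dashes do not have to be deleted by hand: a minimal sketch that descends into lists as well as dicts and puts each list index into the column name, so repeated keys such as `id` no longer collide. The name `flatten` and the dotted-path column format are hypothetical choices. It keeps this answer's layout of one row of keys and one row of values, and uses a trimmed, hand-written dict in place of `yaml.safe_load(f_input)`:

```python
import csv
import io


def flatten(node, prefix=""):
    """Yield (path, value) for every scalar, descending into dicts AND lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(node, list):
        # List items get their index in the path, so sibling items never collide
        for i, item in enumerate(node):
            yield from flatten(item, f"{prefix}.{i}" if prefix else str(i))
    else:
        yield prefix, node


# Hypothetical, trimmed stand-in for yaml.safe_load() on the original file,
# dashes (lists) left in place
data = {
    "categories": [
        {"id": "927", "name": {"ru": "Строительство и ремонт"},
         "properties": [
             {"id": "77", "filter": 1},
             {"id": "13694",
              "values": [{"key": "95574", "value": {"ru": "Эп"}}]},
         ]}
    ]
}

pairs = list(flatten(data))

# io.StringIO keeps the sketch self-contained; use a real file for output.csv
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow([path for path, _ in pairs])
writer.writerow([value for _, value in pairs])
print(buf.getvalue())
```

With the real file, passing `yaml.safe_load(f_input)` to `flatten` replaces the sample dict. Empty dicts or lists produce no columns, and a file of this size gives a single, very wide pair of rows.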
