Need a script that extracts content from a YAML file and outputs it as a CSV file
I'm very new to Python, and I would appreciate help creating a simple script that reads a set of .yaml files (about 300 files in the same directory), extracts one section from each (electives only), and converts it to a CSV file.
An example of what is in the .yaml file:
code: 9313
degrees:
  - name: Design
    coreCourses:
      - ABCD1
      - ABCD2
      - ABCD3
    electiveGroups: #this is the section i need to extract
      - label: Electives
        options:
          - Studio1
          - Studio2
          - Studio3
      - label: OtherElectives
        options:
          - Class1
          - Development2
          - lateclass1
specialisations:
- label: Honours
How I would like to see the output in the CSV:
.yaml file name | Electives | Studio1
.yaml file name | Electives | Studio2
.yaml file name | Electives | Studio3
.yaml file name | OtherElectives | Class1
.yaml file name | OtherElectives | Development2
.yaml file name | OtherElectives | lateclass1
I'm assuming this will be a relatively simple script to write, but I'm looking for some help writing it. I'm very new at this, so please be patient. I have written a few VBA macros, so I'm hoping I can catch on relatively quickly.
Ideally, the answer would be a complete solution with some guidance on how the code works.
Thanks in advance for all your help. I hope my problem is clear.
This is my first attempt (admittedly, I have not spent long on it):
import yaml
with open('program_4803', 'r') as f:
    doc = yaml.load(f)
txt = doc["electiveGroups"]["options"]
file = open("test.txt", "w")
file.write("txt")
file.close()
This is very incomplete at the moment, as you can probably tell, but I'm trying my hardest!
For parsing YAML files, use the Python yaml library (PyYAML).
Example here: Parsing a YAML file in Python, and accessing the data?
For writing to a file, you do not strictly need the csv library:
file = open("testfile.txt", "w")
file.write("Hello World")
file.close()
The above code writes to a file. You can iterate over the result of the YAML parsing and write each output line to the file in the same way.
This might help:
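A minimal sketch of that idea, assuming the parsed YAML has the shape shown in the question (a `degrees` list whose items carry `electiveGroups`). The function name `write_elective_rows` and the literal `parsed` dict are illustrative stand-ins for what the YAML parser would return; neither comes from the original post.

```python
def write_elective_rows(parsed, source_name, out_path):
    """Write one 'file|label|option' line per elective option; return the line count."""
    count = 0
    with open(out_path, "w") as out:
        for degree in parsed["degrees"]:
            for group in degree["electiveGroups"]:
                for option in group["options"]:
                    out.write(f"{source_name}|{group['label']}|{option}\n")
                    count += 1
    return count


# Hypothetical stand-in for the dict a YAML parser would return for one file.
parsed = {
    "code": 9313,
    "degrees": [
        {
            "name": "Design",
            "electiveGroups": [
                {"label": "Electives", "options": ["Studio1", "Studio2"]},
                {"label": "OtherElectives", "options": ["Class1"]},
            ],
        }
    ],
}
```

Calling `write_elective_rows(parsed, "data.yaml", "test.txt")` would write three lines. Plain writes like this do no quoting, so the csv module is probably the safer choice if a value could ever contain the delimiter.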
import csv
import yaml

yaml_file_names = ['data.yaml', 'data2.yaml']
rows_to_write = []

for idx, each_yaml_file in enumerate(yaml_file_names):
    print("Processing file ", idx+1, "of", len(yaml_file_names), "file name:", each_yaml_file)
    with open(each_yaml_file) as f:
        # safe_load parses plain data; yaml.load(f) without a Loader raises TypeError in PyYAML 6
        data = yaml.safe_load(f)
    for each_dict in data['degrees']:
        for each_nested_dict in each_dict['electiveGroups']:
            for each_option in each_nested_dict['options']:
                # one CSV row: yaml file name, group label, option
                rows_to_write.append([each_yaml_file, each_nested_dict['label'], each_option])

# newline='' stops the csv module from producing blank lines on Windows
with open('output_csv_file.csv', 'w', newline='') as out:
    csv_writer = csv.writer(out, delimiter='|')
    csv_writer.writerows(rows_to_write)
print("Output file output_csv_file.csv created")
I tested this code with two mock input YAML files, data.yaml and data2.yaml, whose contents were as follows.
data.yaml:
code: 9313
degrees:
  - name: Design
    coreCourses:
      - ABCD1
      - ABCD2
      - ABCD3
    electiveGroups: #this is the section i need to extract
      - label: Electives
        options:
          - Studio1
          - Studio2
          - Studio3
      - label: OtherElectives
        options:
          - Class1
          - Development2
          - lateclass1
    specialisations:
      - label: Honours
and data2.yaml:
code: 9313
degrees:
  - name: Design
    coreCourses:
      - ABCD1
      - ABCD2
      - ABCD3
    electiveGroups: #this is the section i need to extract
      - label: Electives
        options:
          - Studio1
      - label: E2
        options:
          - Class1
    specialisations:
      - label: Honours
The generated output CSV file was this:
data.yaml|Electives|Studio1
data.yaml|Electives|Studio2
data.yaml|Electives|Studio3
data.yaml|OtherElectives|Class1
data.yaml|OtherElectives|Development2
data.yaml|OtherElectives|lateclass1
data2.yaml|Electives|Studio1
data2.yaml|E2|Class1
By the way, in the YAML input you gave with your question, the last two lines were not properly indented.
Since you said you need to parse about 300 YAML files in a directory, you can use Python's glob module for that, like this:
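As a side note on why the program above uses csv.writer rather than plain string joins: the writer quotes any field that contains the delimiter. Here is a small sketch that renders rows into a string so you can inspect the result. The helper name `rows_to_text` and the value `Studio|2` are made up for illustration.

```python
import csv
import io


def rows_to_text(rows, delimiter="|"):
    """Render rows with csv.writer into a string instead of a file."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter).writerows(rows)
    return buffer.getvalue()


text = rows_to_text([
    ["data.yaml", "Electives", "Studio1"],
    ["data.yaml", "Electives", "Studio|2"],  # made-up value containing the delimiter
])
print(text)
```

The second row comes out as `data.yaml|Electives|"Studio|2"`, so a reader that understands CSV quoting can still split it into three fields.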
import csv
import glob
import yaml

# every .yaml file in the current directory
yaml_file_names = glob.glob('./*.yaml')
# yaml_file_names = ['data.yaml', 'data2.yaml']
rows_to_write = []

for idx, each_yaml_file in enumerate(yaml_file_names):
    print("Processing file ", idx+1, "of", len(yaml_file_names), "file name:", each_yaml_file)
    with open(each_yaml_file) as f:
        # safe_load parses plain data; yaml.load(f) without a Loader raises TypeError in PyYAML 6
        data = yaml.safe_load(f)
    for each_dict in data['degrees']:
        for each_nested_dict in each_dict['electiveGroups']:
            for each_option in each_nested_dict['options']:
                # one CSV row: yaml file name, group label, option
                rows_to_write.append([each_yaml_file, each_nested_dict['label'], each_option])

# newline='' stops the csv module from producing blank lines on Windows;
# the default quote character is kept, since a space as quotechar would pad values containing spaces
with open('output_csv_file.csv', 'w', newline='') as out:
    csv_writer = csv.writer(out, delimiter='|')
    csv_writer.writerows(rows_to_write)
print("Output file output_csv_file.csv created")
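One thing to be aware of: with the './*.yaml' pattern, the first CSV column will contain paths such as ./data.yaml rather than bare file names. If you want only the file name, a possible variation is to use pathlib from the standard library. This is a sketch under that assumption; the helper name `list_yaml_files` is mine and not part of the program above.

```python
from pathlib import Path


def list_yaml_files(directory="."):
    """Return the .yaml files directly inside `directory`, sorted by path."""
    return sorted(Path(directory).glob("*.yaml"))


for path in list_yaml_files("."):
    # `path` is what you would pass to open(); `path.name` is the bare file name
    print(path, path.name)
```

Sorting is optional, but it makes the row order in the CSV the same on every run.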
Edit: as you asked in the comments, to skip the YAML files that have no electiveGroups section, here is the updated program:
import csv
import glob
import yaml

# every .yaml file in the current directory
yaml_file_names = glob.glob('./*.yaml')
# yaml_file_names = ['data.yaml', 'data2.yaml']
rows_to_write = []

for idx, each_yaml_file in enumerate(yaml_file_names):
    print("Processing file ", idx+1, "of", len(yaml_file_names), "file name:", each_yaml_file)
    with open(each_yaml_file) as f:
        # safe_load parses plain data; yaml.load(f) without a Loader raises TypeError in PyYAML 6
        data = yaml.safe_load(f)
    for each_dict in data['degrees']:
        try:
            for each_nested_dict in each_dict['electiveGroups']:
                for each_option in each_nested_dict['options']:
                    # one CSV row: yaml file name, group label, option
                    rows_to_write.append([each_yaml_file, each_nested_dict['label'], each_option])
        except KeyError:
            # a missing key skips the rest of this degree entry instead of stopping the run
            print("No electiveGroups or options key found in", each_yaml_file)

# newline='' stops the csv module from producing blank lines on Windows
with open('output_csv_file.csv', 'w', newline='') as out:
    csv_writer = csv.writer(out, delimiter='|')
    csv_writer.writerows(rows_to_write)
print("Output file output_csv_file.csv created")
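If you would rather not rely on try/except, a possible alternative is dict.get with a fallback, which treats a missing section as an empty list. This is only a sketch: the generator name `iter_elective_rows` and the two sample dicts are hypothetical and stand in for what the YAML parser returns for a file.

```python
def iter_elective_rows(parsed, source_name):
    """Yield [source_name, label, option] rows, skipping missing sections."""
    for degree in parsed.get("degrees") or []:
        for group in degree.get("electiveGroups") or []:
            for option in group.get("options") or []:
                yield [source_name, group.get("label", ""), option]


# Hypothetical parsed files: one with an electiveGroups section, one without.
with_groups = {"degrees": [{"electiveGroups": [{"label": "E2", "options": ["Class1"]}]}]}
without_groups = {"degrees": [{"name": "Design"}]}

print(list(iter_elective_rows(with_groups, "data2.yaml")))
print(list(iter_elective_rows(without_groups, "other.yaml")))
```

The difference from the try/except version is that nothing is printed for skipped files, and a group without options does not cut off the groups that follow it.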