Loop through and load a zipped folder of YAML files

I have a zipped folder containing 15,000 YAML files. I'd like to iterate through the folder using yaml.safe_load so that each file is loaded as a dictionary and I can extract the information I need from each one. I've written some code so far using zipfile.ZipFile and yaml.safe_load, but it only works for the first file in the zipped folder. Would anyone mind taking a look and explaining what I'm misunderstanding?

import zipfile
import yaml

zip_file = zipfile.ZipFile("D:/export.zip")
files = zip_file.namelist()
print(files)
# Only the first 10 files for now
for i in range(10):
    with zip_file.open(files[i]) as yamlfile:
        yamlreader = yaml.safe_load(yamlfile)
        print(yamlreader["identifier"])

For now I'm just iterating through 10 files to make life easier. Eventually I'd like to do the whole 15,000. "identifier" is a key in the YAML file.

This is the error:

10.5281/zenodo.1014773
Traceback (most recent call last):
  File "C:/Users/estho/PycharmProjects/GSOC3/testing_dataextraction.py", line 20, in <module>
    yamlreader = yaml.safe_load(yamlfile)
  File "C:\Users\estho\PycharmProjects\GSOC3\lib\site-packages\yaml\__init__.py", line 162, in safe_load
    return load(stream, SafeLoader)
  File "C:\Users\estho\PycharmProjects\GSOC3\lib\site-packages\yaml\__init__.py", line 114, in load
    return loader.get_single_data()
  File "C:\Users\estho\PycharmProjects\GSOC3\lib\site-packages\yaml\constructor.py", line 41, in get_single_data
    node = self.get_single_node()
  File "C:\Users\estho\PycharmProjects\GSOC3\lib\site-packages\yaml\composer.py", line 36, in get_single_node
    document = self.compose_document()
  File "C:\Users\estho\PycharmProjects\GSOC3\lib\site-packages\yaml\composer.py", line 55, in compose_document
    node = self.compose_node(None, None)
  File "C:\Users\estho\PycharmProjects\GSOC3\lib\site-packages\yaml\composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
  File "C:\Users\estho\PycharmProjects\GSOC3\lib\site-packages\yaml\composer.py", line 127, in compose_mapping_node
    while not self.check_event(MappingEndEvent):
  File "C:\Users\estho\PycharmProjects\GSOC3\lib\site-packages\yaml\parser.py", line 98, in check_event
    self.current_event = self.state()
  File "C:\Users\estho\PycharmProjects\GSOC3\lib\site-packages\yaml\parser.py", line 428, in parse_block_mapping_key
    if self.check_token(KeyToken):
  File "C:\Users\estho\PycharmProjects\GSOC3\lib\site-packages\yaml\scanner.py", line 116, in check_token
    self.fetch_more_tokens()
  File "C:\Users\estho\PycharmProjects\GSOC3\lib\site-packages\yaml\scanner.py", line 260, in fetch_more_tokens
    self.get_mark())
yaml.scanner.ScannerError: while scanning for the next token
found character '\t' that cannot start any token
  in "yamlfile_10_5281_zenodo_1745362.yaml", line 4, column 1

Thank you.

It seems to me that the file "yamlfile_10_5281_zenodo_1745362.yaml" contains a character the YAML parser cannot accept: a literal tab at line 4, column 1. YAML does not allow tab characters for indentation, so the scanner stops there with "found character '\t' that cannot start any token". Your loop itself looks fine: an identifier was printed before the traceback, so at least one file loaded correctly. Try running it without this file, or replace the tab in it with spaces.
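If you would rather keep going when a file is malformed, one option is to catch the parser error per file and record which files failed. The following is only a sketch under a few assumptions: the helper name load_yaml_from_zip is made up for illustration, it assumes the members you care about end in .yaml or .yml, and it holds all parsed files in memory at once (which may or may not suit 15,000 files).

```python
import zipfile

import yaml


def load_yaml_from_zip(zip_source):
    """Parse every .yaml/.yml member of a zip archive (hypothetical helper).

    Returns (loaded, failed):
      loaded maps member name -> parsed data,
      failed maps member name -> error message.
    """
    loaded, failed = {}, {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith((".yaml", ".yml")):
                continue  # skip directory entries and unrelated files
            with archive.open(name) as member:
                try:
                    loaded[name] = yaml.safe_load(member)
                except yaml.YAMLError as exc:
                    # ScannerError (e.g. a tab used as indentation) is a
                    # subclass of YAMLError, so it is caught here.
                    failed[name] = str(exc)
    return loaded, failed


# Example usage (path taken from the question):
# loaded, failed = load_yaml_from_zip("D:/export.zip")
# for name, data in loaded.items():
#     print(data["identifier"])
# print("Could not parse:", list(failed))
```

The failed dictionary tells you which files need fixing, so you can inspect them afterwards instead of having the whole run stop at the first bad file.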
