How to parse a YAML file with multiple documents?

Here is my parsing code:

import yaml

def yaml_as_python(val):
    """Convert YAML to dict"""
    try:
        return yaml.load_all(val)
    except yaml.YAMLError as exc:
        return exc

with open('circuits-small.yaml','r') as input_file:
    results = yaml_as_python(input_file)
    print results
    for value in results:
         print value

Here is a sample of the file:

ingests:
  - timestamp: 1970-01-01T00:00:00.000Z
    id: SwitchBank_35496721
    attrs:
      Feeder: Line_928
      Switch.normalOpen: 'true'
      IdentifiedObject.description: SwitchBank
      IdentifiedObject.mRID: SwitchBank_35496721
      PowerSystemResource.circuit: '928'
      IdentifiedObject.name: SwitchBank_35496721
      IdentifiedObject.aliasName: SwitchBank_35496721
    loc: vector [43.05292, -76.126800000000003, 0.0]
    kind: SwitchBank
  - timestamp: 1970-01-01T00:00:00.000Z
    id: UndergroundDistributionLineSegment_34862802
    attrs:
      Feeder: Line_928
      status: de-energized
      IdentifiedObject.description: UndergroundDistributionLineSegment
      IdentifiedObject.mRID: UndergroundDistributionLineSegment_34862802
      PowerSystemResource.circuit: '928'
      IdentifiedObject.name: UndergroundDistributionLineSegment_34862802
    path:
    - vector [43.052942000000002, -76.126716000000002, 0.0]
    - vector [43.052585000000001, -76.126515999999995, 0.0]
    kind: UndergroundDistributionLineSegment
  - timestamp: 1970-01-01T00:00:00.000Z
    id: UndergroundDistributionLineSegment_34806014
    attrs:
      Feeder: Line_928
      status: de-energized
      IdentifiedObject.description: UndergroundDistributionLineSegment
      IdentifiedObject.mRID: UndergroundDistributionLineSegment_34806014
      PowerSystemResource.circuit: '928'
      IdentifiedObject.name: UndergroundDistributionLineSegment_34806014
    path:
    - vector [43.05292, -76.126800000000003, 0.0]
    - vector [43.052928999999999, -76.126766000000003, 0.0]
    - vector [43.052942000000002, -76.126716000000002, 0.0]
    kind: UndergroundDistributionLineSegment
... 
ingests:
  - timestamp: 1970-01-01T00:00:00.000Z
    id: OverheadDistributionLineSegment_31168454

In the traceback, note that it starts having a problem at the ... line:

Traceback (most recent call last):
  File "convert.py", line 29, in <module>
    for value in results:
  File "/Users/conduce-laptop/anaconda2/lib/python2.7/site-packages/yaml/__init__.py", line 82, in load_all
    while loader.check_data():
  File "/Users/conduce-laptop/anaconda2/lib/python2.7/site-packages/yaml/constructor.py", line 28, in check_data
    return self.check_node()
  File "/Users/conduce-laptop/anaconda2/lib/python2.7/site-packages/yaml/composer.py", line 18, in check_node
    if self.check_event(StreamStartEvent):
  File "/Users/conduce-laptop/anaconda2/lib/python2.7/site-packages/yaml/parser.py", line 98, in check_event
    self.current_event = self.state()
  File "/Users/conduce-laptop/anaconda2/lib/python2.7/site-packages/yaml/parser.py", line 174, in parse_document_start
    self.peek_token().start_mark)
yaml.parser.ParserError: expected '<document start>', but found '<block mapping start>'
  in "circuits-small.yaml", line 42, column 1

What I would like is for it to parse each of these documents as a separate object, perhaps all of them in the same list, or pretty much anything else that would work with the PyYAML module. I believe the ... is actually valid YAML, so I am surprised that it doesn't handle it automatically.

The error message is quite specific: a document needs to start with a document start marker. Your first document doesn't have such a marker, although it has a document end marker. After you explicitly end the first document with ... you can no longer use a document without document boundary markers in PyYAML; you have to start it explicitly with ---:

The end of your file should look like:

    kind: UndergroundDistributionLineSegment
...
---
ingests:
  - timestamp: 1970-01-01T00:00:00.000Z
    id: OverheadDistributionLineSegment_31168454

You can leave out the explicit document start marker from the first document, but you need to include a start marker for every following document. Document end markers are optional.
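The marker rule above can be checked with a small sketch. The helper name load_documents and the two shortened sample strings are made up for illustration; only yaml.safe_load_all and yaml.YAMLError come from PyYAML. Note that the loader returns a lazy generator, so parse errors surface while iterating (as in the traceback, which fails in the for loop), and the iteration therefore has to sit inside the try block:

```python
import yaml  # PyYAML


def load_documents(text):
    """Return all documents in `text` as a list, or the YAMLError that was raised."""
    try:
        # safe_load_all() is lazy: parsing happens only during iteration,
        # so list() must be called inside the try block to catch errors.
        return list(yaml.safe_load_all(text))
    except yaml.YAMLError as exc:
        return exc


# Shortened, hypothetical stand-ins for the file in the question.
fixed = "ingests:\n  - id: A\n...\n---\ningests:\n  - id: B\n"
broken = "ingests:\n  - id: A\n...\ningests:\n  - id: B\n"

fixed_result = load_documents(fixed)
broken_result = load_documents(broken)

print(fixed_result)                 # one Python object per document
print(type(broken_result).__name__)  # the parser error for the missing ---
```

With the --- in place, each document comes back as its own object in the list; without it, PyYAML stops at the second document.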

If you don't have complete control over the input, using .load_all() is not safe. There is normally no reason to take that risk: you should use .safe_load_all() and extend the SafeLoader to handle any specific tags that your YAML might contain.
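As a rough sketch of what extending the SafeLoader can look like: the !vector tag, the VectorLoader class and the construct_vector function below are hypothetical (the file in the question uses no tags at all), and a tuple is just one possible target type. The PyYAML calls used are SafeLoader.add_constructor, construct_sequence and load_all with an explicit Loader:

```python
import yaml  # PyYAML


class VectorLoader(yaml.SafeLoader):
    """SafeLoader subclass, so the extra tag is not registered globally."""


def construct_vector(loader, node):
    # Build a plain tuple from the tagged sequence node (illustrative choice).
    return tuple(loader.construct_sequence(node))


# '!vector' is an assumed, application-specific tag.
VectorLoader.add_constructor('!vector', construct_vector)

text = (
    "loc: !vector [43.05292, -76.1268, 0.0]\n"
    "---\n"
    "loc: !vector [1.0, 2.0, 3.0]\n"
)

docs = list(yaml.load_all(text, Loader=VectorLoader))
print(docs)
```

Only the tags registered on the subclass are constructed; any other unknown tag is still rejected, which is the point of staying with the safe loader.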

Apart from that, you should start your YAML documents with an explicit version directive before the document start indicator (which you should then also add to the first document):

%YAML 1.1
---

This is for the benefit of future editors of your YAML files, because you are using PyYAML, which only supports (most of) YAML 1.1 and not the YAML 1.2 specification (from 2009). The alternative is of course to upgrade your YAML parser to e.g. ruamel.yaml, which would also have warned you about your use of the unsafe load_all() (disclaimer: I am the author of that parser). ruamel.yaml doesn't allow you to have a bare document after an explicit end-of-document marker (which is allowed, as @flyx pointed out), which is a bug.
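For illustration, a stream in which every document carries the version directive and an explicit start marker might be loaded as follows. The two one-line documents are invented stand-ins for the real file, and this is only a sketch of the layout suggested above:

```python
import yaml  # PyYAML

# Each document: version directive, then ---, then the content.
# The ... before the second directive ends the first document.
text = (
    "%YAML 1.1\n"
    "---\n"
    "kind: SwitchBank\n"
    "...\n"
    "%YAML 1.1\n"
    "---\n"
    "kind: UndergroundDistributionLineSegment\n"
)

docs = list(yaml.safe_load_all(text))
for doc in docs:
    print(doc)
```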

I think you have invalid YAML.

Look at the second document in the sample: it begins with ... instead of ---

... 
ingests:
  - timestamp: 1970-01-01T00:00:00.000Z
    id: OverheadDistributionLineSegment_31168454
