
How can I parse a YAML file in Python


The easiest and purest method that does not rely on C headers is PyYAML (documentation), which can be installed via pip install pyyaml:

#!/usr/bin/env python

import yaml

with open("example.yaml", 'r') as stream:
    try:
        print(yaml.safe_load(stream))
    except yaml.YAMLError as exc:
        print(exc)

And that's it. A plain yaml.load() function also exists, but yaml.safe_load() should always be preferred, to avoid introducing the possibility of arbitrary code execution, unless you explicitly need the arbitrary object serialization/deserialization that yaml.load() provides. (Since PyYAML 6.0, yaml.load() also requires an explicit Loader argument.)
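To illustrate the point about arbitrary code execution, here is a minimal sketch (the input string and the helper name load_untrusted are made up for illustration). The document uses a Python-specific tag that would ask a full loader to call os.system; safe_load refuses to construct it and raises a ConstructorError, which is a subclass of yaml.YAMLError, so nothing is executed.

```python
import yaml

# A made-up hostile document: under an unsafe loader this tag would call os.system.
UNTRUSTED = "!!python/object/apply:os.system ['echo pwned']"

def load_untrusted(text):
    """Parse text with safe_load; return None if it uses unsafe tags or is invalid."""
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError as exc:  # ConstructorError is a YAMLError subclass
        print("rejected:", type(exc).__name__)
        return None

print(load_untrusted(UNTRUSTED))             # rejected, nothing executed
print(load_untrusted("a: 1\nb: [x, y]"))     # ordinary data loads normally
```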

Note that the PyYAML project supports the YAML specification only up to version 1.1. If YAML 1.2 support is needed, see ruamel.yaml, as noted in this answer.
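One visible consequence of the 1.1 vs. 1.2 difference is boolean handling: PyYAML resolves unquoted yes/no/on/off as booleans, while the YAML 1.2 core schema only treats true/false that way. A minimal sketch with a made-up document:

```python
import yaml

# PyYAML follows YAML 1.1, where unquoted NO/on are booleans; quoting keeps a string.
doc = "country: NO\nswitch: on\nquoted: 'no'"
data = yaml.safe_load(doc)
print(data)  # {'country': False, 'switch': True, 'quoted': 'no'}
```

If values like country codes must stay strings under PyYAML, quote them in the file.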

Read & Write YAML files with Python 2+3 (and unicode)

# -*- coding: utf-8 -*-
import yaml
import io

# Define data
data = {
    'a list': [
        1, 
        42, 
        3.141, 
        1337, 
        'help', 
        u'€'
    ],
    'a string': 'bla',
    'another dict': {
        'foo': 'bar',
        'key': 'value',
        'the answer': 42
    }
}

# Write YAML file
with io.open('data.yaml', 'w', encoding='utf8') as outfile:
    yaml.dump(data, outfile, default_flow_style=False, allow_unicode=True)

# Read YAML file
# Read YAML file (with the same encoding it was written with)
with io.open("data.yaml", 'r', encoding='utf8') as stream:
    data_loaded = yaml.safe_load(stream)

print(data == data_loaded)

Created YAML file

a list:
- 1
- 42
- 3.141
- 1337
- help
- €
a string: bla
another dict:
  foo: bar
  key: value
  the answer: 42

Common file endings

.yml and .yaml
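Because both endings are common, code that scans a directory usually has to accept either one. A minimal sketch (the function name load_yaml_dir and the sample files are hypothetical):

```python
import tempfile
from pathlib import Path

import yaml

def load_yaml_dir(directory):
    """Load every .yml and .yaml file in directory into {file name: parsed data}."""
    result = {}
    for path in sorted(Path(directory).iterdir()):
        if path.suffix.lower() in (".yml", ".yaml"):   # accept both common endings
            with path.open(encoding="utf-8") as stream:
                result[path.name] = yaml.safe_load(stream)
    return result

# Usage: one file per ending, plus a file that should be ignored
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.yml").write_text("x: 1\n", encoding="utf-8")
    Path(tmp, "b.yaml").write_text("y: [2, 3]\n", encoding="utf-8")
    Path(tmp, "notes.txt").write_text("not yaml\n", encoding="utf-8")
    configs = load_yaml_dir(tmp)

print(configs)  # {'a.yml': {'x': 1}, 'b.yaml': {'y': [2, 3]}}
```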

Alternatives

For your application, the following might be important:

  • Support by other programming languages
  • Reading / writing performance
  • Compactness (file size)

See also: Comparison of data serialization formats

If you are instead looking for a way to create configuration files, you might want to read my short article Configuration files in Python.

If you have YAML that conforms to the YAML 1.2 specification (released 2009), then you should use ruamel.yaml (disclaimer: I am the author of that package). It is essentially a superset of PyYAML, which supports most of YAML 1.1 (from 2005).

If you want to be able to preserve your comments when round-tripping, you certainly should use ruamel.yaml.

Upgrading @Jon's example is easy:

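import ruamel.yaml as yaml

# Note: this PyYAML-style module-level API (yaml.safe_load etc.) was deprecated and
# has been removed in ruamel.yaml 0.18+; there, use the YAML() API shown below.
with open("example.yaml") as stream:
    try:
        print(yaml.safe_load(stream))
    except yaml.YAMLError as exc:
        print(exc)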

Use safe_load() unless you really have full control over the input, actually need the unsafe loader (seldom the case), and know what you are doing.

If you are using pathlib Path for manipulating files, you are better off using the new API that ruamel.yaml provides:

from ruamel.yaml import YAML
from pathlib import Path

path = Path('example.yaml')
yaml = YAML(typ='safe')
data = yaml.load(path)

First install pyyaml using pip3.

Then import the yaml module and load the file into a dictionary called my_dict:

import yaml
with open('filename.yaml') as f:
    my_dict = yaml.safe_load(f)

That's all you need. Now the entire YAML file is in the my_dict dictionary (assuming the top level of the file is a mapping).
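safe_load returns None for an empty file and a list or scalar if the top level is not a mapping, so code that expects a dictionary may want to check. A minimal sketch (load_mapping is a hypothetical helper name):

```python
import yaml

def load_mapping(text):
    """Load YAML text and insist on a top-level mapping; an empty document gives {}."""
    data = yaml.safe_load(text)
    if data is None:                    # empty (or comment-only) document
        return {}
    if not isinstance(data, dict):
        raise TypeError("expected a mapping at the top level, got %s" % type(data).__name__)
    return data

print(load_mapping("url: https://www.google.com"))  # {'url': 'https://www.google.com'}
print(load_mapping(""))                             # {}
```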

Example:


defaults.yaml

url: https://www.google.com

environment.py

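from ruamel.yaml import YAML

yaml = YAML(typ='safe')              # safe loader: plain dicts/lists, no arbitrary objects
with open('defaults.yaml') as f:     # the with-block closes the file again
    data = yaml.load(f)

print(data['url'])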

To access a nested element in a YAML file like this:

global:
  registry:
    url: dtr-:5000/
    repoPath:
  dbConnectionString: jdbc:oracle:thin:@x.x.x.x:1521:abcd

you can use the following Python script:

import yaml

with open("/some/path/to/yaml.file", 'r') as f:
    valuesYaml = yaml.load(f, Loader=yaml.FullLoader)

print(valuesYaml['global']['dbConnectionString'])
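Chained indexing like valuesYaml['global']['dbConnectionString'] raises KeyError (or TypeError for an empty value such as repoPath) as soon as one level is missing. A minimal sketch of a tolerant lookup that also walks into lists; get_path and the servers list are hypothetical additions for illustration:

```python
import yaml

def get_path(data, *keys, default=None):
    """Walk nested dicts/lists by key or index; return default if any step is missing."""
    node = data
    for key in keys:
        try:
            node = node[key]
        except (KeyError, IndexError, TypeError):   # missing key, bad index, or None/scalar
            return default
    return node

values = yaml.safe_load("""\
global:
  registry:
    repoPath:
  dbConnectionString: jdbc:oracle:thin:@x.x.x.x:1521:abcd
  servers:
    - name: first
    - name: second
""")

print(get_path(values, "global", "dbConnectionString"))
print(get_path(values, "global", "servers", 1, "name"))       # second
print(get_path(values, "global", "missing", default="n/a"))   # n/a
```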

I use ruamel.yaml. Details & debate here.

from ruamel import yaml

with open(filename, 'r') as fp:
    read_data = yaml.load(fp)

ruamel.yaml is compatible with old usages of PyYAML (apart from a few easily solved problems), and, as stated in the link I provided, you can use

from ruamel import yaml

instead of

import yaml

and it will fix most of your problems.

EDIT: As it turns out, PyYAML is not dead; it's just maintained in a different place.

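#!/usr/bin/env python

import sys
import yaml

def main(argv):
    # Expect the path of the YAML file as the first command-line argument
    if not argv:
        print("usage: %s FILE.yaml" % sys.argv[0], file=sys.stderr)
        return 2

    with open(argv[0]) as stream:
        try:
            print(yaml.safe_load(stream))  # parse the file and print the result
            return 0
        except yaml.YAMLError as exc:
            print(exc)  # report the parse error
            return 1

if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))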

A read_yaml_file function that returns all the data as a dictionary:

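import yaml

def read_yaml_file(full_path=None, relative_path=None):
    # ProjectPaths is a helper from the author's own project (not part of PyYAML);
    # it presumably returns the project root path as a string.
    if relative_path is not None:
        resource_file_location_local = ProjectPaths.get_project_root_path() + relative_path
    else:
        resource_file_location_local = full_path

    file_artifacts = {}  # fall back to an empty dict if parsing fails
    with open(resource_file_location_local, 'r') as stream:
        try:
            file_artifacts = yaml.safe_load(stream) or {}
        except yaml.YAMLError as exc:
            print(exc)
    return dict(file_artifacts.items())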
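If you don't have a ProjectPaths-style helper, the same idea can be sketched with pathlib, passing the base directory explicitly. This is an illustrative variant, not the answer's code; read_yaml_dict and base_dir are hypothetical names, and it assumes the top level of the file is a mapping:

```python
import tempfile
from pathlib import Path

import yaml

def read_yaml_dict(path, base_dir=None):
    """Resolve a relative path against base_dir (if given) and return the data as a dict."""
    p = Path(path)
    if base_dir is not None and not p.is_absolute():
        p = Path(base_dir) / p          # stands in for ProjectPaths.get_project_root_path()
    with p.open(encoding="utf-8") as stream:
        data = yaml.safe_load(stream)   # a YAMLError propagates to the caller
    return dict(data or {})             # empty file -> {}

# Usage with a throwaway "project root"
with tempfile.TemporaryDirectory() as root:
    Path(root, "settings.yaml").write_text("debug: true\nlevel: 3\n", encoding="utf-8")
    settings = read_yaml_dict("settings.yaml", base_dir=root)

print(settings)  # {'debug': True, 'level': 3}
```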

All of the answers above are good, but there is also a Python package that smartly constructs objects from YAML/JSON/dicts and is actively being developed and expanded (full disclosure: I am a co-author of this package, see here).

Install:

pip install pickle-rick

Use:

Define a YAML or JSON string (or file).

BASIC:
 text: test
 dictionary:
   one: 1
   two: 2
 number: 2
 list:
   - one
   - two
   - four
   - name: John
     age: 20
 USERNAME:
   type: env
   load: USERNAME
 callable_lambda:
   type: lambda
   load: "lambda: print('hell world!')"
 datenow:
   type: lambda
   import:
     - "from datetime import datetime as dd"
   load: "lambda: print(dd.utcnow().strftime('%Y-%m-%d'))"
 test_function:
   type: function
   name: test_function
   args:
     x: 7
     y: null
     s: hello world
     any:
       - 1
       - hello
   import:
     - "math"
   load: >
     def test(x, y, s, any):
       print(math.e)
       iii = 111
       print(iii)
       print(x,s)
       if y:
         print(type(y))
       else:
         print(y)
       for i in any:
         print(i)

Then use it as an object.

>>> from pickle_rick import PickleRick

>>> config = PickleRick('./config.yaml', deep=True, load_lambda=True)

>>> config.BASIC.dictionary
{'one' : 1, 'two' : 2}

>>> config.BASIC.callable_lambda()
hell world!
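If you only need the attribute-style access shown above, and none of PickleRick's lambda, function, or environment-variable loading, a standard-library sketch is possible. This is not PickleRick's implementation; to_namespace is a hypothetical helper, and it assumes all mapping keys are strings:

```python
from types import SimpleNamespace

import yaml

def to_namespace(node):
    """Recursively turn dicts into SimpleNamespace objects; lists are converted element-wise."""
    if isinstance(node, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in node.items()})
    if isinstance(node, list):
        return [to_namespace(v) for v in node]
    return node

config = to_namespace(yaml.safe_load("""\
BASIC:
  text: test
  dictionary:
    one: 1
    two: 2
  list:
    - one
    - name: John
      age: 20
"""))

print(config.BASIC.dictionary)     # namespace(one=1, two=2)
print(config.BASIC.list[1].name)   # John
```

Because it goes through safe_load, nothing in the file is executed, at the cost of not supporting callables.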

You can define Python functions, load additional data from other files, REST APIs, or environment variables, and then write everything out to YAML or JSON again.

This works especially well when building systems that require structured configuration files, or in notebooks as interactive structures.

A security note: only load files you trust, because any code in them can be executed. Don't load a file without knowing its complete contents.

The package is called PickleRick and is available here:

Suggestion: Use yq

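I'm not sure why it wasn't suggested before, but I would highly recommend using yq, a jq-style command-line processor for YAML. (There are two tools with that name, both linked below: the examples here use the syntax of mikefarah/yq v4, while kislyuk/yq is the one that wraps jq.)

yq uses jq-like syntax but works with YAML files as well as JSON.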


Examples:

1) Read a value:

yq e '.a.b[0].c' file.yaml

2) Pipe from STDIN:

cat file.yaml | yq e '.a.b[0].c' -

3) Update a YAML file in place:

yq e -i '.a.b[0].c = "cool"' file.yaml

4) Update using environment variables:

NAME=mike yq e -i '.a.b[0].c = strenv(NAME)' file.yaml

5) Merge multiple files:

yq ea '. as $item ireduce ({}; . * $item )' path/to/*.yml

6) Make multiple updates to a YAML file:

yq e -i '
  .a.b[0].c = "cool" |
  .x.y.z = "foobar" |
  .person.name = strenv(NAME)
' file.yaml
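To stay in Python instead of shelling out, the in-place updates from examples 3 and 6 can be approximated with PyYAML. A minimal sketch; set_path and update_yaml_file are hypothetical helpers, strenv(NAME) would correspond to os.environ["NAME"], and unlike yq this rewrite does not preserve comments or formatting:

```python
import tempfile
from pathlib import Path

import yaml

def set_path(data, keys, value):
    """Set data[k0][k1]...[kn] = value, creating missing intermediate mappings."""
    node = data
    for key in keys[:-1]:
        if isinstance(node, dict) and node.get(key) is None:
            node[key] = {}              # like yq, create .x.y when it does not exist
        node = node[key]                # list indexes must already exist
    node[keys[-1]] = value
    return data

def update_yaml_file(path, updates):
    """Apply (keys, value) pairs to a YAML file in place. Comments are NOT preserved."""
    with open(path, encoding="utf-8") as stream:
        data = yaml.safe_load(stream) or {}
    for keys, value in updates:
        set_path(data, keys, value)
    with open(path, "w", encoding="utf-8") as stream:
        yaml.safe_dump(data, stream, default_flow_style=False, allow_unicode=True)

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp, "file.yaml")
    target.write_text("a:\n  b:\n    - c: old\n", encoding="utf-8")
    update_yaml_file(target, [
        (("a", "b", 0, "c"), "cool"),   # yq e -i '.a.b[0].c = "cool"'
        (("x", "y", "z"), "foobar"),    # yq e -i '.x.y.z = "foobar"'
    ])
    result = yaml.safe_load(target.read_text(encoding="utf-8"))

print(result)  # {'a': {'b': [{'c': 'cool'}]}, 'x': {'y': {'z': 'foobar'}}}
```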

(*) Read more about how to parse fields from YAML with jq-based filters.


Additional references:

https://github.com/mikefarah/yq/#install

https://github.com/kislyuk/yq

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM