
How to replace many identical values in a YAML file

I am building a Python application that uses YAML configs. I generate the config file from other YAML files: a "template" YAML defines the basic structure of the file the app uses, and many different "data" YAMLs fill in the template to steer the application's behavior a certain way. For example, say I have 10 "data" YAMLs. Depending on where the app is deployed, one "data" YAML is chosen and used to fill out the "template" YAML, and the resulting filled-out YAML is what the application runs with. This saves me a ton of work, but I have run into a problem with this method. Say I have a template YAML that looks like this:

id: {{id}}
endpoints:
  url1: https://website.com/{{id}}/search
  url2: https://website.com/foo/{{id}}/get_thing
  url3: https://website.com/hello/world/{{id}}/trigger_stuff
foo:
  bar:
    deeply:
      nested: {{id}}

Then somewhere else, I have about 10 "data" YAMLs, each with a different value for {{id}}. I can't figure out an efficient way to replace all these {{id}} occurrences in the template. The difficulty is that sometimes the value to be substituted is a substring of a value I mostly want to keep, and the occurrences can be far apart in the hierarchy, which makes looping solutions inefficient. My current method for generating the config file from template + data looks something like this in Python:

import yaml
import os

template_yaml = os.path.abspath(os.path.join(os.path.dirname(__file__), 'template.yaml'))
# In this same folder you would find flavor2, flavor3, flavor4, etc, lets just use 1 for now
data_yaml = os.path.abspath(os.path.join(os.path.dirname(__file__), 'data_files', 'flavor1.yaml'))
# This is where we dump the filled out template the app will actually use
output_directory = os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir))

with open(template_yaml, 'r') as template:
    loaded_template = yaml.safe_load(template)  # Load the template as a dict
with open(data_yaml, 'r') as data:
    loaded_data = yaml.safe_load(data)  # Load the data as a dict

# From this point on I am basically just setting individual keys from "loaded_template" to values in "loaded_data"
# But 1 at a time, which is what I am trying to avoid:
loaded_template['id'] = loaded_data['id']
# str.format() would turn '{{id}}' into a literal '{id}', so use replace() here
loaded_template['endpoints']['url1'] = loaded_template['endpoints']['url1'].replace('{{id}}', str(loaded_data['id']))
loaded_template['foo']['bar']['deeply']['nested'] = loaded_data['id']

Any idea on how to go through and change all the {{id}} occurrences faster?

If the id is the same in every location for a single YAML file, you could just read the template in as plain text and use string replacement line by line.

new_file = []

# New id for replacement (from loaded file)
id_ = '123'

# Open template file 
with open('template.yaml', 'r') as f:
    # Iterate through each line
    for l in f:
        # Replace every {{id}} occurrence
        new_file.append(l.replace('{{id}}', id_))

# Save the new file
with open('new_file.yaml', 'w') as f:
    for l in new_file:
        f.write(l)

This will replace {{id}} with the same id_ everywhere in the file and will not change any of the formatting.
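If a data file ever carries more than one placeholder, the same plain-text idea extends to a lookup table. The following is only a minimal sketch, assuming placeholders are written as {{name}} using word characters only. The function name fill_placeholders and the region key are made up for illustration and do not come from the question.

```python
import re

# Matches {{name}} where name consists of letters, digits or underscores.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_placeholders(template_text, values):
    """Replace every {{name}} with values[name]; fail loudly on unknown names."""
    def lookup(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return str(values[name])

    # A function as replacement avoids backslash-escape surprises in the values.
    return PLACEHOLDER.sub(lookup, template_text)


# Hypothetical template text; "region" is an invented second placeholder.
template_text = (
    "id: {{id}}\n"
    "endpoints:\n"
    "  url1: https://website.com/{{id}}/search\n"
    "  region: {{region}}\n"
)
filled = fill_placeholders(template_text, {"id": 123, "region": "eu"})
print(filled)
```

Raising on an unknown name is a design choice here: a placeholder that silently survives into the generated config is usually harder to track down than an early error.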

YAML has built-in "anchors" that you can define and reference, kind of like variables. It wasn't obvious to me that these actually substitute their values where referenced, because you only see the result AFTER you parse the YAML. The code is shamelessly stolen from a Reddit post covering a similar topic:

# example.yaml
params: &params
  PARAM1: &P1 5
  PARAM2: &P2 "five"
  PARAM3: &P3 [*P1, *P2]

data:
  <<: *params
  more:
    - *P3
    - *P2

# yaml.load(example) =>
{
'params': {
    'PARAM1': 5, 
    'PARAM2': 'five', 
    'PARAM3': [5, 'five']
},
'data': {
    'PARAM1': 5,
    'PARAM2': 'five',
    'PARAM3': [5, 'five'],
    'more': [[5, 'five'], 'five']
}
}

And this post here on SO shows how I think you can use anchors as a substring (assuming you are using Python).

You are proposing to use PyYAML, but it is not well suited for updating YAML files. In that process, if it can load your file in the first place, you lose your mapping key order and any comments you have in the file, merges get expanded, and any special anchor names get lost in translation. Apart from that, PyYAML cannot deal with the latest YAML spec (released 9 years ago), and it can only handle simple mapping keys.

There are two main solutions:

  • You can use substitution on the raw file
  • You can use ruamel.yaml and recurse into the data structure

Substitution

If you use substitution, you can do it much more efficiently than the line-by-line substitution that @caseWestern proposes. But most of all, you should harden the scalars in which these substitutions take place. Currently you have plain scalars (i.e. flow style scalars without quotes), and those tend to break if you insert things like # , : and other syntactically significant elements.
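To get a feel for which replacement values are the dangerous ones, a rough pre-check can help. This is only a conservative heuristic sketch and not a full implementation of the YAML rules: the function name risky_in_plain_scalar and the chosen character list are assumptions made for illustration, and the check deliberately flags some values that would be harmless in the middle of a URL.

```python
# Characters that cannot start a plain scalar (a subset of YAML's indicators).
LEADING_INDICATORS = "#&*!|>'\"%@`[]{},"


def risky_in_plain_scalar(value):
    """Heuristic: True if inserting value could change how a plain scalar parses."""
    text = str(value)
    if not text:
        return True  # "key: " with nothing after it loads as null, not ""
    return (
        text[0] in LEADING_INDICATORS  # matters when the value starts the scalar
        or ": " in text                # colon + space reads as a mapping value
        or " #" in text                # space + hash starts a comment
        or text.endswith(":")          # trailing colon can read as a mapping key
        or "\n" in text                # a line break ends the plain scalar
    )


for candidate in ["xyz", "xy: z", "a #b", "#tag"]:
    print(repr(candidate), risky_in_plain_scalar(candidate))
```

Any value flagged by a check like this is one that can break a plain scalar after substitution.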

In order to prevent that from happening, rewrite your input file to use block style literal scalars:

id: {{id}}
endpoints:
  url1: |-
    https://website.com/{{id}}/search
  url2: |-
    https://website.com/foo/{{id}}/get_thing
  url3: |-
    https://website.com/hello/world/{{id}}/trigger_stuff
foo:
  bar:
    deeply:
      nested: |-
        {{id}}

If the above is in alt.yaml you can do:

val = 'xyz'

with open('alt.yaml') as ifp:
    with open('new.yaml', 'w') as ofp:
        ofp.write(ifp.read().replace('{{id}}', val))

to get:

id: xyz
endpoints:
  url1: |-
    https://website.com/xyz/search
  url2: |-
    https://website.com/foo/xyz/get_thing
  url3: |-
    https://website.com/hello/world/xyz/trigger_stuff
foo:
  bar:
    deeply:
      nested: |-
        xyz
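Since the question mentions around ten "data" files, the same whole-file replacement can be run once per flavor. Below is a minimal sketch under stated assumptions: the ids are passed in as a plain dict instead of being read from the data YAMLs, the helper name render_flavors is invented, and a temporary directory stands in for the real output folder.

```python
import tempfile
from pathlib import Path

# Shortened copy of the hardened template, kept inline so the sketch is self-contained.
TEMPLATE = """\
id: {{id}}
endpoints:
  url1: |-
    https://website.com/{{id}}/search
"""


def render_flavors(template_text, flavor_ids, out_dir):
    """Write one filled-in YAML file per flavor and return the paths written."""
    out_dir = Path(out_dir)
    written = []
    for flavor, id_value in flavor_ids.items():
        target = out_dir / f"{flavor}.yaml"
        # One replace() call over the whole text covers every occurrence.
        target.write_text(template_text.replace("{{id}}", str(id_value)))
        written.append(target)
    return written


with tempfile.TemporaryDirectory() as tmp:
    paths = render_flavors(TEMPLATE, {"flavor1": "xyz", "flavor2": "abc"}, tmp)
    rendered = {p.name: p.read_text() for p in paths}

print(rendered["flavor1.yaml"])
```

The template is read once and reused for every flavor, so the cost per output file is a single string replacement plus one write.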

ruamel.yaml

Using ruamel.yaml (disclaimer: I am the author of that package), you don't have to worry about breaking the input with syntactically significant replacement texts. If you insert such texts, the output will automatically be correctly quoted. You do have to take care that your input is valid YAML, and by using something like {{, which at the beginning of a node indicates two nested flow-style mappings, you'll run into trouble.

The big advantage here is that your input file is loaded, and it is checked to be correct YAML. But this is significantly slower than file level substitution.

So if your input is in.yaml :

id: <<id>>  # has to be unique
endpoints: &EP
  url1: https://website.com/<<id>>/search
  url2: https://website.com/foo/<<id>>/get_thing
  url3: https://website.com/hello/world/<<id>>/trigger_stuff
foo:
  bar:
    deeply:
      nested: <<id>>
    endpoints: *EP
    [octal, hex]: 0o123, 0x1F

You can do:

import sys
import ruamel.yaml

def recurse(d, pat, rep):
    if isinstance(d, dict):
        for k in d:
            if isinstance(d[k], str):
                d[k] = d[k].replace(pat, rep)
            else:
                recurse(d[k], pat, rep)
    elif isinstance(d, list):
        for idx, elem in enumerate(d):
            if isinstance(elem, str):
                d[idx] = elem.replace(pat, rep)
            else:
                recurse(d[idx], pat, rep)


yaml = ruamel.yaml.YAML()
yaml.preserve_quotes = True
with open('in.yaml') as fp:
    data = yaml.load(fp)
recurse(data, '<<id>>', 'xy: z')  # not that this makes much sense, but it proves a point
yaml.dump(data, sys.stdout)

which gives:

id: 'xy: z' # has to be unique
endpoints: &EP
  url1: 'https://website.com/xy: z/search'
  url2: 'https://website.com/foo/xy: z/get_thing'
  url3: 'https://website.com/hello/world/xy: z/trigger_stuff'
foo:
  bar:
    deeply:
      nested: 'xy: z'
    endpoints: *EP
    [octal, hex]: 0o123, 0x1F

Please note:

  • The values that have the replacement pattern are automatically quoted on dump, to deal with the : + space that would otherwise indicate a mapping and break the YAML.

  • The YAML.load() method, contrary to PyYAML's load function, is safe (i.e. it cannot execute arbitrary Python by manipulating the input file).

  • The comment, the octal and hexadecimal integers and the alias name are preserved.

  • PyYAML cannot load the file in.yaml at all, although it is valid YAML.

  • The above recurse only changes the input mapping values. If you want to do the keys as well, you either have to pop and reinsert all the keys (even if not changed) to keep the original order, or you need to use enumerate and d.insert(position, key, value) . If you have merges, you also cannot just walk over the keys; you'll have to walk over the non-merged keys of the "dict".
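The pop-and-reinsert idea from the last point can be sketched on plain Python containers. This is an illustration only: it assumes ordinary dict and list objects (Python 3.7+, where dicts keep insertion order), the name replace_everywhere is made up, and it does not cover ruamel.yaml's round-trip types, where comments attached to keys and merged keys need the extra care described above.

```python
def replace_everywhere(node, pat, rep):
    """Replace pat with rep in all string keys and values, in place."""
    if isinstance(node, dict):
        for key in list(node):  # snapshot of the keys: the dict is mutated below
            value = node.pop(key)
            new_key = key.replace(pat, rep) if isinstance(key, str) else key
            if isinstance(value, str):
                value = value.replace(pat, rep)
            else:
                replace_everywhere(value, pat, rep)
            # Every key is popped and reinserted in turn, so the order is kept.
            node[new_key] = value
    elif isinstance(node, list):
        for idx, elem in enumerate(node):
            if isinstance(elem, str):
                node[idx] = elem.replace(pat, rep)
            else:
                replace_everywhere(elem, pat, rep)


# Hypothetical data with the pattern in a key, a nested value and a list item.
config = {
    "<<id>>_name": "service-<<id>>",
    "endpoints": {"url1": "https://website.com/<<id>>/search"},
    "tags": ["<<id>>", 7],
}
replace_everywhere(config, "<<id>>", "xyz")
print(config)
```

One caveat with this approach: if a rewritten key collides with a key that already exists, the two entries end up in a single slot, so unique keys after replacement are assumed.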
