Reading a YAML file in python and accessing the data by matching key value pair

I am developing software in Python and need to read a multi-level YAML file like the one shown below:

#Filename: SampleCase.yml
%YAML 1.1
VesselTypes:
  - Name: Escort Tug
    Length: 32
    Breadth: 12.8
    Depth: 9
    Draughts:
    - Name: Draught1
      Mass: 500
      CentreOfGravity: [16.497, 0, 4.32]
    TowingStaples:
    - Name: Staple1
      Position: [0, 0, 0]
    Thrusters:
    - Name: Port Propeller
      Position: [0, -1, 0]
      MaxRPM: 1800
      MaxPower: 2525
    - Name: Stbd Propeller
      Position: [0, 1, 0]
      MaxRPM: 1800
      MaxPower: 2525
  - Name: Ship    
Vessels:
  - Name: Tug
    VesselType: Escort Tug
    Draught: Draught1
    InitialPosition: [0, 0, 0]
    Orientation: [0, 0, 0]
  - Name: Tanker
    VesselType: Ship
    Draught: Draught1
    InitialPosition: [0, 0, 0]
    Orientation: [0, 0, 0]
    Speed: 8  

Here, there are two vessels named Tug and Tanker. They are of two vessel types, "Escort Tug" and "Ship".

#Filename: main.py
import yaml
# Reading YAML data
file_name = 'SampleCase.yml'
with open(file_name, 'r') as f:
    data = yaml.load(f)

print(data["Vessels"][0]["Name"])

I can access the stored data using index numbers (e.g. data["Vessels"][0]["Name"]), but I would like to access it by matching keys instead. For example, I want to print the MaxRPM value of the Port Propeller of the vessel named "Tug". What is the standard way of doing this in Python?

There is no standard way of doing this, largely because keys in YAML can be complex (a mapping key may itself be a sequence or a mapping). This makes the path-matching methods that work for much simpler formats like JSON unusable.
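To illustrate the point about complex keys, here is a small analogy using only the standard library. It is not a YAML parse: the `positions` dict and its tuple keys are made up for the illustration, on the assumption that a loader turns a sequence key into a tuple.

```python
import json

# A mapping whose keys are compound values, roughly what a YAML
# complex key (a sequence used as a key) could look like once loaded.
# Assumption: the loader represents the sequence key as a tuple.
positions = {(0, 1): "Stbd Propeller", (0, -1): "Port Propeller"}

# Python handles such keys without trouble ...
stbd = positions[(0, 1)]

# ... but JSON object keys must be strings, so this structure has no
# direct JSON form and a string path such as "a.b.c" cannot address it.
try:
    json.dumps(positions)
    json_ok = True
except TypeError:
    json_ok = False

print(stbd, json_ok)
```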

If your YAML is "tag-less", as yours is, it still allows much more complex structures than JSON, but you can fairly easily walk recursively over the collection types of a YAML file (sequences and mappings) and, while doing so, explicitly match indices or keys and/or elements or values:

import ruamel.yaml as yaml

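# Sentinel meaning "match anything" (a unique object, so None stays searchable).
def _do_not_care():
    pass

def find_collection(d, key=_do_not_care, value=_do_not_care, results=None):
    """Recursively collect every dict/list in d that holds a matching key and/or value."""
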
    def check_key_value(d, k, v, results):
        # print('checking', key, value, k, d[k], results)
        if k == key:
            if value in [_do_not_care, v]:
                results.append(d)
                return
        elif key == _do_not_care and v == value:
            results.append(d)
            return
        if isinstance(v, (dict, list)):
            find_collection(v, key, value, results)

    if results is None:
        results = []
    if isinstance(d, dict):
        for k in d:
            check_key_value(d, k, d[k], results)
    if isinstance(d, list):
        for k, v in enumerate(d):
            check_key_value(d, k, v, results)
    return results

def find_first(d, key=_do_not_care, value=_do_not_care):
    ret_val = find_collection(d, key, value)
    return ret_val[0] if ret_val else {}

def find_value_for_key(d, key):
    # Raises KeyError if no collection in d contains the key.
    return find_first(d, key)[key]

With the above in place you can do:

file_name = 'SampleCase.yml'
with open(file_name, 'r') as f:
    data = yaml.YAML(typ='safe').load(f)  # with PyYAML: data = yaml.safe_load(f)
for d in find_collection(data, value='Tug'):
    vessel_type = find_first(data, key='Name', value=d['VesselType'])
    port_propeller = find_first(vessel_type, key='Name', value='Port Propeller')
    print('Tug -> MaxRPM', find_value_for_key(port_propeller, key='MaxRPM'))

This prints (assuming the input is corrected, see point 1):

Tug -> MaxRPM 1800

There are a few things to keep in mind:

  1. Your YAML is invalid, as there is no --- separator between the directive and the document. Its first three lines should look like:

     %YAML 1.1
     ---
     VesselTypes:

     However, it is probably not necessary to specify the directive at all: PyYAML still doesn't support YAML 1.2 after seven years, and your YAML doesn't seem to contain anything specific to YAML 1.1.

  2. You are using PyYAML's load() without a Loader argument, which can be unsafe if you have no control over the input. You should always use safe_load() if you can (as you can with your source).

The above was tested using ruamel.yaml, a superset of PyYAML supporting YAML 1.2 as well as 1.1 (disclaimer: I am the author of that package). It should work with PyYAML as well if you have to stick with that.

Turn your list into a dict in which the keys are the names:

result = {}
for elem in data['Vessels']:
    name = elem.pop('Name')
    result[name] = elem

data['Vessels'] = result

print(data['Vessels']['Tug'])
>> {'VesselType': 'Escort Tug', ...}
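A non-mutating variant of the same idea, as a minimal sketch. `index_by_name` is a hypothetical helper (not part of PyYAML), and the `data` literal stands in for the result of loading SampleCase.yml, trimmed to the fields used here.

```python
def index_by_name(items):
    """Return a dict mapping each item's 'Name' to the item itself.

    Unlike pop(), this leaves the original dicts untouched.
    """
    return {item["Name"]: item for item in items}


# Stand-in for the loaded YAML (assumption: trimmed to the relevant fields).
data = {
    "VesselTypes": [
        {
            "Name": "Escort Tug",
            "Thrusters": [
                {"Name": "Port Propeller", "MaxRPM": 1800},
                {"Name": "Stbd Propeller", "MaxRPM": 1800},
            ],
        },
        {"Name": "Ship"},
    ],
    "Vessels": [
        {"Name": "Tug", "VesselType": "Escort Tug"},
        {"Name": "Tanker", "VesselType": "Ship"},
    ],
}

vessels = index_by_name(data["Vessels"])
vessel_types = index_by_name(data["VesselTypes"])

# Vessel -> its type -> that type's thrusters -> the named thruster.
tug_type = vessel_types[vessels["Tug"]["VesselType"]]
thrusters = index_by_name(tug_type["Thrusters"])
max_rpm = thrusters["Port Propeller"]["MaxRPM"]
print(max_rpm)
```

Because the lists are only read, the index-based access from the question (data["Vessels"][0]["Name"]) keeps working alongside the name-based lookups.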

You can pass the loaded YAML data to a function that builds a dictionary based on your specific search requirements. The behaviour you describe sounds ad hoc, and I don't think there is anything built in that does it.
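One way to read that suggestion, sketched under assumptions: `build_thruster_table` is a hypothetical function that flattens the loaded data into a dictionary keyed by (vessel name, thruster name), which suits the lookup in the question. The `data` literal stands in for the loaded file, trimmed to the relevant fields.

```python
def build_thruster_table(data):
    """Map (vessel name, thruster name) -> thruster mapping."""
    types = {t["Name"]: t for t in data.get("VesselTypes", [])}
    table = {}
    for vessel in data.get("Vessels", []):
        vessel_type = types.get(vessel.get("VesselType"), {})
        # A type such as "Ship" may define no thrusters at all.
        for thruster in vessel_type.get("Thrusters", []):
            table[(vessel["Name"], thruster["Name"])] = thruster
    return table


# Stand-in for the loaded YAML (assumption: trimmed to the relevant fields).
data = {
    "VesselTypes": [
        {
            "Name": "Escort Tug",
            "Thrusters": [
                {"Name": "Port Propeller", "MaxRPM": 1800, "MaxPower": 2525},
                {"Name": "Stbd Propeller", "MaxRPM": 1800, "MaxPower": 2525},
            ],
        },
        {"Name": "Ship"},
    ],
    "Vessels": [
        {"Name": "Tug", "VesselType": "Escort Tug"},
        {"Name": "Tanker", "VesselType": "Ship"},
    ],
}

table = build_thruster_table(data)
rpm = table[("Tug", "Port Propeller")]["MaxRPM"]
print("Tug -> MaxRPM", rpm)
```

The table is built once and each lookup afterwards is a single dictionary access, at the cost of hard-coding the VesselTypes/Vessels/Thrusters layout into the function.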
