
Job Scheduler - YAML for writing job definition?

In our legacy job scheduling software (built on top of crontab), we use the Apache config format (parser) to write job definitions, and Perl's Config::General to parse the config files. This software is highly customized and has functionality such as running a command in a job only after checking whether that command's dependency is met, rescheduling jobs when a command fails, and supporting custom notifications.

We are now planning to rewrite this software in Python and are considering options such as YAML instead of Apache config for writing job definitions. Is YAML suitable for writing such dynamic configurations?

Example of a job definition (run this job daily at 2 AM; check whether it is Tuesday and not a holiday in India; if so, reserve my flight and send a notification):

// python function to check if it is tuesday
checkIfTuesdayAndNotHoliday()

<job> 
    calendar: indian

        <dependency: arbitrary_python_code: checkIfTuesdayAndNotHoliday()>
        <command>  
            check availability of flight
        </command>

        <success: notify: email: agrawall/>
        <failure: notify: email: ops>
        <command>
            some command to book my flight
        </command>
</job>

<crontab> 0 2 * * * </crontab>
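The example only names the dependency function checkIfTuesdayAndNotHoliday(). A minimal sketch of what such a check could look like, assuming the "indian" calendar can be represented as a plain set of dates. The INDIAN_HOLIDAYS set below is a hypothetical, incomplete placeholder, not a real holiday calendar:

```python
import datetime

# Hypothetical placeholder: a real scheduler would load the "indian"
# calendar from a maintained data source instead of hard-coding dates.
INDIAN_HOLIDAYS = {
    datetime.date(2024, 1, 26),  # Republic Day
    datetime.date(2024, 8, 15),  # Independence Day
    datetime.date(2024, 10, 2),  # Gandhi Jayanti
}


def check_if_tuesday_and_not_holiday(day=None, holidays=INDIAN_HOLIDAYS):
    """Return True if `day` (default: today) is a Tuesday and not a holiday."""
    if day is None:
        day = datetime.date.today()
    # date.weekday() counts Monday as 0, so Tuesday is 1
    return day.weekday() == 1 and day not in holidays
```

Taking the date and the holiday set as parameters keeps the check testable, while the scheduler can still call it with no arguments when the 2 AM crontab entry fires.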

I am struggling to decide which format I should use to define jobs (YAML, Apache config, XML, JSON, etc.). Note that this job definition will be converted to a job object inside my Python script.

Apache config parser in Perl that we currently use: https://metacpan.org/source/TLINDEN/Config-General-2.63/General.pm#L769

Apache config parser in Python that we plan to use: https://github.com/etingof/apacheconfig
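Whichever format is chosen, the parser ends up producing plain dicts and lists, so the conversion into a job object can be kept independent of the file format. A sketch using dataclasses; the field names (crontab, command, calendar, dependency, success, failure) mirror the example above and are assumptions, not an established schema:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Outcome:
    """Actions to take after the main command succeeds or fails."""
    notify: list = field(default_factory=list)    # e-mail recipients
    commands: list = field(default_factory=list)  # follow-up commands


@dataclass
class Job:
    """In-memory job object; field names mirror the question's example."""
    crontab: str
    command: str
    calendar: Optional[str] = None
    dependency: Optional[str] = None  # name of a Python callable to check first
    success: Outcome = field(default_factory=Outcome)
    failure: Outcome = field(default_factory=Outcome)

    @classmethod
    def from_dict(cls, d):
        """Build a Job from the plain dict that YAML, JSON or a Python file yields."""
        d = dict(d)  # shallow copy so the caller's dict is not modified
        for key in ("success", "failure"):
            d[key] = Outcome(**d.get(key, {}))
        return cls(**d)


definition = {
    "crontab": "0 2 * * *",
    "calendar": "indian",
    "dependency": "checkIfTuesdayAndNotHoliday",
    "command": "check availability of flight",
    "success": {"notify": ["agrawall"], "commands": ["some command to book my flight"]},
    "failure": {"notify": ["ops"]},
}

job = Job.from_dict(definition)
print(job.success.notify)    # ['agrawall']
print(job.failure.commands)  # []
```

Because Job(**...) and Outcome(**...) raise TypeError on unknown keys, a misspelled key in a job definition fails when the file is loaded instead of being silently ignored.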

Python-based config files have been around at least since distutils' setup.py in Python 1.6 (i.e. before 2000). The main disadvantage of such a format is that it is difficult to update values in the config programmatically. Even if you only want to write an additional utility that analyses these files, you have to take special care that you can import such a config file without executing code, and also without pulling in all kinds of dependencies via imports. This can be achieved by guarding code with if __name__ == '__main__':, or more easily by having only the config information, as a data structure, in the file.

So if updating the files programmatically is never going to be an issue, you can use Python-based data structures, and those are quite readable.
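When a file holds only data, a separate utility can read it with the standard-library ast module without executing or importing it. A sketch; read_literal_config is a hypothetical helper name, and it only picks up top-level NAME = <literal> assignments:

```python
import ast


def read_literal_config(source):
    """Collect top-level `NAME = <literal>` assignments from Python source.

    The source is only parsed, never executed: imports, function definitions
    and assignments whose value is not a plain literal are skipped.
    """
    config = {}
    for node in ast.parse(source).body:
        if (isinstance(node, ast.Assign) and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)):
            try:
                config[node.targets[0].id] = ast.literal_eval(node.value)
            except (ValueError, TypeError):
                pass  # not a literal (e.g. a function call); ignore it
    return config


SOURCE = '''
import some_heavy_dependency   # never actually imported by read_literal_config
CRONTAB = "0 2 * * *"
JOB = {"calendar": "indian", "command": "check availability of flight"}
NOW = compute_now()            # not a literal, so it is skipped
'''

print(read_literal_config(SOURCE))
```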

XML and JSON are not good formats for editing by hand. XML has too many < and > characters to type easily without special tools. JSON has so many double quotes that it is difficult to read, and it also has all kinds of problems because JSON doesn't allow trailing commas in arrays and objects, which leads people to write objects like:

{ 
    "a": 1
  , "b": 2
}

This prevents you from deleting the last line and forgetting to remove the comma separating the key/value pairs, but IMO readable is something different.
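A quick check with Python's json module shows both points: a trailing comma is rejected, while the leading-comma layout above parses fine:

```python
import json

# A trailing comma after the last key/value pair is a syntax error in JSON.
with_trailing_comma = '{"a": 1, "b": 2,}'
# The leading-comma layout shown above is valid JSON.
leading_comma_style = '{\n    "a": 1\n  , "b": 2\n}'

try:
    json.loads(with_trailing_comma)
    trailing_comma_accepted = True
except json.JSONDecodeError as exc:
    trailing_comma_accepted = False
    print("rejected:", exc.msg)

print(json.loads(leading_comma_style))  # {'a': 1, 'b': 2}
```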

YAML, on the other hand, can be written very readably, but there are some rules that have to be taken into account when editing the files. In my answer here I show some basic rules that can be included in a YAML file, which editors need to take into account when editing. YAML can also be read by languages other than Python (which is difficult to do with Python-based config files).

You can use YAML tags (and appropriate Python objects associated with these tags), so you don't have to depend on interpreting the key of some key-value pair to understand how the value should be interpreted:

- !Job
  calendar: !Calendar indian
  dependency: !Arbitrary_python_code checkIfTuesdayAndNotHoliday()
  command: !CommandTester
    exec: !Exec check availability of flight
    success: !Commands
      - !Notify
        email: agrawall
      - !Exec some command to book my flight
    failure: !Commands
      - !Notify
        email: ops

(At the bottom is a partial example implementation of the classes associated with these tags.)

When you use ruamel.yaml (disclaimer: I am the author of that package), YAML can also be updated programmatically without losing comments, key ordering or tags.


I have been parameterizing my Python packaging (I manage over 100 packages, some of which are on PyPI, others only for specific clients) for quite some time by reading the configuration parameters for my generic setup.py from each package's __init__.py file. I experimented with inserting a JSON subset of Python, but eventually developed PON (Python Object Notation), which setup.py can easily parse without importing the __init__.py file, using a small (100-line) extension of the AST-based literal_eval included in the Python standard library.

PON can be used without any library, because it is a subset of the Python data structures, including dict, list, set and tuple, and basic types like integers, floats, booleans, strings, date and datetime. Because it is based on the AST evaluator, you can do calculations (secs_per_day = 24 * 60 * 60) and other evaluations in your configuration file.
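The standard library's ast.literal_eval rejects an expression like 24 * 60 * 60 with a ValueError, which is why an extension is needed. The sketch below is NOT PON's implementation. It only illustrates the general technique: walk the AST, allow a whitelist of arithmetic operators and containers, and hand everything else to literal_eval so that names and calls are still refused. eval_config_expr is a hypothetical name, and it evaluates single expressions rather than NAME = value assignments:

```python
import ast
import operator

# Arithmetic operators this sketch allows on top of plain literals.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def eval_config_expr(source):
    """Evaluate a literal-or-arithmetic expression without running arbitrary code."""
    def convert(node):
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](convert(node.left), convert(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](convert(node.operand))
        if isinstance(node, ast.List):
            return [convert(elt) for elt in node.elts]
        if isinstance(node, ast.Tuple):
            return tuple(convert(elt) for elt in node.elts)
        if isinstance(node, ast.Dict):
            return {convert(k): convert(v) for k, v in zip(node.keys, node.values)}
        # Anything else must be a plain literal (number, string, True, None, ...);
        # literal_eval raises ValueError for names, calls, attribute access, etc.
        return ast.literal_eval(node)

    return convert(ast.parse(source, mode="eval").body)


print(eval_config_expr("{'secs_per_day': 24 * 60 * 60}"))  # {'secs_per_day': 86400}
```

Exponentiation (**) is deliberately left out of the whitelist, since an expression such as 10 ** 10 ** 10 in a config file could tie up the process.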

The PON readme also has a more detailed description of the advantages (and disadvantages) of that format over YAML, JSON, INI and XML.

The PON package is not needed to use the configuration data; it is only needed if you want to do programmatic round-trips (load-edit-dump) on the PON data.


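import subprocess
import sys

from ruamel.yaml import YAML, yaml_object

yaml = YAML()  # default round-trip mode keeps comments, key order and unknown tags


@yaml_object(yaml)
class CommandTester:
    """Run exec; then run success if it succeeded, otherwise failure."""
    yaml_tag = u'!CommandTester'

    def __init__(self, exec=None, success=None, failure=None):
        self.exec = exec
        self.success = success
        self.failure = failure

    def __call__(self):
        if self.exec():
            if self.success is not None:
                self.success()
        elif self.failure is not None:
            self.failure()


@yaml_object(yaml)
class Commands:
    """a list of commands"""
    yaml_tag = u'!Commands'

    def __init__(self, commands):
        self._commands = commands  # list of commands to execute

    @classmethod
    def from_yaml(cls, constructor, node):
        # construct_yaml_seq is a generator that yields the (still empty) list
        # first and fills it afterwards, so exhaust it before using the list
        for m in constructor.construct_yaml_seq(node):
            pass
        return cls(m)

    @classmethod
    def to_yaml(cls, representer, node):
        return representer.represent_sequence(cls.yaml_tag, node._commands)

    def __call__(self, verbose=0, stop_on_error=False):
        res = True
        for cmd in self._commands:
            try:
                if callable(cmd):
                    # nested tagged objects, e.g. !Exec or !Notify
                    ok = cmd()
                else:
                    # a plain string is run as a shell command
                    subprocess.check_output(cmd, shell=True)
                    ok = True
            except (subprocess.CalledProcessError, OSError):
                ok = False
            if not ok:
                res = False
                if stop_on_error:
                    break
        return res


@yaml_object(yaml)
class Command(Commands):
    """a single command"""
    yaml_tag = u'!Exec'

    def __init__(self, command):
        Commands.__init__(self, [command])

    @classmethod
    def from_yaml(cls, constructor, node):
        return cls(node.value)

    @classmethod
    def to_yaml(cls, representer, node):
        return representer.represent_scalar(cls.yaml_tag, node._commands[0])


@yaml_object(yaml)
class Notifier:
    """Attributes (e.g. email) are filled in from the YAML mapping."""
    yaml_tag = u'!Notify'

    def __call__(self):
        # placeholder: a real implementation would send the notification here
        print('notify:', vars(self))
        return True


with open("job.yaml") as fp:  # job.yaml contains the YAML document above
    job = yaml.load(fp)

yaml.dump(job, sys.stdout)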

A newer trend is to use a Python file as the config. This is what Django and Flask do. It is human-readable, easy to define and update, and of course straightforward to convert into Python objects.

See also the accepted answer to “Pros and cons for different configuration formats?”.

See also the article “Configuration files in Python”.

Here is an example (setting.py):

def check_if_tuesday_and_not_holiday():
    """check if it is tuesday and not holiday"""
    return True

JOB = {
    'calendar': 'indian',
    'dependency': {
        'arbitrary_python_code': check_if_tuesday_and_not_holiday  # callback
    },
    'command': 'check availability of flight',
    'success': {
        'notify': {
            'email': 'agrawall'
        },
        'command': 'some command to book my flight'
    },
    'failure': {
        'notify': {
            'email': 'ops'
        }
    }
}

CRONTAB = '0 2 * * *'

Note: I'm not sure I fully understand your configuration file, so I did my best to adapt it to Python...
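Django and Flask load their settings modules themselves; for a standalone scheduler, one way to load such a file from an arbitrary path is importlib. A sketch, assuming the file is trusted, because unlike the data-only approach the file is executed. load_settings and the module name job_settings are hypothetical, and a trimmed copy of setting.py is written to a temporary directory only to keep the example self-contained:

```python
import importlib.util
import pathlib
import tempfile

# Trimmed copy of the setting.py example, written to disk below.
SETTINGS_SOURCE = '''
def check_if_tuesday_and_not_holiday():
    """check if it is tuesday and not holiday"""
    return True

JOB = {
    'calendar': 'indian',
    'dependency': {'arbitrary_python_code': check_if_tuesday_and_not_holiday},
    'command': 'check availability of flight',
}

CRONTAB = '0 2 * * *'
'''


def load_settings(path):
    """Import a Python config file from an arbitrary path as a module."""
    spec = importlib.util.spec_from_file_location("job_settings", str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the file: only load trusted configs
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "setting.py"
    path.write_text(SETTINGS_SOURCE)
    settings = load_settings(path)

# The callback is an ordinary Python function, so the scheduler can call it.
dependency = settings.JOB['dependency']['arbitrary_python_code']
if dependency():
    print("would run:", settings.JOB['command'], "on schedule", settings.CRONTAB)
```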

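Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.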

 
© 2020-2024 STACKOOM.COM