简体   繁体   中英

Python interpreting a Regex from a yaml config file

So I have a yaml file that I'm using as a config file. I'm trying to do some string matching with regular expressions, but I'm having trouble interpreting the regex from yaml into python. The regex in question looks like this:

regex:
    - [A-Za-z0-9]

And when I try to use the re.match function, I get this error:

Traceback (most recent call last):
  File "./dirpylint.py", line 132, in <module>
    sys.exit(main())
  File "./dirpylint.py", line 32, in main
    LevelScan(level)
  File "./dirpylint.py", line 50, in LevelScan
    regex_match(level)
  File "./dirpylint.py", line 65, in regex_match
    if re.match(expression, item) == None:
  File "/usr/lib/python2.7/re.py", line 137, in match
    return _compile(pattern, flags).match(string)
  File "/usr/lib/python2.7/re.py", line 229, in _compile
    p = _cache.get(cachekey)
TypeError: unhashable type: 'list'

I understand that it's interpreting the regex as a list, but how would I use the regex defined in the yaml file to search for a string?

I did this in my YAML parsing "engine".

In [1]: from StringIO import StringIO
In [2]: import re, yaml
In [3]: yaml.add_constructor('!regexp', lambda l, n: re.compile(l.construct_scalar(n)))
In [4]: yaml.load(StringIO("pattern: !regexp '^(Yes|No)$'"))
Out[4]: {'pattern': re.compile(ur'^(Yes|No)$')}

Also this works if you want to use safe_load and !!python/regexp (similar to ruby's and nodejs' implementations):

In [5]: yaml.SafeLoader.add_constructor(u'tag:yaml.org,2002:python/regexp', lambda l, n: re.compile(l.construct_scalar(n)))
In [6]: yaml.safe_load(StringIO("pattern: !!python/regexp '^(Yes|No)$'"))
Out[6]: {'pattern': re.compile(ur'^(Yes|No)$')}

The problem is the YAML, not the Python.
If you want to store a string value containing literal square brackets in a YAML file, you have to quote it:

regex:
  - '[A-Za-z0-9]'

The use of 'single quotes' means the YAML will not interpret any backslash escape sequences in the regex. eg \\b

Also, note that in this YAML the value of regex is a list containing one string, not a simple string.

You're using two list constructs in your YAML file. When you load the YAML file:

>>> d = yaml.load(open('config.yaml'))

You get this:

>>> d
{'regex': [['A-Za-z0-9']]}

Note that the square brackets in your regular expression are actually disappearing because they are being recognized as list delimiters. You can quote them:

regex: - "[A-Za-z0-9]"

To get this:

>>> yaml.load(open('config.yaml'))
{'regex': ['[A-Za-z0-9]']}

So the regular expression is d['regex'][0] . But you could also just do this in your yaml file:

regex: "[A-Za-z0-9]"

Which gets you:

>>> d = yaml.load(open('config.yaml'))
>>> d
{'regex': '[A-Za-z0-9]'}

So the regular expression can be retrieved with a similar dictionary lookup:

>>> d['regex']
'[A-Za-z0-9]'

...which is arguably much simpler.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM