
How to get groupdict from multi-line string using regex

I tried to get a dictionary from a multi-line string using regex, but I cannot get the pattern to match across the line breaks.

Here is what I have tried:

import re

text = '''\n\n\nName: Clash1\nDistance: -1.274m\nImage Location: navis_raport_txt_files\\cd000001.jpg\nHardStatus: New\nClash Point: 1585.236m, 193.413m'''
clash_data = re.compile('''
    (?P<clash_number>Clash\d+)\n
    (?P<clash_depth>\d.\d{3})\n
    (?P<image_location>cd\d+.jpg)\n
    (?P<clash_status>\w{2:})\n
    (?P<clash_point>.*)\n
    (?P<clash_grid>\w+-\d+)\n
    (?P<clash_date>.*)''', re.I | re.VERBOSE)
print(clash_data.search(text).groupdict())

This similar example works well:

import re

MHP = ['''MHP-PW-K_SZ-117-R01-UZ-01 - drawing title 123''',
       'MHP-PW-K_SZ-127-R01WIP - drawing title 2',
       'MHP-PW-K_SZ-107-R03-UZ-1 - drawing title 3']

fields_from_name = re.compile('''
    (?P<object>\w{3})[-_]
    (?P<phase>\w{2})[-_]
    (?P<field>\w)[-_]
    (?P<type>\w{2})[-_]
    (?P<dr_number>\d{3})[-_]
    [-_]?
    (?P<revision>\w\d{2})?
    (?P<wip_status>WIP)?
    [-_]?
    (?P<suplement>UZ-\d+)?
    [\s-]+
    (?P<drawing_title>.*)
    ''', re.IGNORECASE | re.VERBOSE)
for name in MHP:
    print(fields_from_name.search(name).groupdict())

Why doesn't my attempt work like the example?

It is not working simply because Pattern.search() does not find a match, so it returns None and there is nothing to call groupdict() on. As in the working example you are mimicking, you also need to match the characters between the named capture groups that you want in your output dict, so that the pattern as a whole matches.
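A minimal sketch of the failure, using a shortened version of your text and only two of your groups (the names `plain` and `raw` are just for illustration). It also shows a second effect that is easy to miss: in a non-raw string, `\n` is a real newline character, and re.VERBOSE discards whitespace in the pattern, so the line break is never matched at all.

```python
import re

text = 'Name: Clash1\nDistance: -1.274m'

# Non-raw string: '\n' is a real newline, which re.VERBOSE discards as
# whitespace, so the two groups end up directly next to each other.
plain = re.compile('''
    (?P<clash_number>Clash\\d+)\n
    (?P<clash_depth>\\d\\.\\d{3})''', re.VERBOSE)

# Raw string: '\n' reaches the regex engine as an escape and matches a
# newline, but the label 'Distance: -' is still not accounted for.
raw = re.compile(r'''
    (?P<clash_number>Clash\d+)\n
    (?P<clash_depth>\d\.\d{3})''', re.VERBOSE)

for pattern in (plain, raw):
    match = pattern.search(text)
    # search() returns None when nothing matches; calling .groupdict()
    # on None raises AttributeError, so guard the call.
    print(match.groupdict() if match else None)

# The non-raw pattern only matches when the values are glued together.
print(plain.search('Clash11.274') is not None)
```

Both patterns print None for the real text, which is why the call to groupdict() fails.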

The following example uses .*\n.* as a brute-force way to bridge the gap between your capture groups: it matches any non-newline characters after the previous capture group, then the newline, and then any non-newline characters that precede the next capture group. You will probably want to be more precise than this, but it demonstrates the issue. I only included your first three groups because I could not tell what you intended with the regex in your <clash_status> group.

import re

text = '\n\n\nName: Clash1\nDistance: -1.274m\nImage Location: navis_raport_txt_files\\cd000001.jpg\nHardStatus: New\nClash Point: 1585.236m, 193.413m'

# Raw strings keep \n as a regex escape; the dots that must be literal are escaped.
clash_data = re.compile(r'(?P<clash_number>Clash\d+).*\n.*'
                        r'(?P<clash_depth>\d\.\d{3}).*\n.*'
                        r'(?P<image_location>cd\d+\.jpg)', re.I | re.VERBOSE)

result = clash_data.search(text).groupdict()

print(result)
# OUTPUT
# {'clash_number': 'Clash1', 'clash_depth': '1.274', 'image_location': 'cd000001.jpg'}
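If you want something more precise than .*\n.*, one option is to match the field labels explicitly. The sketch below is only a guess at what you intended: it assumes the labels always appear in this order, that `\w{2:}` was meant to be `\w{2,}` (two or more word characters; `{2:}` is not a valid quantifier and is treated as literal text), and that you want the sign kept in `clash_depth`. The `clash_grid` and `clash_date` groups are left out because the sample text has no such lines.

```python
import re

text = ('\n\n\nName: Clash1\nDistance: -1.274m\n'
        'Image Location: navis_raport_txt_files\\cd000001.jpg\n'
        'HardStatus: New\nClash Point: 1585.236m, 193.413m')

# Raw string so that \n stays a regex escape under re.VERBOSE.
# Literal spaces inside labels are written as [ ] because verbose mode
# ignores unescaped whitespace in the pattern.
clash_data = re.compile(r'''
    Name:\s*(?P<clash_number>Clash\d+)\s*\n
    Distance:\s*(?P<clash_depth>-?\d+\.\d+)m\s*\n
    Image[ ]Location:\s*.*?(?P<image_location>cd\d+\.jpg)\s*\n
    HardStatus:\s*(?P<clash_status>\w{2,})\s*\n
    Clash[ ]Point:\s*(?P<clash_point>.*)
    ''', re.IGNORECASE | re.VERBOSE)

match = clash_data.search(text)
# Guard against None in case a record does not follow the assumed layout.
result = match.groupdict() if match else {}
print(result)
# OUTPUT
# {'clash_number': 'Clash1', 'clash_depth': '-1.274', 'image_location': 'cd000001.jpg', 'clash_status': 'New', 'clash_point': '1585.236m, 193.413m'}
```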
