
How to extract exact matches from a list of strings in Python into separate lists

This is an example list of strings:

new_text = ['XIC(Switch_A)OTE(Light1) XIC(Light1)OTE(Light2) Motor On Delay Timer XIC(Light1)TON(Motor_timer',
 '?',
 '?) XIC(Motor_timer.DN)OTE(Motor)']

I would like to extract XIC(Switch_A) into one list, OTE(Light1) into another list, TON(Motor_timer) into another list, and so on.

This is the code I have tried in Python 3:

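import re
for words in new_text:
    match = re.search('XIC(.*)', words)  # only the first XIC per string; (.*) captures the rest of the line
    if match:  # re.search returns None for strings such as '?'
        print(match.group(1))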

How do I go about extracting OTE(Tag name), XIC(Tag name) and XIO(Tag name) into their own lists or groups?
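For reference, a minimal sketch of why the attempt above returns more than the tag: in 'XIC(.*)' the parentheses are regex grouping syntax, not literal characters, so group(1) is everything after the first XIC in the string. Escaping the parentheses and stopping at the first ")" gives just the tag names. The sample string is a shortened, hypothetical input.

```python
import re

sample = "XIC(Switch_A)OTE(Light1) XIC(Light1)OTE(Light2)"

# Unescaped parentheses form a capturing group, and greedy .* runs to the end of the string
greedy = re.search("XIC(.*)", sample).group(1)
print(greedy)

# Escaped parentheses match literally; [^)]* stops at the first closing parenthesis
tags = re.findall(r"XIC\(([^)]*)\)", sample)
print(tags)
```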

I hope I've understood your question correctly:

import re

lst = [
    "XIC(Switch_A)OTE(Light1) XIC(Light1)OTE(Light2) Motor On Delay Timer XIC(Light1)TON(Motor_timer",
    "?",
    "?) XIC(Motor_timer.DN)OTE(Motor)",
]

pat = re.compile(r"[A-Z]+\([^)]+\)")

out = []
for s in lst:
    for val in pat.findall(s):
        out.append(val)

print(out)

Prints:

['XIC(Switch_A)', 'OTE(Light1)', 'XIC(Light1)', 'OTE(Light2)', 'XIC(Light1)', 'XIC(Motor_timer.DN)', 'OTE(Motor)']
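This answer collects every match into one flat list. If separate lists are wanted, as the question asks, one minimal sketch is to filter that flat list by prefix; the variable names xic, ote and xio are illustrative only. Note that TON(Motor_timer) is not found in this sample, because its closing parenthesis sits in a different list element.

```python
import re

lst = [
    "XIC(Switch_A)OTE(Light1) XIC(Light1)OTE(Light2) Motor On Delay Timer XIC(Light1)TON(Motor_timer",
    "?",
    "?) XIC(Motor_timer.DN)OTE(Motor)",
]

pat = re.compile(r"[A-Z]+\([^)]+\)")
out = [val for s in lst for val in pat.findall(s)]  # same flat list as the loop above

# Filter the flat list into one list per instruction prefix
xic = [val for val in out if val.startswith("XIC(")]
ote = [val for val in out if val.startswith("OTE(")]
xio = [val for val in out if val.startswith("XIO(")]  # empty for this sample

print(xic)
print(ote)
print(xio)
```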

You can use the following regex to match any three uppercase letters followed by anything in parentheses:

([A-Z]{3})(\([^)]+\))
(        )             : Capturing group 1
          (         )  : Capturing group 2
 [A-Z]{3}              : Exactly three uppercase letters
           \(     \)   : Literal open/close parentheses
             [^)]+     : One or more of any character that is not )
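Because the pattern has two capturing groups, re.findall returns one tuple per match (identifier, parenthesized part) rather than the whole matched text. A short sketch on a shortened sample string; "Motor On Delay Timer" is skipped because it never has three uppercase letters in a row:

```python
import re

regex = re.compile(r"([A-Z]{3})(\([^)]+\))")

# With two capturing groups, findall returns a list of (group 1, group 2) tuples
pairs = regex.findall("XIC(Switch_A)OTE(Light1) Motor On Delay Timer")
print(pairs)
```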


Use a collections.defaultdict to keep track of all your results. The identifier will be the key for this defaultdict, and the values will be lists containing all the matches for that identifier.

import re
from collections import defaultdict

results = defaultdict(list)

regex = re.compile(r"([A-Z]{3})(\([^)]+\))")

for s in new_text:
    # findall returns one (identifier, "(tag)") tuple per match
    for identifier, args in regex.findall(s):
        results[identifier].append(identifier + args)

Which gives the following results:

{'XIC': ['XIC(Switch_A)', 'XIC(Light1)', 'XIC(Light1)', 'XIC(Motor_timer.DN)'],
 'OTE': ['OTE(Light1)', 'OTE(Light2)', 'OTE(Motor)']}
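Each value in results is already one of the separate lists the question asks for. A minimal sketch of reading them back out; the results defaultdict is rebuilt here so the snippet runs on its own, and the *_list variable names are illustrative. Indexing a defaultdict with a missing key silently inserts an empty list, so .get is used for an identifier that may not appear.

```python
import re
from collections import defaultdict

new_text = [
    "XIC(Switch_A)OTE(Light1) XIC(Light1)OTE(Light2) Motor On Delay Timer XIC(Light1)TON(Motor_timer",
    "?",
    "?) XIC(Motor_timer.DN)OTE(Motor)",
]

regex = re.compile(r"([A-Z]{3})(\([^)]+\))")
results = defaultdict(list)
for s in new_text:
    for identifier, args in regex.findall(s):
        results[identifier].append(identifier + args)

# Each identifier's matches are already a separate list
xic_list = results["XIC"]
ote_list = results["OTE"]

# .get does not trigger the default factory, so no empty "XIO" entry is created
xio_list = results.get("XIO", [])

print(xic_list, ote_list, xio_list)
print(sorted(results))  # only identifiers that actually matched
```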

Since you have a fixed set of identifiers, you can replace the [A-Z]{3} portion of the regex with something that will only match your identifiers:

regex = re.compile(r"(XIC|XIO|OTE|TON|TOF)(\([^)]+\))")
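To see the difference, a small comparison sketch: "ABC(foo)" is a hypothetical non-instruction token, and Valve_1 and Pump are hypothetical tag names. The generic [A-Z]{3} pattern picks up ABC(foo), while the alternation only returns the listed identifiers.

```python
import re

regex = re.compile(r"(XIC|XIO|OTE|TON|TOF)(\([^)]+\))")
generic = re.compile(r"([A-Z]{3})(\([^)]+\))")

# "ABC(foo)" is a hypothetical token used only for comparison
sample = "ABC(foo) XIO(Valve_1) OTE(Pump)"

print(generic.findall(sample))
print(regex.findall(sample))
```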

It is also possible to build this regex if you have your identifiers in an iterable:

identifiers = ["XIC", "XIO", "OTE", "TON", "TOF"]
regex = re.compile(rf"({'|'.join(identifiers)})(\([^)]+\))")
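A slightly more defensive variant, sketched under the assumption that identifiers could one day contain regex metacharacters: passing each one through re.escape before joining keeps the alternation literal. For the plain uppercase identifiers here, the resulting pattern is identical to the hand-written one.

```python
import re

identifiers = ["XIC", "XIO", "OTE", "TON", "TOF"]

# re.escape keeps the alternation literal if an identifier ever contains metacharacters
alternation = "|".join(map(re.escape, identifiers))
regex = re.compile(rf"({alternation})(\([^)]+\))")

# For these plain uppercase identifiers the result equals the hand-written pattern
print(regex.pattern == r"(XIC|XIO|OTE|TON|TOF)(\([^)]+\))")
```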
