
Python match on EXACT word, no more, no less

I have simplified the code I am working on below to present the issue I am facing. I am new to Python, which may show in my code, so please bear with me.

import re

vlan_names = ['PL-BB', 'PL-BB-VoIP']
vlan_lines = ['create vlan "PL-BB"', 'configure vlan PL-BB tag 135', 'create vlan "PL-BB-VoIP"', 'create vlan "PL-BB-VoIP"']

vlan_config_lines = []

for vlan_line in vlan_lines:
    for vlan_name in vlan_names:
        if re.search(r'\b' + vlan_name + r'\b', vlan_line):
            vlan_config_lines.append(vlan_line.strip("\n"))

print(vlan_config_lines)

The result of this is below:

['create vlan "PL-BB"', 'configure vlan PL-BB tag 135', 'create vlan "PL-BB-VoIP"', 'create vlan "PL-BB-VoIP"', 'create vlan "PL-BB-VoIP"', 'create vlan "PL-BB-VoIP"']

The issue is that, while running through the iteration, the regex search matches the 'PL-BB' inside the word 'PL-BB-VoIP', which is why lines like this one are appended twice:

'create vlan "PL-BB-VoIP"'

What I am struggling with is how to stop that, i.e. I need an exact match in the comparison, no more and no less, which should hopefully stop the repetition. I am wondering if anyone can help.

Many thanks in advance
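
For reference, the over-match can be reproduced in isolation. The short check below is only an illustration and not part of the original code: it shows that \b treats the hyphen as a non-word character, so a word boundary exists right after 'PL-BB' even when '-VoIP' follows.

```python
import re

# \b matches between a word character (letter, digit, underscore)
# and a non-word character. '-' is a non-word character, so there is
# a boundary right after 'PL-BB' even though the name continues.
line = 'create vlan "PL-BB-VoIP"'
match = re.search(r'\bPL-BB\b', line)

print(match is not None)  # True: the shorter name is found inside the longer one
print(match.span())       # (13, 18)
```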

If this is the exact data you're working with, you don't need regex at all.

Look for the vlan name surrounded by either spaces or double quotes:

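for vlan_line in vlan_lines:
    for vlan_name in vlan_names:
        # Match the name as a separate token: either space-delimited or quoted.
        if f' {vlan_name} ' in vlan_line or f'"{vlan_name}"' in vlan_line:
            vlan_config_lines.append(vlan_line.strip("\n"))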
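
A self-contained sketch of this substring idea might look like the following. The helper name line_mentions is made up for illustration. It pads the line with spaces, on the assumption that an unquoted name could also appear at the very start or end of a line, and it uses any() so that a line is appended at most once even if several names match it.

```python
vlan_names = ['PL-BB', 'PL-BB-VoIP']
vlan_lines = ['create vlan "PL-BB"', 'configure vlan PL-BB tag 135',
              'create vlan "PL-BB-VoIP"', 'create vlan "PL-BB-VoIP"']


def line_mentions(vlan_name, vlan_line):
    """Hypothetical helper: True if vlan_name appears as a whole token."""
    # Pad with spaces so a bare name at the start or end of the line still matches.
    padded = f' {vlan_line.strip()} '
    return f' {vlan_name} ' in padded or f'"{vlan_name}"' in padded


vlan_config_lines = []
for vlan_line in vlan_lines:
    # any() stops at the first matching name, so each line is appended at most once.
    if any(line_mentions(name, vlan_line) for name in vlan_names):
        vlan_config_lines.append(vlan_line.strip())

print(vlan_config_lines)
```

With the sample data this keeps all four input lines, including both copies of the 'PL-BB-VoIP' line, because that line really does occur twice in vlan_lines.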

How about trying a negative lookahead? We match the given vlan_name and then check that it is not immediately followed by a hyphen, which would mean it is only the prefix of a longer name.

import re

vlan_names = ['PL-BB', 'PL-BB-VoIP']
vlan_lines = ['create vlan "PL-BB"', 'configure vlan PL-BB tag 135', 'create vlan "PL-BB-VoIP"', 'create vlan "PL-BB-VoIP"']

vlan_config_lines = []

for vlan_line in vlan_lines:
    for vlan_name in vlan_names:
        if re.search(r'\b' + vlan_name + r'(?![-])\b', vlan_line):
            vlan_config_lines.append(vlan_line.strip("\n"))
            print(f"Found '{vlan_name}' in '{vlan_line.strip()}'")

print(vlan_config_lines)

Explained here on your example: https://regex101.com/r/GHNF1e/1
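
A slightly more defensive variant of the lookahead idea is sketched below. It is not the pattern from the answer above. It assumes that VLAN names consist only of word characters and hyphens, adds a matching negative lookbehind so that a name is not found as the suffix of a longer one, and passes the name through re.escape in case it ever contains regex metacharacters. The helper name exact_name_pattern is hypothetical.

```python
import re

vlan_names = ['PL-BB', 'PL-BB-VoIP']
vlan_lines = ['create vlan "PL-BB"', 'configure vlan PL-BB tag 135',
              'create vlan "PL-BB-VoIP"', 'create vlan "PL-BB-VoIP"']


def exact_name_pattern(vlan_name):
    """Hypothetical helper: compile a regex for vlan_name as a whole token."""
    # re.escape keeps characters such as '.' or '+' in a name literal.
    # The lookarounds reject a word character or '-' on either side;
    # this is an assumption about which characters a VLAN name may contain.
    return re.compile(r'(?<![\w-])' + re.escape(vlan_name) + r'(?![\w-])')


patterns = [exact_name_pattern(name) for name in vlan_names]

# Keep each line once if at least one name matches it exactly.
vlan_config_lines = [
    line.strip()
    for line in vlan_lines
    if any(pattern.search(line) for pattern in patterns)
]
print(vlan_config_lines)
```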

If you are looking for a complete word rather than a run of characters, use the split function, which converts a string into a list of words.

vlan_names = ['PL-BB', 'PL-BB-VoIP']
vlan_names2 = vlan_names + [f'"{name}"' for name in vlan_names]
vlan_lines = ['create vlan "PL-BB"', 'configure vlan PL-BB tag 135', 'create vlan "PL-BB-VoIP"', 'create vlan "PL-BB-VoIP"']
matching_lines = [line for line in vlan_lines for word in line.split() if word in vlan_names2]

Output:

matching_lines= 
['create vlan "PL-BB"',
 'configure vlan PL-BB tag 135',
 'create vlan "PL-BB-VoIP"',
 'create vlan "PL-BB-VoIP"']
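
One thing to watch with the comprehension above is that a line containing two matching words would be listed twice, once per word. A possible variation, shown here only as a sketch, strips the double quotes from each word instead of building a second list of quoted names, and uses any() so each line appears at most once. The names wanted and words are made up for this example.

```python
vlan_names = ['PL-BB', 'PL-BB-VoIP']
vlan_lines = ['create vlan "PL-BB"', 'configure vlan PL-BB tag 135',
              'create vlan "PL-BB-VoIP"', 'create vlan "PL-BB-VoIP"']

# A set gives fast membership tests for whole-word lookups.
wanted = set(vlan_names)


def words(line):
    """Hypothetical helper: split on whitespace and drop surrounding double quotes."""
    return [word.strip('"') for word in line.split()]


# any() keeps a line once, no matter how many of its words match.
matching_lines = [line for line in vlan_lines
                  if any(word in wanted for word in words(line))]
print(matching_lines)
```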
