Regex in Python: Separate words from numbers JUST when not in list (Variable exception)

This question is related to this one. I'd like to have variable exceptions which can receive a list of alphanumeric variables or null.

For instance, I have a dummy function that returns the possible alphanumeric values whose letters and numbers have to stay together:

def get_substitutions(word):
    if word.lower() == 'h20':
        return 'h20'
    return None

In addition, I have the following main code, which collects the alphanumeric values that must not be separated. If the text variable (input) contains an alphanumeric word that is in the exceptions, that word is not split; otherwise a space is added:

import re

text='1ST STREET SCHOOL'

exceptions = list()

for word in re.sub(r'[^\w]+', ' ', text, count=0, flags=re.IGNORECASE).split():
    if get_substitutions(word):
        exceptions.append(word.lower())

exception_rx = '|'.join(map(re.escape, exceptions))
generic_rx = r'(?<=\d)(?=[^\d\s])|(?<=[^\d\s])(?=\d)'
rx = re.compile(rf'({exception_rx})|{generic_rx}', re.I)

print(rx.sub(lambda x: x.group(1) or " ", text))

However, when exception_rx is empty, I get a space between every letter:

1 S T   S T R E E T   S C H O O L 

Is it possible to handle this scenario without including any if statement, using just regex syntax?

Thanks for your help

It is impossible to make a regex like ()|abc match abc, because () matches the empty string at every position in the string (that is why you get a space before each character). As in any other NFA regex engine, the first alternative in a group with | that matches makes the engine stop analyzing the alternatives to its right; they are all skipped. See Remember That The Regex Engine Is Eager.

In this situation, you may work around the problem by initializing the exceptions list with a word that you will never find in any text.

For example,
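
To see the eagerness in isolation, here is a minimal sketch (the sample string and variable names are made up for illustration). It compares the matches found with and without an empty first alternative:

```python
import re

text = "ab1"  # illustrative input, not from the question

# An empty first alternative succeeds at every position, so the
# alternative to its right is never tried.
empty_first = re.compile(r"()|(?<=[^\d\s])(?=\d)")
empty_first_spans = [m.span() for m in empty_first.finditer(text)]
print(empty_first_spans)   # [(0, 0), (1, 1), (2, 2), (3, 3)]

# Without the empty alternative, only the letter/digit boundary matches.
generic_only = re.compile(r"(?<=[^\d\s])(?=\d)")
generic_only_spans = [m.span() for m in generic_only.finditer(text)]
print(generic_only_spans)  # [(2, 2)]
```

Every position yields a zero-width match of group 1, and since `x.group(1) or " "` evaluates to `" "` for an empty group, each of those matches is replaced by a space.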

exceptions = ['n0tXistIнgŁąrd']
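
Put together with the code from the question, the workaround might look like the sketch below. The `separate` wrapper and the `SENTINEL` name are introduced here only for illustration, and the approach assumes the placeholder really never occurs in the input:

```python
import re

# Placeholder word assumed never to appear in real text (hypothetical name).
SENTINEL = 'n0tXistIнgŁąrd'

def get_substitutions(word):
    # Dummy lookup from the question.
    if word.lower() == 'h20':
        return 'h20'
    return None

def separate(text):
    # Start with the sentinel so the exception group is never empty.
    exceptions = [SENTINEL]
    for word in re.sub(r'[^\w]+', ' ', text).split():
        if get_substitutions(word):
            exceptions.append(word.lower())

    exception_rx = '|'.join(map(re.escape, exceptions))
    generic_rx = r'(?<=\d)(?=[^\d\s])|(?<=[^\d\s])(?=\d)'
    rx = re.compile(rf'({exception_rx})|{generic_rx}', re.I)
    # Keep an exception as-is; replace a digit/letter boundary with a space.
    return rx.sub(lambda m: m.group(1) or ' ', text)

print(separate('1ST STREET SCHOOL'))  # 1 ST STREET SCHOOL
print(separate('H20 AT 1ST STREET'))  # H20 AT 1 ST STREET
```

Because group 1 now always contains at least one non-empty alternative, it can no longer match the empty string, so the generic boundary pattern is reached whenever no exception matches.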
