
Match two regex patterns multiple times

I have this string "Energy (kWh/m²)" and I want to get "Energy__kWh_m__", that is, replace all non-word characters and sub/superscript characters with an underscore.

I have the regex for replacing the non-word characters -> re.sub("[\W]", "_", column_name) and the regex for removing the superscript characters -> re.sub("[²³¹⁰ⁱ⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ]", "", column_name)

I have tried combining these into a single regex, but I have had no luck. Every time I try I only get partial replacements like "Energy (kWh_m__" with a regex like ([²³¹⁰ⁱ⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ]).*(\W)

Any help? Thanks!

To combine the two regular expressions you can use the | symbol, which means "or". Here's an example of how you can use it:

import re

column_name = "Energy (kWh/m²)"

pattern = re.compile(r"[\W]|[²³¹⁰ⁱ⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ]")
result = pattern.sub("_", column_name)

print(result)

Alternative:

result = re.sub(r"[\W]|[²³¹⁰ⁱ⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ]", "_", column_name)

Output:

Energy__kWh_m__
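A quick check of why the second alternative is needed at all: in Python 3, \w on str patterns follows Unicode rules, so a superscript digit such as ² counts as a word character and \W skips it. A minimal sketch, reusing the column_name from the question (the variable names only_non_word and combined are made up for illustration):

```python
import re

column_name = "Energy (kWh/m²)"

# \W alone leaves the superscript in place, because "²" is alphanumeric in Unicode terms
only_non_word = re.sub(r"\W", "_", column_name)
print(only_non_word)  # Energy__kWh_m²_

# Adding the superscript class as a second alternative catches it as well
combined = re.sub(r"\W|[²³¹⁰ⁱ⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ]", "_", column_name)
print(combined)  # Energy__kWh_m__
```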

As per your current code, if you plan to remove the superscript chars and replace all other non-word chars with an underscore, you can use

re.sub(r'([²³¹⁰ⁱ⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ])|\W', lambda x: '' if x.group(1) else '_', text)

If you plan to match all the non-word chars and the chars in the character class you have, just merge the two:

re.sub(r'[\W²³¹⁰ⁱ⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ]', '_', text)

See this second regex demo. Note that \W already matches the symbols ⁺⁻⁼⁽⁾, so you can even shorten this to r'[\W²³¹⁰ⁱ⁴⁵⁶⁷⁸⁹ⁿ]'.
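To see which characters of the class can be dropped, one option is to test each of them against \W and \w individually. This is only a sketch; the names superscripts, covered_by_W and needs_listing are made up for illustration:

```python
import re

superscripts = "²³¹⁰ⁱ⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ"

# Characters that \W already matches (superscript signs and parentheses)
covered_by_W = [ch for ch in superscripts if re.fullmatch(r"\W", ch)]

# Characters that count as word characters, so they must stay in the class
needs_listing = [ch for ch in superscripts if re.fullmatch(r"\w", ch)]

print("".join(covered_by_W))   # ⁺⁻⁼⁽⁾
print("".join(needs_listing))  # ²³¹⁰ⁱ⁴⁵⁶⁷⁸⁹ⁿ
```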

See the Python demo:

import re
text="Energy (kWh/m²)"
print(re.sub(r'([²³¹⁰ⁱ⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ])|\W', lambda x: '' if x.group(1) else '_', text)) # => Energy__kWh_m_
print(re.sub(r'[\W²³¹⁰ⁱ⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ]', '_', text)) # => Energy__kWh_m__
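If this is applied to many column names, the two behaviours could be wrapped in one small function with a precompiled pattern. The sketch below is an assumption about how that might look, not part of the answer: sanitize_column_name and its drop_superscripts flag are hypothetical names, and the second sample column is made up.

```python
import re

# Group 1 captures the superscript characters; any other non-word char falls to \W
_PATTERN = re.compile(r"([²³¹⁰ⁱ⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ])|\W")

def sanitize_column_name(name, drop_superscripts=False):
    """Hypothetical helper: replace non-word chars with '_' and either
    replace superscripts with '_' (default) or drop them entirely."""
    def repl(match):
        if match.group(1) and drop_superscripts:
            return ""  # superscript matched and the caller asked to remove it
        return "_"     # every other match becomes an underscore
    return _PATTERN.sub(repl, name)

columns = ["Energy (kWh/m²)", "Volume (m³)"]
replaced = [sanitize_column_name(c) for c in columns]
dropped = [sanitize_column_name(c, drop_superscripts=True) for c in columns]
print(replaced)  # ['Energy__kWh_m__', 'Volume__m__']
print(dropped)   # ['Energy__kWh_m_', 'Volume__m_']
```

Compiling the pattern once avoids repeating the long character class at every call site.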
