
Regex match on multiple overlapping occurrences?

I have strings that look like:

sometext 3x 24x5 x 17.5 x 3 sometext

I would like to collapse every instance of digit + optional space + x + optional space + digit into digit + x + digit. Desired output:

sometext 3x24x5x17.5x3 sometext

My current regex seems fine, but somehow it doesn't work:

import re
print(re.sub(r'(\d)\s?([x])\s?(\d)', r'\1\2\3', 'sometext 3x 24x5 x 17.5 x 3 sometext'))

Yields

sometext 3x24x5 x 17.5x3 sometext

It seems this has to do with the fact that 24x5 is already consumed by the previous match, so the regex never considers 5 x 17. My question is: how can I adjust my regex for this purpose, and is there a cleaner or more efficient way to write it than my approach? Thanks!
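A minimal sketch (not from the original thread) that makes the overlap visible: re.finditer lists the non-overlapping matches the original pattern finds, and a variant that only checks the right-hand digit with a lookahead (?=\d), instead of capturing it, leaves that digit free to start the next match. The variant keeps the single optional space (\s?) from the question; \s* would be needed if several spaces can occur.

```python
import re

s = 'sometext 3x 24x5 x 17.5 x 3 sometext'

# The asker's pattern: the final (\d) consumes the digit it matches,
# so the next search starts after it
original = r'(\d)\s?([x])\s?(\d)'
matches = [m.group(0) for m in re.finditer(original, s)]
print(matches)  # ['3x 2', '4x5', '5 x 3']

# Same rule, but the right-hand digit is only checked with a lookahead,
# so it remains available as the left-hand digit of the next match
fixed = re.sub(r'(\d)\s?x\s?(?=\d)', r'\1x', s)
print(fixed)  # sometext 3x24x5x17.5x3 sometext
```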

You could use re.sub to match each whole chain of numbers joined by x, then use a callback to strip all whitespace from each match:

import re
inp = "sometext 3x 24x5 x 17.5 x 3 sometext 1 x 2.3 x 4"
output = re.sub(r'\d+(?:\.\d+)?(?:\s*x\s*\d+(?:\.\d+)?)+', lambda m: re.sub(r'\s', '', m.group(0)), inp)
print(output)

This prints:

sometext 3x24x5x17.5x3 sometext 1x2.3x4
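The same callback approach, written as a minimal sketch with the pattern split into named pieces; NUMBER, CHAIN, squash and normalize are hypothetical names chosen for illustration. Because \s* is used around x, runs of spaces or tabs inside a chain are removed as well, while text outside a chain (such as 3 xyz, where x is not followed by a number) is left alone.

```python
import re

# One number: an integer part with an optional decimal part, e.g. 3 or 17.5
NUMBER = r'\d+(?:\.\d+)?'
# A whole chain: a number followed by one or more "x number" links,
# with any amount of whitespace around each x
CHAIN = re.compile(rf'{NUMBER}(?:\s*x\s*{NUMBER})+')


def squash(match: re.Match) -> str:
    # Remove every whitespace character, but only inside the matched chain
    return re.sub(r'\s', '', match.group(0))


def normalize(text: str) -> str:
    return CHAIN.sub(squash, text)


print(normalize("sometext 3x 24x5 x 17.5 x 3 sometext 1 x 2.3 x 4"))
print(normalize("10  x\t20 boxes, 3 xyz"))
```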

I suggest two options:

import re
s = 'sometext 3x 24x5 x 17.5 x 3 sometext'
print(re.sub(r'(?<=\d)\s+(?=x)|(?<=x)\s+(?=\d)', '', s))
print(re.sub(r'(?<=\d)\s+(?=x\s*\d)|(\d)\s*(x)\s+(?=\d)', r'\1\2', s))

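Both return sometext 3x24x5x17.5x3 sometext, but the second is more precise: it only removes whitespace inside a complete digit-x-digit pattern, whereas the first removes whitespace between a digit and any x (or between any x and a digit), so it would also turn 3 xyz into 3xyz.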

Regex #1 details

  • (?<=\d)\s+(?=x) - one or more whitespace characters between a digit and x
  • | - or
  • (?<=x)\s+(?=\d) - one or more whitespace characters between an x and a digit
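Regex #1 can also be written with re.VERBOSE so that each bullet above sits next to its part of the pattern. This is only a sketch of an equivalent layout (REGEX_1 is a hypothetical name). Because both alternatives wrap \s+ in lookarounds, nothing but whitespace is consumed, so the 5 in 24x5 can serve as the right neighbour of one x and the left neighbour of the next; that is what avoids the overlapping-match problem from the question.

```python
import re

# Regex #1 in verbose form: whitespace inside the pattern is ignored
# and everything after # on a line is a comment
REGEX_1 = re.compile(r"""
    (?<=\d) \s+ (?=x)    # whitespace preceded by a digit and followed by x
    |                    # or
    (?<=x) \s+ (?=\d)    # whitespace preceded by x and followed by a digit
""", re.VERBOSE)

s = 'sometext 3x 24x5 x 17.5 x 3 sometext'
result = REGEX_1.sub('', s)
print(result)  # sometext 3x24x5x17.5x3 sometext
```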

Regex #2 details

  • (?<=\d)\s+(?=x\s*\d) - one or more whitespace characters preceded by a digit and followed by x, zero or more whitespace characters and a digit
  • | - or
  • (\d)\s*(x)\s+(?=\d) - matches a digit (captured into Group 1), then zero or more whitespace characters, then x (captured into Group 2), and then \s+ matches one or more whitespace characters that must be followed by a digit

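The replacement is the concatenation of the Group 1 and Group 2 values. When the first alternative matches, neither group participates, and re.sub substitutes empty strings for them (Python 3.5+), so that whitespace is simply removed.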
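A small sketch comparing both regexes on inputs that are not from the question (the sample strings are made up for illustration). Regex #1 glues 3 onto xyz and even removes the space in box 2, because the x at the end of box is followed by whitespace and a digit; regex #2 only touches the real 2 x 4 product. The last call exercises the first alternative of regex #2, where Groups 1 and 2 are unset.

```python
import re

# Regex #1: whitespace between a digit and x, or between x and a digit
loose = r'(?<=\d)\s+(?=x)|(?<=x)\s+(?=\d)'
# Regex #2: only whitespace inside a full digit-x-digit pattern
strict = r'(?<=\d)\s+(?=x\s*\d)|(\d)\s*(x)\s+(?=\d)'

sample = 'size 3 xyz, box 2 x 4'
loose_result = re.sub(loose, '', sample)
strict_result = re.sub(strict, r'\1\2', sample)
print(loose_result)   # size 3xyz, box2x4
print(strict_result)  # size 3 xyz, box 2x4

# Only the first alternative of regex #2 matches here, so Groups 1 and 2
# are unset and are replaced with empty strings (Python 3.5+)
tail_result = re.sub(strict, r'\1\2', '17.5 x3')
print(tail_result)    # 17.5x3
```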
