
How to change digits in a string using regex

I have a string like this:

'1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" are 7.1"x 4"x 2" you ..and rest of our conversation

What I want is to convert the string into:

'1.5x3x10 hey 7x4x2 how 9.5x9.5x7.5 are 7.1x4x2 you.. and rest of our conversation

In short, I want to remove the whitespace and the " characters between the digits.

I tried to find the pattern with:

stuff = re.findall(r'(\d+\.\d+|\d+)?["]\s?x\s?(\d+\.\d+|\d+)?["]\s?x\s?(\d+\.\d+|\d+)?["]', strings)
print(stuff)

It returns:

[('1.5', '3', '10'), ('7', '4', '2'), ('9.5', '9.5', '7.5'), ('7.1', '4', '2')]

So I tried:

stuff = re.findall(r'\d+["]\s?x\s?\d+["]\s?x\s?\d+["]', strings)
print(stuff)

It returns:

['5"x3"x10"', '7" x 4"x 2"', '1"x 4"x 2"']

It doesn't include all of the digits. How can I convert my string into the desired one? Any help?

If you really want to do it in one step, you'll have to use multiple lookaheads/lookbehinds to account for all cases (and it's questionable whether this one even captures all of them):

import re

my_str = '\'1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" are 7.1"x 4"x 2" you ..and rest of our conversation'

mod_str = re.sub(r'(?<=[\dx])["\s]+(?=[x\s])|(?<=x)\s(?=\d)', '', my_str)
print(mod_str)

This gives you:

'1.5x3x10 hey 7x4x2 how 9.5x9.5x7.5 are 7.1x4x2 you ..and rest of our conversation

It would probably be faster (and easier to catch outliers) if you split this into a multi-step process.
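A minimal sketch of what such a multi-step process might look like. The two steps and the names step1 and step2 are my own assumptions, not part of the answer above: the first pass drops the " that directly follows a digit, the second collapses whitespace around an x that sits between two digits.

```python
import re

my_str = '\'1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" are 7.1"x 4"x 2" you ..and rest of our conversation'

# Step 1 (assumed): remove every double quote that directly follows a digit.
step1 = re.sub(r'(?<=\d)"', '', my_str)

# Step 2 (assumed): remove optional whitespace around an "x" between two digits.
step2 = re.sub(r'(?<=\d)\s*x\s*(?=\d)', 'x', step1)

print(step2)
```

Each pass uses a simpler pattern than the one-step version, so an unexpected input can be traced to the step that mishandles it.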

Explanation:

There are two search patterns here, (?<=[\dx])["\s]+(?=[x\s]) and (?<=x)\s(?=\d). They are separated by | to denote one or the other (tried left to right, so if the first pattern matches a piece of content, the second is not tried on it).

The first:

(?<=            positive non-capturing lookbehind, capture the next segment only if match
  [\dx]         match a single digit (0-9) or the 'x' character
)
  ["\s]+        match one or more " characters or whitespace
(?=             positive non-capturing lookahead, capture the previous segment only if match
  [x\s]         match a single whitespace or 'x' character
)

The second:

(?<=            positive non-capturing lookbehind, capture the next segment only if match
  x             match the 'x' character
)
\s              match a single whitespace
(?=             positive non-capturing lookahead, capture the previous segment only if match
  \d            match a single digit (0-9)
)

The first takes care of selecting the whitespace and quotation marks around your digits. The second covers what the first misses: a whitespace character after an "x", but only when it is followed by a digit. Together, they match the correct quotation marks and whitespace, which re.sub() then replaces with an empty string.
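To see how the two alternatives divide the work, here is a small sketch that applies each one on its own to a fragment of the input. The names sample, first and second are introduced only for this illustration.

```python
import re

sample = '7" x 4"x 2" how'

first = r'(?<=[\dx])["\s]+(?=[x\s])'   # quotes/whitespace before an x or a space
second = r'(?<=x)\s(?=\d)'             # one whitespace between an x and a digit

only_first = re.sub(first, '', sample)
only_second = re.sub(second, '', sample)
combined = re.sub(first + '|' + second, '', sample)

print(only_first)   # 7x 4x 2 how
print(only_second)  # 7" x4"x2" how
print(combined)     # 7x4x2 how
```

The first pattern alone leaves the space that follows each x, which is the gap the second pattern fills.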

zwer is clearly a master of regex. You might, however, be interested in an alternative approach that sometimes makes it possible to use simpler expressions. It involves using the re module to identify the substrings to change and then using a Python function to do the manipulation.

In this case we want to identify numbers, with or without decimals, that are always followed by ", and x characters that are sometimes preceded or followed by one or more blanks. This code uses a regex with alternative expressions to look for both, passes what it finds to replacer, and leaves it to that function to discard the unwanted characters.

>>> import re
>>> quest = '1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" are 7.1"x 4"x 2" you ..and rest of our conversation'
>>> def replacer(matchobj):
...     for group in matchobj.groups():
...         if group:
...             return group.replace(' ', '').replace('"', '')
... 
>>> re.sub(r'([0-9\.]+\")|(\s*x\s*)', replacer, quest)
'1.5x3x10 hey 7x4x2 how 9.5x9.5x7.5 are 7.1x4x2 you ..and rest of our conversation'

Details are in the Python documentation, in the section on re.sub.
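As a possible variation (my own sketch, not part of the answer above): since the replacement function only cleans up whatever was matched, it could work on the whole match via group(0) instead of looping over the groups. The name clean is hypothetical.

```python
import re

quest = '1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" are 7.1"x 4"x 2" you ..and rest of our conversation'

def clean(matchobj):
    # group(0) is the entire matched text, whichever alternative matched.
    return matchobj.group(0).replace(' ', '').replace('"', '')

result = re.sub(r'[0-9.]+"|\s*x\s*', clean, quest)
print(result)
```

This drops the capturing groups altogether, at the cost of no longer knowing which alternative produced the match.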

I wouldn't get too complex here.

I'd just match one group of dimensions at a time, then strip the whitespace and double quotes.

(\d+(?:\.\d+)?(?:\s*"\s*x\s*\d+(?:\.\d+)?){2}\s*")

Expanded

 (                             # (1 start)
      \d+ 
      (?: \. \d+ )?
      (?:
           \s* " \s* x \s* 
           \d+ 
           (?: \. \d+ )?
      ){2}
      \s* "
 )                             # (1 end)
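The expanded layout above can be used almost verbatim in Python through the re.VERBOSE flag, which ignores whitespace in the pattern and allows # comments. A minimal sketch, where the names DIMENSIONS, text and found are assumptions made for this illustration:

```python
import re

DIMENSIONS = re.compile(r'''
    (                          # (1 start)
        \d+
        (?: \. \d+ )?          # optional decimal part
        (?:
            \s* " \s* x \s*    # inch mark, then the x separator
            \d+
            (?: \. \d+ )?
        ){2}
        \s* "
    )                          # (1 end)
''', re.VERBOSE)

text = '1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" are 7.1"x 4"x 2" you'

found = DIMENSIONS.findall(text)
print(found)
```

With this input, findall returns the four groups of dimensions, one string per match.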

Python demo: http://rextester.com/HUIYP80133

Python code

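import re

def repl(m):
    # Strip whitespace and double quotes from the matched group of dimensions.
    contents = m.group(1)
    return re.sub(r'[\s"]+', '', contents)

# Named "text" rather than "str" to avoid shadowing the built-in type.
text = '\'1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" are 7.1"x 4"x 2" you ..and rest of our conversation'

newstr = re.sub(r'(\d+(?:\.\d+)?(?:\s*"\s*x\s*\d+(?:\.\d+)?){2}\s*")', repl, text)

print(newstr)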

Output

'1.5x3x10 hey 7x4x2 how 9.5x9.5x7.5 are 7.1x4x2 you ..and rest of our conversation

