
How to change digits in a string using regex

I have a string like this:

'1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" are 7.1"x 4"x 2" you ..and rest of our conversation

What I want is to convert the string into:

'1.5x3x10 hey 7x4x2 how 9.5x9.5x7.5 are 7.1x4x2 you.. and rest of our conversation

In short, I want to remove the whitespace and the " between the digits.

I tried to find the pattern with:

stuff = re.findall(r'(\d+\.\d+|\d+)?["]\s?x\s?(\d+\.\d+|\d+)?["]\s?x\s?(\d+\.\d+|\d+)?["]', strings)
print(stuff)

It returns:

[('1.5', '3', '10'), ('7', '4', '2'), ('9.5', '9.5', '7.5'), ('7.1', '4', '2')]

So I tried:

stuff = re.findall(r'\d+["]\s?x\s?\d+["]\s?x\s?\d+["]', strings)
print(stuff)

It returns:

['5"x3"x10"', '7" x 4"x 2"', '1"x 4"x 2"']

It doesn't keep the decimal numbers intact. How can I convert my string to the desired one? Any help?

If you really want to do it in one step, you'll need several lookaheads/lookbehinds to account for all the cases (and it is an open question whether even this pattern covers all of them):

import re

my_str = '\'1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" are 7.1"x 4"x 2" you ..and rest of our conversation'

mod_str = re.sub(r'(?<=[\dx])["\s]+(?=[x\s])|(?<=x)\s(?=\d)', '', my_str)
print(mod_str)

gets you:

'1.5x3x10 hey 7x4x2 how 9.5x9.5x7.5 are 7.1x4x2 you ..and rest of our conversation

It would probably be faster (and easier to capture outliers) if you were to split this into a multi-step process.
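A minimal sketch of what such a multi-step version might look like. This is not the answerer's code: the function name strip_dimension_marks and the two-pass split are assumptions made for illustration, and the sketch assumes an inch mark only needs removing when it directly follows a digit.

```python
import re

my_str = '\'1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" are 7.1"x 4"x 2" you ..and rest of our conversation'


def strip_dimension_marks(text):
    """Hypothetical helper: clean up dimensions in two simple passes."""
    # Pass 1: drop a double quote that directly follows a digit.
    without_quotes = re.sub(r'(?<=\d)"', '', text)
    # Pass 2: collapse optional whitespace around an "x" that sits
    # between two digits, leaving just the bare "x".
    return re.sub(r'(?<=\d)\s*x\s*(?=\d)', 'x', without_quotes)


print(strip_dimension_marks(my_str))
```

Each pass is easy to read and test on its own, which is the main appeal of splitting the work up; whether it is actually faster would have to be measured.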

Explanation:

There are two search patterns here, (?<=[\dx])["\s]+(?=[x\s]) and (?<=x)\s(?=\d). They are separated by | to denote one or the other, tried left to right: if the first alternative matches at a position, the second is not tried there.

The first:

(?<=            positive lookbehind (zero-width): match the next segment only if preceded by
  [\dx]         a single digit (0-9) or the 'x' character
)
["\s]+          match one or more " characters or whitespace
(?=             positive lookahead (zero-width): match the previous segment only if followed by
  [x\s]         a single whitespace or 'x' character
)

The second:

(?<=            positive lookbehind (zero-width): match the next segment only if preceded by
  x             the 'x' character
)
\s              match a single whitespace
(?=             positive lookahead (zero-width): match the previous segment only if followed by
  \d            a single digit (0-9)
)

The first pattern selects the whitespace and quotation marks that follow your digits. The second covers a case the first one misses: a whitespace character after an "x", but only when a digit follows it. Together they match the quotation marks and whitespace to drop, which re.sub() then replaces with an empty string.
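As a rough illustration of the breakdown above, the same pattern can be written with re.VERBOSE so the comments live next to each piece. The names PATTERN and sample are made up for this sketch; the pattern itself is unchanged.

```python
import re

# Same pattern as above, spread out with re.VERBOSE. Whitespace and
# "#" comments inside the pattern are ignored by the regex engine.
PATTERN = re.compile(r'''
    (?<=[\dx])   # preceded by a digit or an "x" (not consumed)
    ["\s]+       # one or more double quotes or whitespace: removed
    (?=[x\s])    # followed by an "x" or whitespace (not consumed)
  |
    (?<=x)       # preceded by an "x" (not consumed)
    \s           # a single whitespace character: removed
    (?=\d)       # followed by a digit (not consumed)
''', re.VERBOSE)

sample = '7" x 4"x 2" how'

# The pattern has no capturing groups, so findall() returns the matched
# text itself, i.e. exactly the pieces that sub() will delete.
print(PATTERN.findall(sample))
print(PATTERN.sub('', sample))
```

Printing the findall() result first is a handy way to check which characters a lookaround-based pattern really selects before replacing anything.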

zwer is clearly a master at regex. You might, however, be interested in an alternative approach that sometimes makes it possible to use simpler expressions. It involves using the re module to identify the strings for changing and then using a Python function to do the manipulation.

In this case we want to identify two things: numbers, with or without decimals, that are followed by a ", and the letter x, optionally preceded or followed by one or more blanks. This code uses a regex with two alternatives to look for both, passes whatever it finds to replacer, and leaves it to that function to discard the unwanted characters.

>>> import re
>>> quest = '1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" are 7.1"x 4"x 2" you ..and rest of our conversation'
>>> def replacer(matchobj):
...     for group in matchobj.groups():
...         if group:
...             return group.replace(' ', '').replace('"', '')
... 
>>> re.sub(r'([0-9\.]+\")|(\s*x\s*)', replacer, quest)
'1.5x3x10 hey 7x4x2 how 9.5x9.5x7.5 are 7.1x4x2 you ..and rest of our conversation'

Details are in the Python documentation, in the section on re.sub.
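One thing to watch with the pattern above: the \s*x\s* alternative matches every x, including those inside ordinary words. A hedged sketch of a stricter variant follows. The names LOOSE, STRICT and sample are made up here, and the assumption that a separator x always follows a " and precedes a digit is mine, not the answer's.

```python
import re


def replacer(matchobj):
    # Strip blanks and double quotes from whatever text was matched.
    return matchobj.group(0).replace(' ', '').replace('"', '')


# The pattern from the answer: any "x" counts as a separator.
LOOSE = re.compile(r'([0-9\.]+\")|(\s*x\s*)')

# Assumed stricter rule: an "x" is a separator only when it comes
# after a double quote and before a digit.
STRICT = re.compile(r'[0-9.]+"|(?<=")\s*x\s*(?=[0-9])')

sample = 'fix it: 7" x 4"x 2" box'

print(LOOSE.sub(replacer, sample))   # also eats the blank after "fix"
print(STRICT.sub(replacer, sample))  # leaves ordinary words alone
```

For the string in the question both patterns give the same result, since it contains no other x.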

I wouldn't get too complex here.

I'd just match one group of dimensions at a time, then remove the whitespace and double quotes inside it.

(\d+(?:\.\d+)?(?:\s*"\s*x\s*\d+(?:\.\d+)?){2}\s*")

Expanded

 (                             # (1 start)
      \d+ 
      (?: \. \d+ )?
      (?:
           \s* " \s* x \s* 
           \d+ 
           (?: \. \d+ )?
      ){2}
      \s* "
 )                             # (1 end)

Python demo http://rextester.com/HUIYP80133

Python code

import re

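def repl(m):
    # Strip all whitespace and double quotes from one matched dimension group.
    contents = m.group(1)
    return re.sub(r'[\s"]+', '', contents)

# Named "text" rather than "str" so the built-in str type is not shadowed.
text = '\'1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" are 7.1"x 4"x 2" you ..and rest of our conversation'

newstr = re.sub(r'(\d+(?:\.\d+)?(?:\s*"\s*x\s*\d+(?:\.\d+)?){2}\s*")', repl, text)

print(newstr)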

Output

'1.5x3x10 hey 7x4x2 how 9.5x9.5x7.5 are 7.1x4x2 you ..and rest of our conversation
