
Extracting items using a regular expression in Python

I have a file that contains the following:

new=['{"TES1":"=TES0"}}', '{"""TES1:IDD""": """=0x3C""", """TES1:VCC""": """=0x00"""}']

I am trying to extract the first item, TES1:=TES0, from the list using a regular expression. This is what I tried, but I am not able to grab the second part, TES0.

import re
TES = re.compile(r'(TES[\d].)+')
for item in new:
    result = TES.search(item)
    print(result.groups())

The result of the print was ('TES1:',). I have tried various ways to extract it but am always getting the same result. Any suggestion or help is appreciated. Thanks!

I think you are looking for findall, which returns every non-overlapping match instead of stopping at the first one:

import re
TES = re.compile(r'TES\d.')
for item in new:
    result = TES.findall(item)
    print(result)
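As a minimal sketch of the difference, using the list from the question: search stops at the first match, and the capturing group in (TES\d.)+ holds only the last repetition it matched, whereas findall without a group returns every match. The names grouped, flat, and names are only illustrative, and the tighter pattern TES\d+ is an assumption about what should be kept.

```python
import re

# Data from the question.
new = ['{"TES1":"=TES0"}}',
       '{"""TES1:IDD""": """=0x3C""", """TES1:VCC""": """=0x00"""}']

# Pattern from the question: search() returns only the first match.
grouped = re.compile(r'(TES\d.)+')
# Same pattern without the group: findall() returns every match.
flat = re.compile(r'TES\d.')

first_groups = grouped.search(new[0]).groups()
all_matches = [flat.findall(item) for item in new]
# Assumed tighter variant: drop the trailing "any character".
names = re.findall(r'TES\d+', new[0])

print(first_groups)  # ('TES1"',)
print(all_matches)   # [['TES1"', 'TES0"'], ['TES1:', 'TES1:']]
print(names)         # ['TES1', 'TES0']
```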

You can do it with a single replacement, for example:

import re

# Python writes backreferences as \1 and \2; $1 would be inserted literally
result = re.sub(r'\{"(TES\d)":"(=TES\d)"\}\}', r'\1:\2', yourstr, count=1)
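One detail worth checking here is the replacement syntax: Python's re module expects \1 (or \g<1>) for group references, and a $1 in the replacement string is copied through literally. A small sketch, assuming yourstr holds the first item from the question (the variable names are illustrative):

```python
import re

# Assumed input: the first item of the list in the question.
yourstr = '{"TES1":"=TES0"}}'
# Braces are escaped so they are clearly literal characters.
pattern = r'\{"(TES\d)":"(=TES\d)"\}\}'

# Backslash-style group references are substituted.
with_backslash = re.sub(pattern, r'\1:\2', yourstr, count=1)
# Dollar-style references are not understood by Python's re module.
with_dollar = re.sub(pattern, '$1:$2', yourstr, count=1)

print(with_backslash)  # TES1:=TES0
print(with_dollar)     # $1:$2
```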

First Option (with quotes)

To match "TES1":"=TES0", you can use this regex:

"TES\d+":"=TES\d+"

like this:

match = re.search(r'"TES\d+":"=TES\d+"', subject)
if match:
    result = match.group()
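Applied to the list from the question, this could look like the sketch below. Only the first item contains the quoted pair, so the second search returns None; collecting into a results list is just an illustrative choice.

```python
import re

# Data from the question.
new = ['{"TES1":"=TES0"}}',
       '{"""TES1:IDD""": """=0x3C""", """TES1:VCC""": """=0x00"""}']

pattern = re.compile(r'"TES\d+":"=TES\d+"')

results = []
for subject in new:
    match = pattern.search(subject)
    # search() returns None when the pattern is not found.
    results.append(match.group() if match else None)

print(results)  # ['"TES1":"=TES0"', None]
```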

Second Option (without quotes)

If you want to get rid of the quotes, as in TES1:=TES0, you can use this regex:

Search: "(TES\d+)":"(=TES\d+)"

Replace: \1:\2

like this:

result = re.sub(r'"(TES\d+)":"(=TES\d+)"', r"\1:\2", subject)
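Note that re.sub only rewrites the matched part, so any surrounding text such as the braces stays in the result. If only the extracted text is wanted, one possible approach is to search and then build the string from the groups with Match.expand. A sketch, assuming subject is the first item from the question:

```python
import re

# Assumed input: the first item of the list in the question.
subject = '{"TES1":"=TES0"}}'
pattern = re.compile(r'"(TES\d+)":"(=TES\d+)"')

# sub() keeps everything outside the match, including the braces.
replaced = pattern.sub(r'\1:\2', subject)

# search() plus expand() yields only the text built from the groups.
match = pattern.search(subject)
extracted = match.expand(r'\1:\2') if match else None

print(replaced)   # {TES1:=TES0}}
print(extracted)  # TES1:=TES0
```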

How does it work?

"(TES\d+)":"(=TES\d+)"
  • Match the character “"” literally "
  • Match the regex below and capture its match into backreference number 1 (TES\d+)
    • Match the character string “TES” literally (case sensitive) TES
    • Match a single character that is a “digit” (0–9 in any Unicode script) \d+
      • Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
  • Match the character string “":"” literally ":"
  • Match the regex below and capture its match into backreference number 2 (=TES\d+)
    • Match the character string “=TES” literally (case sensitive) =TES
    • Match a single character that is a “digit” (0–9 in any Unicode script) \d+
      • Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
  • Match the character “"” literally "

\1:\2
  • Insert the text that was last matched by capturing group number 1 \1
  • Insert the character “:” literally :
  • Insert the text that was last matched by capturing group number 2 \2
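The same breakdown can be written into the pattern itself with re.VERBOSE, which ignores whitespace and allows comments inside the regex. This is only a sketch of that idea; subject is assumed to be the first item from the question.

```python
import re

# Assumed input: the first item of the list in the question.
subject = '{"TES1":"=TES0"}}'

pattern = re.compile(r'''
    "            # literal opening quote
    (TES\d+)     # group 1: TES followed by one or more digits
    ":"          # literal quote, colon, quote
    (=TES\d+)    # group 2: =TES followed by one or more digits
    "            # literal closing quote
''', re.VERBOSE)

match = pattern.search(subject)
whole = match.group(0)   # the full matched text
first = match.group(1)   # what \1 inserts
second = match.group(2)  # what \2 inserts

print(whole)                # "TES1":"=TES0"
print(first + ':' + second)  # TES1:=TES0
```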
