
How to find and replace with a regex

I am trying to do a find and replace with a regex on the following HTML:

<div class="gallery-image-container">
    <div jstcache="1116"
         class="gallery-image-high-res loaded"
         style="width: 396px;
                height: 264px;
                background-image: url(&quot;https://lh5.googleusercontent.com/p/AF1QipMcTfMPZj_d5iip9WKtN2SQB9Je5U4rRB0nT_t8=s396-k-no&quot;);
                background-size: 396px 264px;"
         jsan="7.gallery-image-high-res,7.loaded,5.width,5.height,5.background-image,5.background-size">
    </div>
</div>

On the code above I used this:

(https:\/\/[^&]*)

to extract this URL:

https://lh5.googleusercontent.com/p/AF1QipMcTfMPZj_d5iip9WKtN2SQB9Je5U4rRB0nT_t8=s396-k-no

I then used the regex s\d{3} to get s396.

Now I want to replace s396 with s1000 in the URL.

I am stuck and don't know how to go about it.

Is there any way all of this can be done with a single regex instead of several?

I would suggest using an HTML parser, but I understand that is sometimes not possible. Here is a small example in Python.

import re

data = '''
<div class="gallery-image-container">
    <div jstcache="1116"
         class="gallery-image-high-res loaded"
         style="width: 396px;
            height: 264px;
            background-image: url(&quot;https://lh5.googleusercontent.com/p/AF1QipMcTfMPZj_d5iip9WKtN2SQB9Je5U4rRB0nT_t8=s396-k-no&quot;);
            background-size: 396px 264px;"
         jsan="7.gallery-image-high-res,7.loaded,5.width,5.height,5.background-image,5.background-size">
    </div>
</div>
'''
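# Grab everything from "http(s)://" up to the next "&" (the start of "&quot;").
match = re.search(r"(https?://[^&]+)", data)
url = match.group(1)
# Swap the size token "s396" for "s1000"; the raw string keeps \d from being read as a string escape.
url = re.sub(r"s\d{3}", "s1000", url)
print(url)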
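
For comparison, the HTML-parser route mentioned at the start of this answer might look roughly like the sketch below. It uses only the standard library's html.parser; the class name StyleUrlParser and the choice to match on "=s" followed by digits are assumptions made for illustration, not something from the question.

```python
import re
from html.parser import HTMLParser


class StyleUrlParser(HTMLParser):
    """Hypothetical helper: collects url("...") targets from <div style="...">."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        style = dict(attrs).get("style") or ""
        # The parser has already decoded &quot; into a literal double quote,
        # so the URL can be delimited by real quotes here.
        self.urls.extend(re.findall(r'url\("([^"]+)"\)', style))


html = '''
<div class="gallery-image-high-res loaded"
     style="width: 396px;
            background-image: url(&quot;https://lh5.googleusercontent.com/p/AF1QipMcTfMPZj_d5iip9WKtN2SQB9Je5U4rRB0nT_t8=s396-k-no&quot;);">
</div>
'''

parser = StyleUrlParser()
parser.feed(html)
parser.close()

# Replace the size token after "=" with s1000 in every URL that was found.
resized = [re.sub(r"=s\d+", "=s1000", u) for u in parser.urls]
print(resized)
```

Anchoring the replacement on "=s" rather than a bare "s" makes it less likely to hit an unrelated "s" followed by digits elsewhere in a URL.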

The key part of the regex example is the pattern

(https?://[^&]+)

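It uses a negated character class. It says: look for http with an optional s, followed by ://, and then one or more characters that are not &. Because the URL in the markup is immediately followed by &quot;, the match stops exactly at the end of the URL. You can use this site to play around with regexes: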

https://regex101.com/r/b0APFA/1
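
To see where the negated character class stops, here is a small sketch. The example.com URLs are made up for the demonstration. The second case shows a limitation worth knowing about: a URL with several query parameters is written with &amp; inside HTML, so the same pattern would cut it short.

```python
import re

pattern = re.compile(r"https?://[^&]+")

# Case 1: the URL is followed by &quot;, so the match ends right at the URL's end.
simple = 'url(&quot;https://example.com/p/abc=s396-k-no&quot;);'
first = pattern.search(simple).group(0)
print(first)

# Case 2: an escaped ampersand inside the URL also ends the match early.
with_query = 'url(&quot;https://example.com/p?a=1&amp;b=2&quot;);'
second = pattern.search(with_query).group(0)
print(second)
```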

I'm sure you could write a clever one-liner that finds and replaces everything in a single regex, but it will be easier to troubleshoot if you spread the work over a few lines.
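
If a single pattern is still wanted, one possible sketch is to capture the parts of the URL on either side of the size number and rebuild it in the replacement. The pattern below is my own assumption about the URL shape (a size token written as "=s" plus digits); it is not taken from the question.

```python
import re

html = ('background-image: url(&quot;https://lh5.googleusercontent.com/p/'
        'AF1QipMcTfMPZj_d5iip9WKtN2SQB9Je5U4rRB0nT_t8=s396-k-no&quot;);')

# Group 1: the URL up to and including "=s"; group 2: whatever follows the digits.
pattern = re.compile(r"(https?://[^&]*?=s)\d+([^&]*)")

# \g<1> is needed instead of \1, because "\11000" would be read as group 11.
replacement = r"\g<1>1000\g<2>"

# Rewrite the URL in place inside the markup.
resized_html = pattern.sub(replacement, html)

# Or pull out just the rewritten URL from the first match.
match = pattern.search(html)
new_url = match.expand(replacement)
print(new_url)
```

This does the find and the replace in one pass, at the cost of a pattern that is harder to read than the two-step version.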


 