
Searching and replacing text in an HTML file with Notepad++ using a regular expression

We are trying to search and replace text in an HTML file opened with Notepad++.

We need to update the paths inside the "a href" and "img src" attributes while keeping the file names (file names and paths differ from file to file).

So we would need to find all of these lines:

 <a href="File://///name.it.domain.com/PATH/file name.extension"
 <img src="PATH/file name.extension"

And change the PATH/ to images/, for any PATH, maintaining the file name:

<a href="images/file name.extension"
<img src="images/file name.extension"

Here are some examples:

<a href="File://///name.it.domain.com/directory/name/this is a butterfly.pdf" should change to <a href="images/this is a butterfly.pdf"

<a href="C:/party/koala/main.doc" should change to <a href="images/main.doc"

<img src="it.free.main/doll/hello.jpg" should change to <img src="images/hello.jpg"

The paths in the files share a common starting expression (such as File://///name.it.domain.com or C:/), so I am trying the expression file:.{number}(.*) in the "Find what" field and $1/images in the "Replace with" field. It works, but it is not an optimal solution because I have to adjust it for almost every file. Can anyone help us find a more general solution?

One way to accomplish this is to look for the last occurrence of the slash character inside the quoted value, since your examples indicate that at least one is always present. Using [^"]* rather than .* keeps the match inside a single attribute value when a line holds several attributes.

The find regex:

(href|src)="[^"]*/([^"]*)"

The replace regex:

$1="images/$2"

You can see this in action here with the examples you have provided.
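
The last-separator idea can also be checked outside Notepad++. The following is a minimal Python sketch, not part of the original answer: it assumes the separator is the forward slash used in the question's examples, and the names PATTERN and to_images are made up for illustration.

```python
import re

# Greedy [^"]* backtracks to the last "/" inside the quoted value,
# so group 2 ends up holding only the file name.
# Assumption: paths use forward slashes, as in the question's examples.
PATTERN = re.compile(r'(href|src)="[^"]*/([^"]*)"', re.IGNORECASE)


def to_images(html: str) -> str:
    """Replace the directory part of href/src values with 'images/'."""
    return PATTERN.sub(r'\1="images/\2"', html)


examples = [
    '<a href="File://///name.it.domain.com/directory/name/this is a butterfly.pdf"',
    '<a href="C:/party/koala/main.doc"',
    '<img src="it.free.main/doll/hello.jpg"',
]

for line in examples:
    print(to_images(line))
```

Values that contain no slash at all are left untouched, because the pattern requires at least one separator before the file name.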

  • Ctrl + H
  • Find what: (?:href|src)="\K[^"]+(?=/[^/."]+\.[^/."]+)
  • Replace with: images
  • UNTICK Match case
  • TICK Wrap around
  • SELECT Regular expression
  • TICK . matches newline
  • Replace all

Explanation:

(?:             # non capture group
    href            # literally
  |               # OR
    src             # literally
)               # end group
="              # literally
\K              # forget all we have seen until this position
[^"]+           # 1 or more any character that is not a double quote
(?=             # positive lookahead, make sure we have after:
    /               # a slash
    [^/."]+         # 1 or more any character that is not slash, dot or quote
    \.              # a dot
    [^/."]+         # 1 or more any character that is not slash, dot or quote
)               # end lookahead
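
The same pattern can be tried outside Notepad++ with a small adaptation. This is a hedged Python sketch rather than the answer's exact regex: Python's built-in re module does not support \K, so the attribute prefix is captured in a group and written back instead. The names PATTERN and replace_path are hypothetical.

```python
import re

# Python's re module has no \K, so the 'href="' / 'src="' prefix is captured
# in group 1 and re-emitted in the replacement instead of being "forgotten".
# The lookahead is unchanged: a slash, a name, a dot, an extension.
PATTERN = re.compile(
    r'((?:href|src)=")[^"]+(?=/[^/."]+\.[^/."]+)',
    re.IGNORECASE,  # mirrors the unticked "Match case" option
)


def replace_path(html: str) -> str:
    """Replace everything before the final '/name.ext' with 'images'."""
    return PATTERN.sub(r'\1images', html)


for line in (
    '<a href="File://///name.it.domain.com/directory/name/this is a butterfly.pdf"',
    '<a href="C:/party/koala/main.doc"',
    '<img src="it.free.main/doll/hello.jpg"',
):
    print(replace_path(line))
```

One thing to keep in mind: the lookahead is not anchored to the closing quote, so a value whose last segment has no dot (for example a URL ending in /page) can still match at an earlier slash that is followed by a dotted name.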

Screenshot (before): [image not available]

Screenshot (after): [image not available]
