Searching and replacing text in an HTML file with Notepad++ using regular expressions

We are trying to search and replace text in an HTML file opened with Notepad++.

We need to update the paths inside the "a href" and "img src" attributes while keeping the file names (the file names and paths differ from file to file).

So we need to find all lines like these:

 <a href="File://///name.it.domain.com/PATH/file name.extension"
 <img src="PATH/file name.extension"

And change PATH/ to images/, for any PATH, keeping the file name:

<a href="images/file name.extension"
<img src="images/file name.extension"

Here are some examples:

<a href="File://///name.it.domain.com/directory/name/this is a butterfly.pdf" should change to <a href="images/this is a butterfly.pdf"

<a href="C:/party/koala/main.doc" should change to <a href="images/main.doc"

<img src="it.free.main/doll/hello.jpg" should change to <img src="images/hello.jpg"

The paths in the files share a common starting expression (such as File://///name.it.domain.com or C:/), so I am trying the expression file:.{number}(.*) in the "Find what" field and $1/images in the "Replace with" field. It works, but it is not an optimal solution because I have to adjust it for almost every file. Can anyone help us find a more general solution?

One way to accomplish this is to look for the last occurrence of the slash character, since your examples indicate that at least one slash is always present.

The find regex:

(href|src)=".*/(.*)"

The replace regex:

$1="images/$2"

You can see this in action with the examples you have provided.
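For checking the last-slash idea outside Notepad++, here is a minimal Python sketch. It is not the answer's exact expression: it matches the forward slash used in the question's examples and uses [^"]* instead of .* so that a match cannot run past the closing quote when a line holds several attributes. The function name to_images and the sample list are made up for illustration.

```python
import re

# Attribute name, then anything up to the LAST slash inside the quotes,
# then the file name (no slash, no quote) before the closing quote.
PATTERN = re.compile(r'(href|src)="[^"]*/([^"/]*)"', re.IGNORECASE)


def to_images(html: str) -> str:
    """Replace the directory part of href/src values with 'images/'."""
    return PATTERN.sub(r'\g<1>="images/\g<2>"', html)


if __name__ == "__main__":
    # Sample lines taken from the question.
    samples = [
        '<a href="File://///name.it.domain.com/directory/name/this is a butterfly.pdf"',
        '<a href="C:/party/koala/main.doc"',
        '<img src="it.free.main/doll/hello.jpg"',
    ]
    for line in samples:
        print(to_images(line))
```

Values that contain no slash at all (for example href="index.html") are left untouched by this sketch, because the pattern requires at least one slash inside the quotes.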

  • Ctrl + H
  • Find what: (?:href|src)="\K[^"]+(?=/[^/."]+\.[^/."]+)
  • Replace with: images
  • UNTICK Match case
  • TICK Wrap around
  • SELECT Regular expression
  • TICK . matches newline
  • Replace all

Explanation:

(?:             # non capture group
    href            # literally
  |               # OR
    src             # literally
)               # end group
="              # literally
\K              # forget all we have seen until this position
[^"]+           # 1 or more any character that is not a double quote
(?=             # positive lookahead, make sure we have after:
    /               # a slash
    [^/."]+         # 1 or more any character that is not slash, dot or quote
    \.              # a dot
    [^/."]+         # 1 or more any character that is not slash, dot or quote
)               # end lookahead
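The same pattern can be tried in a script, with one adaptation: Python's built-in re module does not support \K, so the sketch below captures the href=" or src=" prefix in a group and writes it back in the replacement. The lookahead is unchanged. The function name strip_path is hypothetical.

```python
import re

# Group 1 stands in for \K: it keeps the attribute prefix so that only the
# path part before the final "/name.ext" is replaced.
PATTERN = re.compile(
    r'((?:href|src)=")'          # attribute name, equals sign, opening quote
    r'[^"]+'                     # the path to discard (greedy)
    r'(?=/[^/."]+\.[^/."]+)',    # must be followed by /name.ext
    re.IGNORECASE,
)


def strip_path(html: str) -> str:
    """Replace everything before the last '/name.ext' with 'images'."""
    return PATTERN.sub(r'\g<1>images', html)


if __name__ == "__main__":
    print(strip_path('<a href="C:/party/koala/main.doc"'))
    print(strip_path('<img src="it.free.main/doll/hello.jpg"'))
```

Because [^"]+ is greedy, the match ends at the last slash that is followed by a name.ext pair, which is why this relies on the final path segment containing a dot, as it does in all of the question's examples.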
