Searching and replacing text in html file with notepad++ using regular expression
We are trying to search and replace text in an HTML file opened with Notepad++.
We need to update the paths inside the "a href" and "img src" attributes while keeping the file names (file names and paths differ from file to file).
So we would need to find all of these lines:
<a href="File://///name.it.domain.com/PATH/file name.extension"
<img src="PATH/file name.extension"
And change the PATH/ to images/, for any PATH, maintaining the file name:
<a href="images/file name.extension"
<img src="images/file name.extension"
Here you have some examples:
<a href="File://///name.it.domain.com/directory/name/this is a butterfly.pdf"
Should change to <a href="images/this is a butterfly.pdf"
and
<a href="C:/party/koala/main.doc"
Should change to <a href="images/main.doc"
<img src="it.free.main/doll/hello.jpg"
Should change to <img src="images/hello.jpg"
The paths in the files share a common starting expression (such as File://///name.it.domain.com or C:/), so I am currently using the expression
file:.{number}(.*)
in the "Find what" field and $1/images in the "Replace with" field.
It works, but it is not an optimal solution because I have to adjust it for almost every file. Can anyone help us find a more general solution?
One way to accomplish this is to look for the last occurrence of the path separator, since your examples indicate that at least one is always present. The character class below accepts either a forward slash or a backslash, because the examples as posted use forward slashes.
Find regex:
(href|src)=".*[\\/](.*)"
Replace regex:
$1="images/$2"
You can see this in action here with the examples you have provided.
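The last-separator idea can also be tried outside Notepad++. The following is a minimal sketch using Python's `re` module, not the answerer's own code: the function name `rewrite_last_separator` and the sample list are hypothetical, and the class `[\\/]` is an assumption made so that both forward slashes and backslashes are accepted.

```python
import re

# Greedy ".*" runs as far as possible, so "[\\/]" ends up matching the LAST
# slash or backslash before the closing quote; group 2 is the file name.
LAST_SEPARATOR = re.compile(r'(href|src)=".*[\\/](.*)"')


def rewrite_last_separator(line: str) -> str:
    """Replace everything up to the last path separator with 'images/'."""
    return LAST_SEPARATOR.sub(r'\1="images/\2"', line)


if __name__ == "__main__":
    samples = [
        '<a href="File://///name.it.domain.com/directory/name/this is a butterfly.pdf"',
        '<a href="C:/party/koala/main.doc"',
        '<img src="it.free.main/doll/hello.jpg"',
    ]
    for sample in samples:
        print(rewrite_last_separator(sample))
```

Because `.*` is greedy and the pattern ends at a quote, a line that carries further quoted attributes after the path may match more than intended, so this sketch is best suited to lines shaped like the examples in the question.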
Find what: (?:href|src)="\K[^"]+(?=/[^/."]+\.[^/."]+)
Replace with: images
Search mode: Regular expression. The ". matches newline" option does not change the result, since the pattern contains no unescaped dot.
Explanation:
(?: # non capture group
href # literally
| # OR
src # literally
) # end group
=" # literally
\K # forget all we have seen until this position
[^"]+ # 1 or more any character that is not a double quote
(?= # positive lookahead, make sure we have after:
/ # a slash
[^/."]+ # 1 or more any character that is not slash, dot or quote
\. # a dot
[^/."]+ # 1 or more any character that is not slash, dot or quote
) # end lookahead
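The same lookahead approach can be sketched in Python for checking the pattern against the question's examples. Python's `re` module does not support `\K`, so this sketch swaps it for a capture group that is written back in the replacement; that substitution, the function name `rewrite_with_lookahead`, and the sample list are assumptions for illustration, not part of the answer above.

```python
import re

# Verbose form of the pattern explained above. Group 1 stands in for \K:
# it captures the attribute prefix so the replacement can put it back.
PATH_BEFORE_FILE = re.compile(
    r'''
    ((?:href|src)=")   # group 1: attribute name, equals sign, opening quote
    [^"]+              # the old path: 1 or more characters that are not a quote
    (?=                # positive lookahead, the match must be followed by:
        /[^/."]+       #   a slash and a file name (no slash, dot or quote)
        \.[^/."]+      #   a dot and an extension (no slash, dot or quote)
    )                  # end lookahead
    ''',
    re.VERBOSE,
)


def rewrite_with_lookahead(line: str) -> str:
    """Replace the directory part of an href/src value with 'images'."""
    return PATH_BEFORE_FILE.sub(r'\1images', line)


if __name__ == "__main__":
    samples = [
        '<a href="File://///name.it.domain.com/directory/name/this is a butterfly.pdf"',
        '<a href="C:/party/koala/main.doc"',
        '<img src="it.free.main/doll/hello.jpg"',
    ]
    for sample in samples:
        print(rewrite_with_lookahead(sample))
```

Since the slash and file name sit inside the lookahead, they are not consumed by the match, which is why the replacement text is just `images` rather than `images/`.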