notepad++ - delete attributes in HTML start tags with regex
Solution:
Find: <([a-z]+) .*?=".*?( *\/?>)
Replace with: <\1$2
I usually copy tables from forum sites to blog sites.
I want to remove every attribute from all start tags.
The tables look like this:
1|<table unwanted_attribute_1>
2|<tbody unwanted_attribute_2>
3|<tr unwanted_attribute_3><td unwanted_attribute_4><br unwanted_attribute_5 /></td></tr>
4|<tr unwanted_attribute_3><td unwanted_attribute_4><span unwanted_attribute_6></span></td></tr>
5|</tbody>
6|</table>
The unwanted attributes include "cellspacing", "class", "style", "href" and "target".
I found two answers, but neither seems to help.
[ A1 ]: It uses a fixed condition to find and replace specific terms. But in my situation, start tags are everywhere and vary with the article.
[ A2 ]: I tried this answer, but it does not work, as follows.
I find <([a-z]+) .*=".*"> and replace with <\1>.
Lines 1 and 2 work, but lines 3 and 4 get messed up.
How should I use regex?
EDIT:
<table cellspacing="0" class="t_table" style="background-color: #f8f8f8; border-collapse: collapse; border: 1px solid rgb(227, 237, 245); color: #444444; empty-cells: show; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 16px; line-height: 24px; table-layout: auto; width: 673px; word-wrap: break-word;">
<tbody style="word-wrap: break-word;">
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆<a class="relatedlink" href="◆◆◆" style="border-bottom: 1px solid blue; color: #639805; word-wrap: break-word;" target="_blank">◆◆</a>◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆<br style="word-wrap: break-word;" />◆◆◆◆</td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">= ◆◆◆◆ =<br style="word-wrap: break-word;" /></td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">= ◆◆◆◆ =<br style="word-wrap: break-word;" /></td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">= ◆◆◆◆ =<br style="word-wrap: break-word;" /></td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆</td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆</td></tr>
</tbody></table>
Your .* is greedy, so it matches everything up to the last "> on the line. Here's what your first regex does:
https://regex101.com/r/qK5uY3/1
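The greedy behaviour described above can also be reproduced outside Notepad++. The following is a minimal sketch in Python, on the assumption that Python's re module treats these simple patterns the same way Notepad++ does. The sample line is a shortened, hypothetical stand-in for one row of the table.

```python
import re

# Shortened, hypothetical stand-in for one table row from the question.
line = '<tr style="a"><td style="b"><br style="c" /></td></tr>'

# Greedy pattern from the question: both .* runs grab as much as they can,
# so a single match stretches from <tr to the last "> on the line.
greedy = re.compile(r'<([a-z]+) .*=".*">')
greedy_match = greedy.search(line).group(0)
greedy_result = greedy.sub(r'<\1>', line)

# Lazy variant (.*?) stops at the first "> it can reach instead.
lazy = re.compile(r'<([a-z]+) .*?=".*?">')
lazy_match = lazy.search(line).group(0)

print(greedy_match)   # one oversized match covering two start tags
print(greedy_result)  # the <td> start tag has been swallowed
print(lazy_match)     # only the first start tag
```

The greedy match covers both the tr and the td start tags, which is why rows with several tags on one line come out damaged.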
Try:
<([a-z]+) .*?=".*? *\/?>
I'd recommend looking at plugins for Notepad++. There can be many issues using a regex to parse HTML.
https://regex101.com/r/qK5uY3/2
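As one illustration of why editing HTML with a regex is fragile, here is a small Python sketch (again assuming Python's re module behaves like Notepad++ for this pattern). The input tag is hypothetical, not taken from the question: its attribute value contains a literal ">".

```python
import re

# Lazy pattern suggested above, used with the replacement <\1>.
pattern = re.compile(r'<([a-z]+) .*?=".*? *\/?>')

# Hypothetical tag whose title attribute contains a literal ">".
tricky = '<a title="a>b" href="x">link</a>'
broken = pattern.sub(r'<\1>', tricky)

print(broken)  # the match ends at the ">" inside the attribute value
```

The pattern cannot tell a ">" inside a quoted value from the one that ends the tag, so the rest of the attributes is left behind as text. The table in the question contains no such values, so the regex is still adequate for it.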
The " *\/?" before the closing > matches optional whitespace and the optional slash of a self-closing element. I would prefer \h for the whitespace, but I don't know whether Notepad++ supports it (I'm on a Mac).
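To see what the optional " *\/?" part does, the sketch below runs the lazy pattern in Python (assumed to match Notepad++ for this pattern) on a short, hypothetical row. Both the ordinary tag and the self-closing tag are matched, but replacing with <\1> throws away the slash of the self-closing tag.

```python
import re

pattern = re.compile(r'<([a-z]+) .*?=".*? *\/?>')

# Hypothetical row with one ordinary and one self-closing start tag.
row = '<td style="b">text<br style="c" /></td>'

# Every start tag that carries attributes is matched in full.
matches = [m.group(0) for m in pattern.finditer(row)]

# Replacing with <\1> keeps only the tag name, so " />" becomes ">".
replaced = pattern.sub(r'<\1>', row)

print(matches)
print(replaced)
```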
Update:
To keep the closing part of a self-closing element, capture the full closing part in a group:
<([a-z]+) .*?=".*?( *\/?>)
Then use the second captured group in the replacement:
<\1$2
Demo: https://regex101.com/r/qK5uY3/3
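The final find/replace pair can be checked on a shortened row. This Python sketch assumes the re module behaves like Notepad++ for this pattern; note that Python's replacement string uses \2 where Notepad++ accepts $2. The row is a trimmed, hypothetical version of the EDIT sample.

```python
import re

# Group 1 is the tag name, group 2 the closing part (">" or " />").
pattern = re.compile(r'<([a-z]+) .*?=".*?( *\/?>)')

# Trimmed, hypothetical version of one row from the EDIT sample.
row = ('<tr style="x"><td style="y"><a class="relatedlink" href="#" '
       'target="_blank">link</a><br style="z" /></td></tr>')

# Keep the tag name and the original closing part, drop everything between.
cleaned = pattern.sub(r'<\1\2', row)

print(cleaned)
```

Tags with several attributes are reduced to the bare tag name, and the self-closing br keeps its " />".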