Notepad++ - Remove attributes from HTML start tags using regex

Question

Solution:

Find: <([a-z]+) .*?=".*?( *\/?>)

Replace with: <\1$2

I usually copy tables from forum sites to blog sites.

I want all start tags to have no attributes.
The tables look like this:

1|<table unwanted_attribute_1>
2|<tbody unwanted_attribute_2>
3|<tr unwanted_attribute_3><td unwanted_attribute_4><br unwanted_attribute_5 /></td></tr>
4|<tr unwanted_attribute_3><td unwanted_attribute_4><span unwanted_attribute_6></span></td></tr>
5|</tbody>
6|</table>
The unwanted attributes are things like "cellspacing", "class", "style", "href" and "target".

I found two answers, but they do not seem to help.
[ A1 ]: It uses a fixed condition to find and replace specific terms. But in my situation, start tags are everywhere and vary with the article.
[ A2 ]: I tried this answer, but it does not work, as described below.

I search for <([a-z]+) .*=".*"> and replace with <\1> .
Lines 1 and 2 work, but lines 3 and 4 get messed up.

How should I use regex?

EDIT:

<table cellspacing="0" class="t_table" style="background-color: #f8f8f8; border-collapse: collapse; border: 1px solid rgb(227, 237, 245); color: #444444; empty-cells: show; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 16px; line-height: 24px; table-layout: auto; width: 673px; word-wrap: break-word;">
<tbody style="word-wrap: break-word;">
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆<a class="relatedlink" href="◆◆◆" style="border-bottom: 1px solid blue; color: #639805; word-wrap: break-word;" target="_blank">◆◆</a>◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆<br style="word-wrap: break-word;" />◆◆◆◆</td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">= ◆◆◆◆ =<br style="word-wrap: break-word;" /></td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">= ◆◆◆◆ =<br style="word-wrap: break-word;" /></td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">= ◆◆◆◆ =<br style="word-wrap: break-word;" /></td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆</td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆</td></tr>
</tbody></table>

Answer 1

Your .* is greedy, so it matches everything up to the last "> on the line. Here's what your first regex does:

https://regex101.com/r/qK5uY3/1
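
The same greedy behaviour can be reproduced outside Notepad++. The following is a minimal sketch in Python (not the tool the question is about, and Python writes the replacement as \1 where Notepad++ also accepts $1). The sample line and its attribute values are made up for illustration.

```python
import re

# Hypothetical one-line sample with two start tags that carry attributes.
line = '<tr style="x"><td style="y">text</td></tr>'

# Greedy: the second .* runs to the last "> on the line,
# so both start tags are swallowed by a single match.
greedy = re.sub(r'<([a-z]+) .*=".*">', r'<\1>', line)

# Lazy: .*? stops at the first "> that lets the pattern match,
# so each start tag is matched on its own.
lazy = re.sub(r'<([a-z]+) .*?=".*?">', r'<\1>', line)

print(greedy)  # <tr>text</td></tr>
print(lazy)    # <tr><td>text</td></tr>
```

With the greedy pattern the <td> start tag disappears entirely, which is the kind of damage seen on lines 3 and 4 of the question.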

Try:

<([a-z]+) .*?=".*? *\/?>

I'd recommend looking at plugins for Notepad++. Using a regex to parse HTML can cause many issues.

https://regex101.com/r/qK5uY3/2
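
To make the warning about regex and HTML concrete, here is a small Python sketch that runs the lazy pattern on three inputs. These inputs are hypothetical and do not come from the question's table; they only show the kinds of markup where this pattern is likely to misbehave. The helper name strip_attrs is also just an illustration.

```python
import re

# The lazy pattern suggested above, with a plain <tag> replacement.
pattern = re.compile(r'<([a-z]+) .*?=".*? *\/?>')

def strip_attrs(html):
    """Replace every match with the bare start tag."""
    return pattern.sub(r'<\1>', html)

# Hypothetical inputs chosen to expose edge cases.
digit_tag = strip_attrs('<h1 class="x">Title</h1>')
gt_in_value = strip_attrs('<a title="a > b">link</a>')
bare_attr = strip_attrs('<td nowrap><a href="x">y</a></td>')

print(digit_tag)    # unchanged: [a-z]+ does not match the tag name "h1"
print(gt_in_value)  # the match ends at the ">" inside the attribute value
print(bare_attr)    # the match runs from <td ...> into the following <a ...> tag
```

For tables like the one in the question, where every attribute has a quoted value and no value contains ">", none of these cases arise.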

The  *\/? before the closing > matches optional whitespace and the slash of a self-closing element. I would prefer to use \h, but I don't know whether Notepad++ supports it (I'm a Mac user).

Update:

To keep the closing part of a self-closing element, put the full closing part in a capture group.

<([a-z]+) .*?=".*?( *\/?>)

Then include the second captured group in the replacement.

<\1$2

Demo: https://regex101.com/r/qK5uY3/3
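
As a rough cross-check outside Notepad++, the find and replace pair can be tried in Python. This is only a sketch: Python writes the replacement as <\1\2 rather than <\1$2, the function name strip_attributes is invented here, and the sample row is a shortened, made-up version of the table in the question.

```python
import re

# Group 1 is the tag name; group 2 is the closing part (">" or " />").
PATTERN = re.compile(r'<([a-z]+) .*?=".*?( *\/?>)')

def strip_attributes(html: str) -> str:
    """Drop attributes from start tags, keeping any self-closing slash."""
    return PATTERN.sub(r'<\1\2', html)

# Shortened, hypothetical row modelled on the question's table.
row = '<tr style="a"><td style="b: c;">x<br style="d" />y</td></tr>'

with_group = strip_attributes(row)
without_group = PATTERN.sub(r'<\1>', row)

print(with_group)     # <tr><td>x<br />y</td></tr>
print(without_group)  # <tr><td>x<br>y</td></tr>
```

Without the second group, the replacement still removes the attributes but also drops the " /" of the self-closing <br />.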

Answer 1 · 0 · Accepted · 2016-09-16 15:02:09
