
notepad++ - delete attributes in HTML start tags with regex

Solution:

Find: <([a-z]+) .*?=".*?( *\/?>)

Replace with: <\1$2


I usually copy tables from forum sites to blog sites.

I want every start tag to have no attributes.
The tables look like this:

1|<table unwanted_attribute_1>
2|<tbody unwanted_attribute_2>
3|<tr unwanted_attribute_3><td unwanted_attribute_4><br unwanted_attribute_5 /></td></tr>
4|<tr unwanted_attribute_3><td unwanted_attribute_4><span unwanted_attribute_6></span></td></tr>
5|</tbody>
6|</table>
The unwanted attributes are ones like "cellspacing", "class", "style", "href" and "target".

I found two answers, but neither seems to help.
[ A1 ]: It uses a fixed condition to find and replace specific terms. But in my situation, start tags are everywhere and vary with the article.
[ A2 ]: I tried this answer, but it does not work, as shown below.

I search for <([a-z]+) .*=".*"> and replace with <\1> .
Lines 1 and 2 work, but lines 3 and 4 get messed up.

How should I use regex?

EDIT:

<table cellspacing="0" class="t_table" style="background-color: #f8f8f8; border-collapse: collapse; border: 1px solid rgb(227, 237, 245); color: #444444; empty-cells: show; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 16px; line-height: 24px; table-layout: auto; width: 673px; word-wrap: break-word;">
<tbody style="word-wrap: break-word;">
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆<a class="relatedlink" href="◆◆◆" style="border-bottom: 1px solid blue; color: #639805; word-wrap: break-word;" target="_blank">◆◆</a>◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆<br style="word-wrap: break-word;" />◆◆◆◆</td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">= ◆◆◆◆ =<br style="word-wrap: break-word;" /></td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">= ◆◆◆◆ =<br style="word-wrap: break-word;" /></td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">= ◆◆◆◆ =<br style="word-wrap: break-word;" /></td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆</td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆</td></tr>
</tbody></table>

Your .* is greedy, so it matches everything up to the last "> on the line. Here's what your first regex does:

https://regex101.com/r/qK5uY3/1
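The greedy behaviour described above can be reproduced outside Notepad++. The following is a minimal sketch using Python's re module, which is an assumption on my part (Notepad++ uses a different regex engine, though greedy and lazy quantifiers behave the same way for this case). The sample row and the variable names are made up for illustration.

```python
import re

# A row shaped like line 3 of the question, with short made-up attribute values.
row = '<tr style="a"><td style="b">text</td></tr>'

greedy = re.compile(r'<([a-z]+) .*=".*">')   # the pattern from the question
lazy = re.compile(r'<([a-z]+) .*?=".*?">')   # same pattern with lazy quantifiers

# What each pattern grabs as its first match.
greedy_match = greedy.search(row).group(0)
lazy_match = lazy.search(row).group(0)
print(greedy_match)  # <tr style="a"><td style="b">
print(lazy_match)    # <tr style="a">

# Result of the replacement: the greedy version swallows the <td> tag.
greedy_result = greedy.sub(r'<\1>', row)
lazy_result = lazy.sub(r'<\1>', row)
print(greedy_result)  # <tr>text</td></tr>
print(lazy_result)    # <tr><td>text</td></tr>
```

The greedy pattern treats both start tags as a single match, which is why lines with several tags get mangled while lines with one tag look fine.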

Try:

<([a-z]+) .*?=".*? *\/?>

I'd recommend looking at plugins for Notepad++. Using a regex to parse HTML can run into many issues.

https://regex101.com/r/qK5uY3/2
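As an illustration of the parser route hinted at above, here is a rough sketch that strips attributes with Python's standard html.parser module instead of a regex. This is not a Notepad++ plugin and not part of the original answer; the class name AttributeStripper and the function strip_attributes are hypothetical names chosen for this example, and the output formatting (for instance writing self-closing tags as "<br />") is my own choice.

```python
from html.parser import HTMLParser


class AttributeStripper(HTMLParser):
    """Re-emit the HTML it is fed, dropping every attribute from start tags."""

    def __init__(self):
        # Keep entities such as &amp; untouched instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(f"<{tag}>")          # attrs are ignored on purpose

    def handle_startendtag(self, tag, attrs):
        self.parts.append(f"<{tag} />")        # self-closing tags like <br />

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")


def strip_attributes(html):
    """Return html with all attributes removed from its tags."""
    parser = AttributeStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


cleaned = strip_attributes('<td style="a">x &amp; y<br style="b" /></td>')
print(cleaned)  # <td>x &amp; y<br /></td>
```

Because the parser tokenizes tags rather than pattern-matching on text, it does not depend on every attribute being written as name="value".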

The " *\/?" before the closing > matches optional whitespace and the slash of a self-closing element. I would prefer to use \h for the whitespace, but I don't know if Notepad++ supports it (I'm a Mac user).

Update:

To keep the closing part of a self-closing element, wrap the whole closing part in a capture group:

<([a-z]+) .*?=".*?( *\/?>)

Then put the second captured group back in the replacement:

<\1$2

Demo: https://regex101.com/r/qK5uY3/3
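For readers who want to try the final pattern outside Notepad++, here is a small sketch in Python (an assumption; the thread itself only uses Notepad++ and regex101). The pattern is the one from the update; the replacement is written as \1\2 because Python's re module does not use the $2 syntax. The helper name strip_attributes_regex and the shortened sample table are made up for this example.

```python
import re

# Final pattern from the answer: tag name in group 1, closing part in group 2.
PATTERN = re.compile(r'<([a-z]+) .*?=".*?( *\/?>)')


def strip_attributes_regex(html):
    """Apply the find/replace from the answer to a string."""
    return PATTERN.sub(r'<\1\2', html)


# Shortened stand-in for the table in the question's EDIT.
sample = (
    '<table cellspacing="0" class="t_table">\n'
    '<tr style="x"><td style="y">a<br style="z" />b</td></tr>\n'
    '</table>'
)

result = strip_attributes_regex(sample)
print(result)
# <table>
# <tr><td>a<br />b</td></tr>
# </table>

# One limitation: an attribute without ="..." makes the match run on
# into the next tag, so that tag is lost.
broken = strip_attributes_regex('<td nowrap><a href="x">t</a></td>')
print(broken)  # <td>t</a></td>
```

The second call shows the kind of issue the answer warns about: the pattern assumes every start tag that contains a space also contains a quoted attribute value.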
