
Regular expression to remove HTML tags and characters

I need help with a regex that will remove all HTML tags and characters from a string.

In the code shown below, I have oldStr:


When the 'Replace' button is clicked, I want to change oldStr to 'Hello', removing '<', '>' and any HTML tags like the ones in my string.

How can I achieve that?

Here's my code:

$(document).ready(function(){
  // oldStr holds HTML entities (&lt;, &gt;), not literal tags
  var oldStr = '&lt;p&gt;&lt;a&gt;Hello&lt;/a&gt;&lt;/p&gt';
  $('#old').text(oldStr);
  $('#replaceBtn').click(function(){
    var newStr = oldStr.replace('&lt;', '');
    $('#new').text(newStr);
  });
});

div {
  height: 50px;
  width: 300px;
  border: 1px solid grey;
  border-radius: 5px;
  margin: 10px;
  padding: 10px;
}

<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="old"></div>
<br />
<div id="new"></div>
<br />
<button id="replaceBtn">Replace</button>
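One reason the attempt above cannot work on its own: when String.prototype.replace is given a string pattern, it replaces only the first occurrence. A minimal plain-JavaScript sketch (no jQuery, run outside the page) contrasting that with a global regex on the same string:

```javascript
// The escaped string from the question, unchanged.
const oldStr = '&lt;p&gt;&lt;a&gt;Hello&lt;/a&gt;&lt;/p&gt';

// String pattern: only the FIRST "&lt;" is removed.
const firstOnly = oldStr.replace('&lt;', '');

// Regex with the g flag: EVERY "&lt;" is removed.
const allLt = oldStr.replace(/&lt;/g, '');

console.log(firstOnly); // p&gt;&lt;a&gt;Hello&lt;/a&gt;&lt;/p&gt
console.log(allLt);     // p&gt;a&gt;Hello/a&gt;/p&gt
```

Even with the g flag, the tag names and the &gt entities remain, so the tags still have to be stripped in a separate step.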

Please try this:

var oldString = '<p><a>Hello</a></p>';
// jQuery parses the markup; .text() returns only its text content ("Hello")
var newString = $(oldString).text();
 <script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script> 

Try the code below: https://jsfiddle.net/vineeshmp/do83rje2/

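$(document).ready(function(){
  var oldStr = '&lt;p&gt;&lt;a&gt;Hello&lt;/a&gt;&lt;/p&gt';
  $('#old').text(oldStr);
  $('#replaceBtn').click(function(){
    // Setting the escaped string as a <textarea>'s HTML and reading back .text()
    // decodes the entities, giving '<p><a>Hello</a></p>'
    var newStr = $('<textarea />').html(oldStr).text();
    // Parse the decoded markup and keep only its text: 'Hello'
    $('#new').text($(newStr).text());
  });
});

var oldStr = '<p><a>Hello</a></p>&gt';
// Hard-coded: takes the text from the first "H" through the first "o",
// so it only works for this exact string
var indexStart = oldStr.search("H");
var indexEnd = oldStr.search("o");
var newStr = oldStr.substring(indexStart, indexEnd + 1);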

I think this is what you are trying to do.

Quick rundown:

The first two replace calls turn the &lt; and &gt; entities into the much easier-to-read < and > characters. We do this so that we can get rid of the tags, or any information in them, much more easily.

I'm sure there's a faster way to replace both of them than chaining the method calls like I did, but I just woke up, so this is what came out.
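One way to do both substitutions in a single pass over the string is a regex alternation with a replacement callback. This is a sketch of that idea, not the answerer's code, and the decodeAngleBrackets name is hypothetical:

```javascript
// Hypothetical helper (not from the answer): decode &lt; and &gt; in one pass.
// Other entities such as &amp; are left untouched.
function decodeAngleBrackets(str) {
  // The capture group holds the entity name, "lt" or "gt".
  return str.replace(/&(lt|gt);/g, (_match, name) => (name === 'lt' ? '<' : '>'));
}

const escaped = '&lt;p&gt;&lt;a&gt;Hello&lt;/a&gt;&lt;/p&gt;';

const onePass = decodeAngleBrackets(escaped);
// The chained version from the answer, for comparison on this input.
const chained = escaped.replace(/&lt;/g, '<').replace(/&gt;/g, '>');

console.log(onePass); // <p><a>Hello</a></p>
```

For this input both versions produce the same decoded markup; the callback version just makes one scan instead of two.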

Next is the regex, which is sort of a mess, but let me try to explain it:

"Find anything between the < and > symbols, but don't be greedy: stop at the first > you find, OR match anything of the form '&(some word characters);'." Then we just replace everything that matches with an empty string. What you are left with should just be a string containing the element's text.

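A small sketch applying this regex (with the entity branch written as &\w+;) to already-decoded strings. Apart from the question's markup, the sample strings are made up for illustration. It contrasts the lazy .*? with a greedy .*, and shows that the &\w+; branch deletes named entities rather than decoding them:

```javascript
// The regex from the answer:
//   <(.*?)>  a '<', then as few characters as possible, up to the first '>'
//   &\w+;    a named entity such as &amp; or &nbsp;
const TAG_OR_ENTITY = /(<(.*?)>|&\w+;)/g;

const html = '<p><a>Hello</a></p>';

// Lazy: each tag is matched separately, so the text between tags survives.
const lazy = html.replace(TAG_OR_ENTITY, '');

// Greedy (for contrast): one match runs from the first '<' to the last '>'.
const greedy = html.replace(/<(.*)>/g, '');

// Entities are deleted, not decoded: "&amp;" disappears instead of becoming "&".
const withEntity = '<b>Tom &amp; Jerry</b>'.replace(TAG_OR_ENTITY, '');

console.log(lazy);       // Hello
console.log(greedy);     // (empty string)
console.log(withEntity); // Tom  Jerry
```

Numeric references such as &#39; are not matched either, because # is not a word character.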
As an alternative, you may want to apply just the first two replace calls and then use Nimish's answer, which I feel would be much more reliable, since jQuery's HTML parser is miles better than what I wrote.

var oldStr = '&lt;p&gt;&lt;a&gt;Hello&lt;/a&gt;&lt;/p&gt;';
var newStr = oldStr
  .replace(/&lt;/g, '<')            // decode &lt; to <
  .replace(/&gt;/g, '>')            // decode &gt; to >
  .replace(/(<(.*?)>|&\w+;)/g, ''); // strip tags and any remaining named entities
alert(newStr);
 <script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script> 

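Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.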

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM