
Filtering <form> from HTML text using a regular expression

I am getting a whole HTML page from an Ajax request as text (xmlhttp.responseText).

I then filter that text to extract an HTML form and everything inside it.

I wrote a regex:

text.match(/(<form[\W\w]*<\/form>)/gim)

As I am not an expert in regex, I can't be sure whether it will work in every scenario and capture everything inside the form tag.

Is there a better way to say "everything" in a regex, so that the regex would look like this?

 text.match(/(<form[__everything_syntaxt_here__]*<\/form>)/gim)

Try this:

// Return the HTML in s with every <form> element removed.
function stripForm(s) {
  var div = document.createElement('div');
  div.innerHTML = s; // let the browser parse the markup
  var forms = div.getElementsByTagName('form');
  var i = forms.length;
  // Iterate backwards: the collection is live and shrinks on each removal.
  while (i--) {
    forms[i].parentNode.removeChild(forms[i]);
  }
  return div.innerHTML;
}

// Return the concatenated inner HTML of every <form> in s, in document order.
function getForm(s) {
  var div = document.createElement('div');
  div.innerHTML = s;
  var forms = div.getElementsByTagName('form');
  var ret = '';
  for (var i = 0; i < forms.length; i++) {
    ret += forms[i].innerHTML;
  }
  return ret;
}

var a = 'before Form <form action="" method="post"> <input type="text" /> <input type="text" /> <input type="text" /> </form><br/> after form';
alert(getForm(a));
alert(stripForm(a));
console.log(stripForm(a));


Having to deal with IE 5, you poor soul.

A quick answer to your question: is [\W\w] really the best way to match absolutely everything?

Yes. JavaScript has historically not supported the s modifier that makes . match newlines (the s flag was only added in ES2018, which old browsers such as IE do not implement). [\W\w] basically tells the regex: "match anything that is a word character, or anything that isn't a word character". Every character falls into one of those two categories, so the class matches any character, line breaks included.
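A small sketch of that difference, using a made-up input string with a line break inside the form (the variable names are only for illustration):

```javascript
// Hypothetical sample text: the form spans several lines.
const text = '<form>\n<input type="text" />\n</form>';

// Without the s flag, "." stops at line breaks, so this cannot reach </form>.
const dotMatch = text.match(/<form.*<\/form>/i);

// [\W\w] accepts every character, line breaks included.
const classMatch = text.match(/<form[\W\w]*<\/form>/i);

console.log(dotMatch);               // null
console.log(classMatch[0] === text); // true
```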

But if you want a more reliable solution that copes with <!-- html comments --> and with multiple forms on a page, the best approach is something like the one explained in this SO answer, adapted for HTML.
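To see why multiple forms are a problem for the pattern from the question, here is a minimal sketch with an invented two-form string. The greedy `*` runs from the first `<form` to the last `</form>`; a lazy `*?` splits the forms apart, though it still knows nothing about comments:

```javascript
// Hypothetical page fragment with two forms.
const html = 'a <form id="one"></form> b <form id="two"></form> c';

// Greedy: one match that swallows everything between the two forms.
const greedy = html.match(/<form[\W\w]*<\/form>/gi);

// Lazy: stops at the first </form> it can, so each form is its own match.
const lazy = html.match(/<form[\W\w]*?<\/form>/gi);

console.log(greedy.length); // 1
console.log(lazy.length);   // 2
```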

This is what I would use:

<!--(?:(?!-->)[\w\W])*-->|(<form(?:(?:(?!<\/form>|<!--)[\w\W])|(?:<!--(?:(?!-->)[\w\W])*-->))*<\/form>)


Look at the Debuggex Demo to see which matches you actually get. In JavaScript you can then inspect the first capture group. If it is empty, that match only served to skip a commented-out form, as explained here.
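One way the capture-group check could look in JavaScript is sketched below. The function name `extractForms`, the `g` and `i` flags, and the sample page are assumptions added for illustration, not part of the answer above:

```javascript
// The comment-aware pattern as a regex literal, with g and i flags added.
const formOrComment = /<!--(?:(?!-->)[\w\W])*-->|(<form(?:(?:(?!<\/form>|<!--)[\w\W])|(?:<!--(?:(?!-->)[\w\W])*-->))*<\/form>)/gi;

// Hypothetical helper: collect every form that is not inside a comment.
function extractForms(html) {
  const forms = [];
  let match;
  formOrComment.lastIndex = 0; // reset state left over from earlier calls
  while ((match = formOrComment.exec(html)) !== null) {
    // Group 1 is only filled when the form branch matched;
    // a bare comment match leaves it empty and is skipped.
    if (match[1]) {
      forms.push(match[1]);
    }
  }
  return forms;
}

// Invented sample page: one commented-out form, two real ones.
const page = [
  '<!-- <form id="old"></form> -->',
  '<form id="a"><input type="text" /></form>',
  '<p>text</p>',
  '<form id="b">\n<!-- </form> inside a comment -->\n<input type="text" />\n</form>'
].join('\n');

const forms = extractForms(page);
console.log(forms.length); // 2
console.log(forms[0]);     // <form id="a"><input type="text" /></form>
```

The second form shows why the inner comment branch matters: the `</form>` inside the comment does not end the match early.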
