Javascript regex not working as intended

I have the HTML from a page in a variable as just plain text. Now I need to remove some parts of the text. This is a part of the HTML that I need to change:

<div class="post"><a name="6188729"></a>
    <div class="igmline small" style="height: 20px; padding-top: 1px;">
        <span class="postheader_left">
            <a href="#"  style="font-size:9pt;"> RuneRifle </a>
            op 24.08.2012 om 21:41 uur
        </span>
        <span class="postheader_right">
            <a href="http://link">Citaat</a> <a href="http://link">Bewerken</a>
        </span>
        <div style="clear:both;"></div>
    </div>
    <div class="text">Testforum</div>
    <!-- Begin Thank -->
    <!-- Thank End -->
</div>

These replacements work:

pageData = pageData.replace(/href=\".*?\"/g, "href=\"#\"");
pageData = pageData.replace(/target=\".*?\"/g, "");

But this replacement does not work at all:

pageData = pageData.replace(
  /<span class=\"postheader_right\">(.*?)<\/span>/g, "");

I need to remove every span with the class postheader_right and everything in it, but it just doesn't work. My knowledge of regex isn't that great, so I'd appreciate it if you would tell me how you came to your answer and give a small explanation of how it works.

The dot doesn't match newlines. Use [\s\S] instead of the dot, as it will match any whitespace or non-whitespace character (i.e., anything).
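As a minimal sketch of that change applied to the pattern from the question (the helper name stripPostheaderRight and the shortened sample string are made up for illustration):

```javascript
// Hypothetical helper: removes every <span class="postheader_right">...</span>,
// including spans whose content runs over several lines.
function stripPostheaderRight(html) {
  // [\s\S] matches any character, newlines included; *? keeps the match lazy
  // so it stops at the first closing </span>.
  return html.replace(/<span class="postheader_right">[\s\S]*?<\/span>/g, "");
}

// Illustrative sample input, not the full page from the question.
var sample =
  '<span class="postheader_left">RuneRifle</span>\n' +
  '<span class="postheader_right">\n' +
  '  <a href="#">Citaat</a> <a href="#">Bewerken</a>\n' +
  '</span>\n' +
  '<div class="text">Testforum</div>';

var cleaned = stripPostheaderRight(sample);
console.log(cleaned.indexOf("postheader_right") === -1); // true
```

In engines that support the ES2018 s (dotAll) flag, writing the original pattern with the flags gs should have the same effect, since that flag lets the dot match newlines.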

As Mike Samuel says, regular expressions are not really the best way to go given the complexity allowed in HTML (e.g., there might be a line break after <a), especially if you have to look for attributes which may occur in different orders, but that's how you can do it to match the case in your example HTML.
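To make that caveat concrete, here is a small sketch of the [\s\S] pattern run against markup variations that a browser would treat as the same span. The three input strings are hypothetical and do not come from the page in the question:

```javascript
// The pattern assumes the exact attribute text class="postheader_right".
var pattern = /<span class="postheader_right">[\s\S]*?<\/span>/g;

// Hypothetical variations of the markup.
var extraAttribute = '<span id="x" class="postheader_right">links</span>';
var singleQuotes = "<span class='postheader_right'>links</span>";
var nestedSpan = '<span class="postheader_right"><span>a</span> b</span>';

console.log(extraAttribute.replace(pattern, "") === extraAttribute); // true: nothing removed
console.log(singleQuotes.replace(pattern, "") === singleQuotes);     // true: nothing removed
console.log(nestedSpan.replace(pattern, ""));                        // " b</span>": stops at the inner </span>
```

The first two inputs slip through untouched, and the nested case leaves a dangling closing tag behind, which is why the pattern is only safe for markup shaped exactly like the example.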

I need to remove every span with the class postheader_right and everything in it, but it just doesn't work.

Don't use regular expressions to find the spans. See "Using regular expressions to parse HTML: why not?"

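// Collect every <span>, then walk the live collection backwards so that
// removing an element does not shift the indices still to be visited.
var allSpans = document.getElementsByTagName('span');
for (var i = allSpans.length; --i >= 0;) {
  var span = allSpans[i];
  // \b keeps the test from matching longer class names such as "postheader_right2".
  if (/\bpostheader_right\b/.test(span.className)) {
    span.parentNode.removeChild(span);
  }
}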

should do it. 应该这样做。

If you only need to work on newer browsers then getElementsByClassName makes it even easier:

Find all div elements that have a class of 'test':

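// Array.filter was a non-standard Firefox generic; borrowing
// Array.prototype.filter works on the array-like collection everywhere.
var tests = Array.prototype.filter.call(
  document.getElementsByClassName('test'),
  function (elem) { return elem.nodeName === 'DIV'; });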
