How to filter text with regular expressions and JavaScript?

Question

I have some text in an element on my page, and I want to scrape the price from it without any of the surrounding text.

The page contains the price like this:

<span class="discount">now $39.99</span>

How can I filter this to get just "$39.99", using only JavaScript and regular expressions?

This question may be too easy, or may have been asked in another form before, but I know nothing about regular expressions, so I'm asking for your help :).

Answer 1

<script>
window.onload = function () {

    // Get all of the elements with class name "discount"
    var elements = document.getElementsByClassName('discount');

    // Loop over each <span class="discount">
    for (var i = 0; i < elements.length; i++) {

         // Get the text, e.g. "now $39.99" (textContent ignores any markup inside the span)
         var rawText = elements[i].textContent;

         // Here's a regular expression to match one or more digits (\d+)
         // followed by a period (\.) and one or more digits again (\d+)
         var match = rawText.match(/\d+\.\d+/);

         // match() returns null when nothing matches, so skip those elements
         if (match === null) {
             continue;
         }

         // You'll want to make the price a floating-point number if you
         // intend to do any calculations with it.
         var price = parseFloat(match[0]);

         // Now what do you want to do with the price? I'll just write it out
         // to the console (using Firebug or something similar).
         console.log(price);

    }
}
</script>
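The pattern /\d+\.\d+/ above only matches prices that contain a decimal point, so text such as "now $40" produces no match, and it drops the dollar sign the question asked to keep. Here is a minimal sketch of a broader pattern, assuming US-style prices (a "$", digits with optional thousands commas, and an optional decimal part). The helper names extractPrice and priceToNumber are hypothetical, not part of any library:

```javascript
// Hypothetical helper: return the first "$"-prefixed price in a plain-text string.
// Assumes US-style formatting such as "$39.99", "$40" or "$1,299.00".
function extractPrice(text) {
  // \$                      a literal dollar sign
  // \d{1,3}(?:,\d{3})+      digits grouped with commas, e.g. 1,299
  // |\d+                    or an ungrouped run of digits
  // (?:\.\d+)?              an optional decimal part
  const match = text.match(/\$(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?/);
  return match ? match[0] : null; // null when the text contains no price
}

// Hypothetical helper: turn a matched price string into a number for arithmetic.
function priceToNumber(priceString) {
  // Drop the leading "$" and any thousands commas before parsing.
  return parseFloat(priceString.slice(1).replace(/,/g, ""));
}

// In a browser, apply it to each element's text rather than its HTML:
// for (const el of document.getElementsByClassName("discount")) {
//   console.log(extractPrice(el.textContent));
// }

console.log(extractPrice("now $39.99"));    // prints $39.99
console.log(extractPrice("now $40"));       // prints $40
console.log(extractPrice("now $1,299.00")); // prints $1,299.00
console.log(extractPrice("sold out"));      // prints null
console.log(priceToNumber("$1,299.00"));    // prints 1299
```

Keeping matching and conversion separate means the string with its dollar sign stays available for display, while the number is only computed when a calculation needs it.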

Answer 2

document.evaluate("//span[@class='discount']", 
  document, 
  null, 
  XPathResult.ANY_UNORDERED_NODE_TYPE, 
  null).singleNodeValue.textContent.replace("now ", "");

EDIT: This is standard XPath. I'm not sure what kind of explanation you're seeking. For outdated browsers, you will need a third-party library like Sarissa and/or Java-line.
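Stripping a fixed prefix with replace only works while the label text matches exactly; "Now $39.99" or "now only $39.99" would come back unchanged. A hedged alternative, assuming the price always starts at the first "$" in the node's text, is to keep everything from that character on. The stripLabel name is hypothetical:

```javascript
// Hypothetical helper: drop whatever label precedes the price in a node's text,
// assuming the price begins at the first "$" (e.g. "now $39.99" or "Now only $39.99").
function stripLabel(text) {
  const start = text.indexOf("$");
  // No "$" at all: return null instead of the unmodified label text.
  return start === -1 ? null : text.slice(start).trim();
}

// In a browser this would be applied to the XPath result from the answer above:
// const node = document.evaluate("//span[@class='discount']", document, null,
//   XPathResult.ANY_UNORDERED_NODE_TYPE, null).singleNodeValue;
// const price = node ? stripLabel(node.textContent) : null;

console.log(stripLabel("now $39.99"));       // prints $39.99
console.log(stripLabel("Now only $39.99 ")); // prints $39.99
console.log(stripLabel("sold out"));         // prints null
```

Anything after the price (for example a trailing "each") is kept, so this suits elements where the price is the last thing in the text.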

Answer 3

Regexes are fundamentally bad at parsing HTML (see "Can you provide some examples of why it is hard to parse XML and HTML with a regex?" for why). What you need is an HTML parser. See "Can you provide an example of parsing HTML with your favorite parser?" for examples using a variety of parsers.
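To make the point concrete, here is an illustrative sketch of a regex written against the exact markup from the question (the pattern and the naiveDiscount name are made up for this example). It stops matching as soon as the HTML changes in ways a browser or parser treats as equivalent:

```javascript
// A deliberately naive pattern that "parses" the discount span with a regex.
const naive = /<span class="discount">([^<]*)<\/span>/;

// Return the captured text, or null if the pattern fails to match.
function naiveDiscount(html) {
  const m = html.match(naive);
  return m ? m[1] : null;
}

// Markup that a DOM query for class "discount" would treat the same way:
const variants = [
  '<span class="discount">now $39.99</span>',         // matches
  "<span class='discount'>now $39.99</span>",         // single quotes: no match
  '<span id="p1" class="discount">now $39.99</span>', // extra attribute: no match
  '<span class="discount sale">now $39.99</span>',    // second class: no match
  '<span class="discount">now <b>$39.99</b></span>',  // nested tag: no match
];

for (const html of variants) {
  console.log(naiveDiscount(html)); // "now $39.99" for the first, null for the rest
}
```

Each fix to the pattern tends to uncover another variant it misses. A narrow regex applied to an element's text content (for example via textContent) does not run into these issues, because the parser has already dealt with the markup.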

Patrick McElhaney's and Matthew Flaschen's answers are both good ways to solve the problem.

Answer 4

As Matthew Flaschen suggested, XPath is a better way to go if you know something about the node structure of the target document (and since you provided an example, you seem to). If you don't know the node structure, regexes are still lousy for parsing XML.

Some more resources to get you started:

XPath in JavaScript: Introduction
DOM Parsing With XPath and JavaScript
Mozilla Developer Center: Introduction to using XPath in JavaScript

I've also found the Firefox extension combo of DOM Inspector and XPather to be an invaluable tool for deriving and testing XPath expressions on a given page. (If you're using another browser, well, I don't know.)


4 answers

Answer 1: score 4, accepted, 2009-05-25 14:41:42
Answer 2: score 3, 2009-05-25 14:36:25
Answer 3: score 1, 2009-05-25 14:52:50
Answer 4: score 0, 2009-05-26 14:00:11
