
How to filter using Regex and JavaScript?

I have some text in an element on my page, and I want to scrape the price from that page without any of the surrounding text.

I found that the page contains the price like this:

<span class="discount">now $39.99</span>

How can I filter this and get just "$39.99", using only JavaScript and regular expressions?

The question may be too easy, or may have been asked in another way before, but I know nothing about regular expressions, so I'm asking for your help :).

<script type="text/javascript">
window.onload = function () {

    // Get all of the elements with class name "discount"
    var elements = document.getElementsByClassName('discount');

    // Loop over each <span class="discount">
    for (var i=0; i < elements.length; i++) {

         // get the text, e.g. "now $39.99"
         var rawText = elements[i].textContent;

         // Here's a regular expression to match one or more digits (\d+)
         // followed by a period (\.) and one or more digits again (\d+).
         // match() returns an array of matches, or null if nothing matched.
         var priceMatch = rawText.match(/\d+\.\d+/);
         if (!priceMatch) { continue; }

         // You'll want to make the price a floating point number if you
         // intend to do any calculations with it.
         var price = parseFloat(priceMatch[0]);

         // Now what do you want to do with the price? I'll just write it out
         // to the console (using Firebug or something similar)
         console.log(price);

    }
}
</script>
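If the goal is the literal string "$39.99" rather than a number, the same idea can be wrapped in a small helper. This is a minimal sketch, not part of the answer above: `extractPrice` and `priceToNumber` are hypothetical names, and the pattern assumes US-style prices (a dollar sign, digits with optional comma separators, and an optional two-digit cents part).

```javascript
// Hypothetical helper: pull the first dollar amount out of a piece of text.
// Assumes US-style formatting such as "$39.99" or "$1,299.00".
function extractPrice(text) {
  // \$            literal dollar sign
  // \d+           one or more digits
  // (?:,\d{3})*   optional groups of thousands, e.g. ",299"
  // (?:\.\d{2})?  optional cents, e.g. ".99"
  var match = text.match(/\$\d+(?:,\d{3})*(?:\.\d{2})?/);
  // String.prototype.match returns null when nothing matches
  return match ? match[0] : null;
}

// Hypothetical helper: turn a matched price string into a number
// by dropping the dollar sign and any commas first.
function priceToNumber(priceString) {
  return parseFloat(priceString.replace(/[$,]/g, ""));
}

console.log(extractPrice("now $39.99"));
console.log(priceToNumber("$1,299.00"));
```

In a page, the input would typically be an element's `textContent`, for example `extractPrice(elements[i].textContent)`.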
// Select a <span class="discount"> with XPath, then strip the "now $" prefix
document.evaluate("//span[@class='discount']",
  document,
  null,
  XPathResult.ANY_UNORDERED_NODE_TYPE,
  null).singleNodeValue.textContent.replace("now $", "");

EDIT: This is standard XPath. I'm not sure what kind of explanation you're seeking. For outdated browsers, you will need a third-party library like Sarissa and/or Java-line.
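One caveat with `@class='discount'`: it compares the whole attribute value, so it would not match an element such as `<span class="discount sale">`. A common XPath 1.0 idiom tests for the class as a whitespace-separated token instead. The sketch below only builds such an expression; `xpathForClass` is a hypothetical helper name, and the browser call shown in the trailing comment assumes `document.evaluate` is available.

```javascript
// Hypothetical helper: build an XPath 1.0 expression that matches elements
// whose class attribute contains `className` as a whole token.
// Assumes `tagName` and `className` are plain identifiers with no quotes.
function xpathForClass(tagName, className) {
  return "//" + tagName +
    "[contains(concat(' ', normalize-space(@class), ' '), ' " + className + " ')]";
}

var expression = xpathForClass("span", "discount");
console.log(expression);

// In a browser, the expression could be used like this:
//   var node = document.evaluate(expression, document, null,
//       XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
//   var text = node ? node.textContent : null;
```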

Regexes are fundamentally bad at parsing HTML (see "Can you provide some examples of why it is hard to parse XML and HTML with a regex?" for why). What you need is an HTML parser. See "Can you provide an example of parsing HTML with your favorite parser?" for examples using a variety of parsers.
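For this particular task the two approaches need not conflict: the browser has already parsed the HTML, so the DOM can locate the element and a regex only has to handle the short plain-text string inside it. A minimal sketch of that split follows. `pricesFromElements` is a hypothetical name, and the input is assumed to be any array-like collection of objects with a `textContent` property (a `NodeList` or `HTMLCollection` in a browser, plain objects here so the snippet also runs outside one).

```javascript
// Hypothetical helper: given elements already located by the DOM,
// return the numeric price found in each element's text.
function pricesFromElements(elements) {
  var prices = [];
  for (var i = 0; i < elements.length; i++) {
    // The regex sees only plain text, never markup
    var match = elements[i].textContent.match(/\d+\.\d+/);
    if (match) {
      prices.push(parseFloat(match[0]));
    }
  }
  return prices;
}

// Stand-ins for DOM nodes; in a browser, pass
// document.getElementsByClassName("discount") instead.
var fakeElements = [{ textContent: "now $39.99" }, { textContent: "sold out" }];
console.log(pricesFromElements(fakeElements));
```

Elements without a recognizable price are skipped rather than reported as `NaN`, which keeps later arithmetic on the result simple.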

Patrick McElhaney's and Matthew Flaschen's answers are both good ways to solve the problem.

As Matthew Flaschen suggested, XPath is a better way to go if you know something about the node structure of the target document (and since you provided an example, you seem to). If you don't know the node structure, regexes are still lousy for parsing XML.

Some more resources to kick-start you:

I've also found the Firefox extension combo of DOM Inspector and XPather to be an invaluable tool for deriving and testing XPath expressions on a given page. (If you're using another browser, well, I don't know.)
