How to filter using regex and JavaScript?
I have some text in an element on my page, and I want to scrape the price from it without any of the surrounding text.
The page contains the price like this:
<span class="discount">now $39.99</span>
How can I filter this and get just "$39.99" using only JavaScript and regular expressions?
This question may be too easy, or it may have been asked in another form before, but I know nothing about regular expressions, so I'm asking for your help :).
<script>
window.onload = function () {
    // Get all of the elements with class name "discount"
    var elements = document.getElementsByClassName('discount');
    // Loop over each <span class="discount">
    for (var i = 0; i < elements.length; i++) {
        // Get the text, e.g. "now $39.99" (textContent ignores any markup)
        var rawText = elements[i].textContent;
        // Here's a regular expression to match one or more digits (\d+),
        // followed by a period (\.) and one or more digits again (\d+)
        var match = rawText.match(/\d+\.\d+/);
        // match() returns null when the text contains no price
        if (match === null) {
            continue;
        }
        // You'll want to make the price a floating-point number if you
        // intend to do any calculations with it.
        var price = parseFloat(match[0]);
        // Now what do you want to do with the price? I'll just write it out
        // to the console (using Firebug or the browser's developer tools)
        console.log(price);
    }
};
</script>
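The answer above yields the number 39.99. If you want the string "$39.99" exactly as the question asks, a minimal sketch is to include the dollar sign in the pattern. This assumes a price is a `$` followed by digits with optional two-digit cents; the `extractPrice` helper name is hypothetical, and it works on a plain string, so you would pass it the element's `textContent`.

```javascript
// Hypothetical helper: pull the first "$<digits>[.<two digits>]" price out of a string.
// Returns the matched text (e.g. "$39.99"), or null if the string has no price.
function extractPrice(text) {
  // \$            a literal dollar sign (escaped, because $ means "end of input" in a regex)
  // \d+           one or more digits
  // (?:\.\d{2})?  optionally, a period followed by exactly two digits
  const match = text.match(/\$\d+(?:\.\d{2})?/);
  return match === null ? null : match[0];
}

console.log(extractPrice("now $39.99")); // "$39.99"
console.log(extractPrice("now $40"));    // "$40"
console.log(extractPrice("sold out"));   // null
```

In the loop above you could call `extractPrice(elements[i].textContent)` instead of the `match`/`parseFloat` pair, and strip the `$` before calling `parseFloat` only when you need a number for calculations.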
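// Find the first <span> whose class attribute is exactly "discount"
var span = document.evaluate("//span[@class='discount']",
                             document,
                             null,
                             XPathResult.ANY_UNORDERED_NODE_TYPE,
                             null).singleNodeValue;
// Strip the leading "now " to leave "$39.99" (span is null if nothing matched)
var price = span ? span.textContent.replace("now ", "") : null;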
EDIT: This is standard XPath. I'm not sure what kind of explanation you're seeking. For outdated browsers, you will need a third-party library such as Sarissa and/or Java-line.
Regexes are fundamentally bad at parsing HTML (see "Can you provide some examples of why it is hard to parse XML and HTML with a regex?" for why). What you need is an HTML parser. See "Can you provide an example of parsing HTML with your favorite parser?" for examples using a variety of parsers.
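To make the point concrete with the question's own markup, suppose (hypothetically) the span also carried a `title` attribute containing an old price. A regex run over the raw HTML string cannot tell attribute text from element text, so it grabs the wrong price. This is a minimal illustration, not a complete argument.

```javascript
// Hypothetical markup: the span from the question plus a title attribute
// that happens to contain another price.
const html = '<span class="discount" title="was $49.99">now $39.99</span>';

// A naive price regex over the raw HTML returns the first match it finds,
// which is inside the attribute rather than in the element's text.
const naive = html.match(/\$\d+\.\d+/)[0];

console.log(naive); // "$49.99", not the "$39.99" shown on the page
```

In a browser, parsing the markup first (for example with `new DOMParser().parseFromString(html, "text/html")`) and reading the span's `textContent` avoids this, because attribute values are not part of an element's text.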
Patrick McElhaney's and Matthew Flaschen's answers are both good ways to solve the problem.
As Matthew Flaschen suggested, XPath is a better way to go if you know something about the node structure of the target document (and since you provided an example, you seem to). If you don't know the node structure, regexes are still lousy for parsing XML.
Some more resources to kick-start you:
I've also found the Firefox extension combo of DOM Inspector and XPather to be an invaluable tool for deriving and testing XPath expressions on a given page. (If you're using another browser, well, I don't know.)
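Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.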