Trying to find all elements on a page that match a regular expression

Question

I am working on a JavaScript bookmarklet that goes through a webpage and finds all elements whose text looks like a currency amount. For each such element, I read its font size and determine whether it has a line-through. The price and font size are pushed into an array.

I have put together the code below, but I am not sure it is the most efficient approach. There is also an error with the match. Ideally, I would like to home in directly on the elements that match the regex.

var ele = b.getElementsByTagName('*');
for(i=0; i<ele.length; i++) {
    //check iff innerhtml matches
    if(ele[i].innerHTML.match(/[$€£]\d{1,3}(,?\d{3})?(\.\d{2})?/g)) {
        var price = ele[i].innerHTML;
        var size = ele[i].style.fontSize;
        var lineThrough = ele[i].style.textDecoration;
        if(lineThrough != 'line-through' && price && size) {
            results.push({ size: size, price: price});
        }
    }
}

For some reason, the match does not seem to match exactly.

Answer 1

First of all, if you would like to match sums greater than 999,999.99, the regexp should be: [$€£](\d{1,3})(,?\d{3})*(\.\d{2})? . Here I changed ? to *, which means "zero or more", whereas ? means "zero or one".
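To make the difference between the two quantifiers concrete, here is a small sketch that runs both patterns against a made-up sample string (the string and variable names are illustrative, not from the question):

```javascript
// Question's pattern: at most one thousands group after the leading digits.
var narrow = /[$€£]\d{1,3}(,?\d{3})?(\.\d{2})?/;
// Pattern from this answer: any number of thousands groups.
var wide = /[$€£](\d{1,3})(,?\d{3})*(\.\d{2})?/;

var sample = 'Total: $1,234,567.89'; // made-up test string

var narrowMatch = sample.match(narrow)[0];
var wideMatch = sample.match(wide)[0];

console.log(narrowMatch); // $1,234
console.log(wideMatch);   // $1,234,567.89
```

With ? the match stops after the first thousands group, which is one way the original match can look "not exact".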

If you want to find a price that is not written in a strict format (e.g. $30 000 000), then you may want to allow for optional spaces: [$€£]\s*(\d{1,3})\s*(,?\d{3}\s*)*(\.\d{2})? .
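One side effect worth knowing about: because \s* sits inside the repeated group, the match can include trailing whitespace. A minimal sketch with a made-up sample string, trimming the result afterwards:

```javascript
// Space-tolerant pattern from the paragraph above.
var loose = /[$€£]\s*(\d{1,3})\s*(,?\d{3}\s*)*(\.\d{2})?/;

var text = 'Price: $30 000 000 total'; // made-up test string

var raw = text.match(loose)[0];
var clean = raw.trim(); // drop the whitespace swallowed by the last \s*

console.log(JSON.stringify(raw));   // "$30 000 000 "
console.log(JSON.stringify(clean)); // "$30 000 000"
```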

The style object contains only the inline styles set directly on the element (through its style attribute or from script); it does not include styles that come from stylesheets or are inherited. To read those, use window.getComputedStyle .
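As a rough sketch of that idea, the helper below (readTextStyle is a hypothetical name, and passing the window in as win is just a convenience for testing) reads the font size and checks for a line-through. It assumes the computed text-decoration value may be a shorthand string in some browsers, so it looks for a substring rather than an exact match:

```javascript
// Hypothetical helper: `el` is an element, `win` is the window that owns it.
function readTextStyle(el, win) {
    // Prefer the computed style; fall back to inline style if unavailable.
    var style = win.getComputedStyle ? win.getComputedStyle(el) : el.style;
    // The value may look like "line-through solid rgb(0, 0, 0)",
    // so search for the keyword instead of comparing whole strings.
    var decoration = style.textDecorationLine || style.textDecoration || '';
    return {
        size: style.fontSize,
        struck: decoration.indexOf('line-through') !== -1
    };
}
```

In a bookmarklet this would be called as readTextStyle(someElement, window).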

The innerHTML property returns the content of all nested nodes, so your function will also match every ancestor of the element you are looking for. To get the text of the current node only, I use the firstChild property when it is an instance of Text (though I believe there is a more elegant solution):

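var ele = document.getElementsByTagName('*');
var results = [];

for (var i = 0; i < ele.length; i++)
{
    var el = ele[i];
    // Only consider elements whose first child is a text node
    if (el.hasChildNodes() && el.firstChild instanceof Text)
    {
        // With the g flag, match() returns an array of all matches, or null
        var price = el.firstChild.textContent.match(/([$€£]+)\s*(\d{1,3})\s*(,?\d{3}\s*)*(\.\d{2})?/g);
        if (price)
        {
            // Computed style also reflects stylesheet rules and inherited values
            var style = window.getComputedStyle ? window.getComputedStyle(el) : el.style;
            var size = style.fontSize;
            var lineThrough = style.textDecoration;
            // The computed value may be a shorthand such as "line-through solid ..."
            if (lineThrough.indexOf('line-through') === -1 && size)
            {
                results.push({ size: size, price: price});
            }
        }
    }
}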
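One possible version of the "more elegant solution" is to visit every text node instead of only firstChild, and report the text node's parent element. The sketch below is an assumption-laden illustration, not a tested bookmarklet: findPriceElements and PRICE_RE are made-up names, and it relies only on nodeType, nodeValue, childNodes and parentNode.

```javascript
// Simplified price pattern for this sketch (non-capturing groups).
var PRICE_RE = /[$€£]\s*\d{1,3}(?:,?\d{3})*(?:\.\d{2})?/g;

// Collect { element, prices } for every text node under `root`.
// Note: this also visits text inside <script> and <style> elements.
function findPriceElements(root) {
    var found = [];
    (function walk(node) {
        if (node.nodeType === 3) { // 3 is Node.TEXT_NODE
            var prices = (node.nodeValue || '').match(PRICE_RE);
            if (prices) {
                found.push({ element: node.parentNode, prices: prices });
            }
            return;
        }
        var kids = node.childNodes || [];
        for (var i = 0; i < kids.length; i++) {
            walk(kids[i]);
        }
    })(root);
    return found;
}
```

On a real page this would be called as findPriceElements(document.body). Each result's element can then be passed to window.getComputedStyle as above.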

If you want to find amounts that contain abbreviations, you can expand your regex to: /([$€£]+)\s*(\d{1,3})\s*(,?\d{3}\s*)*(\.\d{2})?(\s*(thousand|million|billion|MM|K|M|B))?/g . The suffixes must go in a group with alternation, not in square brackets, because a character class such as [K|M|MM|B|thousand|million|billion] matches only a single character from that set.
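If the goal is to compare prices, the matched text can be turned into a number. The following is only a sketch: parseAmount, AMOUNT_RE and the multiplier table are hypothetical, the pattern is a simplified variant that allows one or two decimals, and treating MM as "million" is an assumption.

```javascript
// Assumed multipliers for each suffix (MM is read as "million" here).
var MULTIPLIERS = {
    K: 1e3, thousand: 1e3,
    M: 1e6, MM: 1e6, million: 1e6,
    B: 1e9, billion: 1e9
};

// Symbol, digits with optional separators, optional suffix ending at a word boundary.
var AMOUNT_RE = /([$€£])\s*(\d{1,3}(?:[,\s]?\d{3})*(?:\.\d{1,2})?)\s*(?:(thousand|million|billion|MM|K|M|B)\b)?/;

// Returns { currency, value } for the first amount in `text`, or null.
function parseAmount(text) {
    var m = AMOUNT_RE.exec(text);
    if (!m) {
        return null;
    }
    var digits = m[2].replace(/[,\s]/g, ''); // strip thousands separators
    var factor = m[3] ? MULTIPLIERS[m[3]] : 1;
    return { currency: m[1], value: parseFloat(digits) * factor };
}

console.log(parseAmount('$2.5M').value);       // 2500000
console.log(parseAmount('£12 billion').value); // 12000000000
```

The \b after the suffix keeps a string like "$5 bananas" from being read as five billion.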

Solution 1 (accepted), 2013-04-26 06:26:33
