Trying to find all elements on a page that match a certain regex

I am working on a JavaScript bookmarklet that goes through a webpage and finds all elements whose text looks like a currency amount. For each such element, I read its font size and determine whether it has a line-through. The price and font size are then pushed into an array.

I have put together the code below, but I am not sure it is the most efficient approach. There is also an error with the match. Ideally, I would like to home in directly on the elements that match the regex.

var ele = b.getElementsByTagName('*');
for(i=0; i<ele.length; i++) {
    //check if innerHTML matches
    if(ele[i].innerHTML.match(/[$€£]\d{1,3}(,?\d{3})?(\.\d{2})?/g)) {
        var price = ele[i].innerHTML;
        var size = ele[i].style.fontSize;
        var lineThrough = ele[i].style.textDecoration;
        if(lineThrough != 'line-through' && price && size) {
            results.push({ size: size, price: price});
        }
    }
}

For some reason, the match does not seem to match exactly.

First of all, if you would like to match sums greater than 999,999.99, the regexp should be: [$€£](\d{1,3})(,?\d{3})*(\.\d{2})? . Here I changed the ? after the thousands group to *, which means "zero or more", whereas ? means "zero or one".
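To see the difference between the two quantifiers, the following minimal sketch runs both patterns against a sample string. The sample text and the variable names are made up for illustration.

```javascript
// Original pattern: at most ONE thousands group, so it stops after 999,999.99
var single = /[$€£]\d{1,3}(,?\d{3})?(\.\d{2})?/g;
// Revised pattern: ANY number of thousands groups
var repeated = /[$€£](\d{1,3})(,?\d{3})*(\.\d{2})?/g;

// Hypothetical sample text
var sample = 'Was $1,234,567.89, now £999.00';

var withSingle = sample.match(single);     // first match is cut off at "$1,234"
var withRepeated = sample.match(repeated); // first match is "$1,234,567.89"
console.log(withSingle);
console.log(withRepeated);
```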

If you want to find a price that is not written in a strict format (e.g. $30 000 000), then you may want to allow for optional whitespace: [$€£]\s*(\d{1,3})\s*(,?\d{3}\s*)*(\.\d{2})? .
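As a rough illustration of the whitespace-tolerant pattern, the sketch below applies it to a made-up string. Because each digit group may be followed by \s*, a match can end in trailing whitespace, so the example trims each result; that trimming step is an addition for the demo, not part of the pattern.

```javascript
// Pattern that tolerates optional whitespace after the symbol and between digit groups
var loose = /[$€£]\s*(\d{1,3})\s*(,?\d{3}\s*)*(\.\d{2})?/g;

// Hypothetical sample text
var text = 'Budget: $30 000 000 or € 1,500.50';

// match() with the g flag returns every match; trim() drops any trailing space
var found = text.match(loose).map(function (m) { return m.trim(); });
console.log(found);
```

Note that a pattern this permissive can also absorb an unrelated three-digit number that happens to follow a price, so it is best suited to text where prices stand alone.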

The style object contains only the styles specified directly on the element; it does not include styles that are inherited or applied through a stylesheet. To read those, use window.getComputedStyle.
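The lookup described above could be wrapped in two small helpers, sketched here. The names effectiveStyle and isStruckThrough are hypothetical, and the window object is passed in as a parameter purely so the helpers can be tried outside a browser. The substring check is an assumption about how browsers report the value: the computed text-decoration may be a longer string such as "line-through solid rgb(0, 0, 0)".

```javascript
// Return the computed style when getComputedStyle is available,
// otherwise fall back to the element's inline style object.
// `win` is normally the global `window`.
function effectiveStyle(el, win) {
    if (win && typeof win.getComputedStyle === 'function') {
        return win.getComputedStyle(el);
    }
    return el.style;
}

// True when the text-decoration value includes "line-through".
// A substring test is used because the reported value may contain
// additional parts (line style, color).
function isStruckThrough(style) {
    return String(style.textDecoration || '').indexOf('line-through') !== -1;
}
```

In a bookmarklet this would be called as effectiveStyle(el, window), with fontSize then read from the returned object.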

The innerHTML property returns the content of all nested nodes, so your function will also match every parent element of the element you are looking for. To get the text of the current node, I use the firstChild property when it is an instance of Text (though I believe there is a more elegant solution):

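var ele = document.getElementsByTagName('*');
var results = [];

for (var i = 0; i < ele.length; i++)
{
    var el = ele[i];
    // Only consider elements whose first child is a text node
    if (el.firstChild instanceof Text)
    {
        // With the g flag, match() returns an array of all matches, or null
        var price = el.firstChild.textContent.match(/([$€£]+)\s*(\d{1,3})\s*(,?\d{3}\s*)*(\.\d{2})?/g);
        if (price)
        {
            // Prefer the computed style; fall back to the inline style in old browsers
            var style = window.getComputedStyle ? window.getComputedStyle(el) : el.style;
            var size = style.fontSize;
            var lineThrough = style.textDecoration;
            // The computed value may carry extra parts (line style, color), so test for a substring
            if (lineThrough.indexOf('line-through') === -1 && size)
            {
                results.push({ size: size, price: price });
            }
        }
    }
}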
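One possible alternative to the firstChild check is to walk the text nodes themselves and report each matching node's parent element. The following is only a sketch of that idea, not the method used above; the function name collectPriceNodes is hypothetical. It relies solely on the standard childNodes, nodeType, nodeValue and parentNode properties.

```javascript
// Recursively collect { element, prices } for every text node under `root`
// whose text matches `re`. A nodeType of 3 identifies a text node.
function collectPriceNodes(root, re, out) {
    out = out || [];
    var children = root.childNodes || [];
    for (var i = 0; i < children.length; i++) {
        var node = children[i];
        if (node.nodeType === 3) {
            // match() with a g-flagged regex returns all matches, or null
            var prices = node.nodeValue.match(re);
            if (prices) {
                out.push({ element: node.parentNode, prices: prices });
            }
        } else {
            // Descend into elements (and any other node that has children)
            collectPriceNodes(node, re, out);
        }
    }
    return out;
}
```

In a page this might be called as collectPriceNodes(document.body, /[$€£]\d{1,3}(,?\d{3})*(\.\d{2})?/g). Unlike the firstChild approach, it also finds prices in text nodes that are not the first child of their element.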

If you want to find amounts that contain abbreviations, you can extend the regex with an alternation group for the suffix: /([$€£]+)\s*(\d{1,3})\s*(,?\d{3}\s*)*(\.\d{2})?(\s*(?:thousand|million|billion|MM|K|M|B))?/g . A group is used rather than square brackets because a character class matches only a single character, not a whole word.
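A quick way to check suffix handling is to run such a pattern over a sample line, as in the sketch below. The sample text is invented, and the suffix is written as a non-capturing alternation group, which is one way (an assumption of this sketch) to match whole abbreviations.

```javascript
// The suffix is an alternation group, not a character class, so whole words
// such as "million" can match. Longer alternatives are listed first so that
// "MM" is not cut short as "M".
var withSuffix = /([$€£]+)\s*(\d{1,3})\s*(,?\d{3}\s*)*(\.\d{2})?(\s*(?:thousand|million|billion|MM|K|M|B))?/g;

// Hypothetical sample text
var line = 'Revenue $4.50 million, costs €300K, valuation $2MM';

var amounts = line.match(withSuffix);
console.log(amounts);
```

The pattern is case-sensitive and does not check for a word boundary after the suffix, so text like "$3 Maybe" would be read as "$3 M"; appending \b to the suffix group is one way to tighten it.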
