Trying to find all elements on a page that match a certain regex
I am working on a JavaScript bookmarklet that goes through a webpage and finds all elements whose text looks like a currency amount. For each of those elements, I find its font size and determine whether it has a line-through. The price and font size are pushed into an array.

I have put together the code below, but I am not sure it is the most efficient approach. There is also an error with the match. Ideally, I would like to home in directly on the elements that match the regex.
var ele = b.getElementsByTagName('*');
for(i=0; i<ele.length; i++) {
    // check if innerHTML matches
    if(ele[i].innerHTML.match(/[$€£]\d{1,3}(,?\d{3})?(\.\d{2})?/g)) {
        var price = ele[i].innerHTML;
        var size = ele[i].style.fontSize;
        var lineThrough = ele[i].style.textDecoration;
        if(lineThrough != 'line-through' && price && size) {
            results.push({ size: size, price: price});
        }
    }
}
For some reason, the match does not seem to be exact.
First of all, if you would like to match sums greater than 999,999.99, the regex should be: [$€£](\d{1,3})(,?\d{3})*(\.\d{2})?. Here I changed ? to *, which means "zero or more", whereas ? means "zero or one".
If you want to find prices that are not written in a strict format (e.g. $30 000 000), you may want to allow optional whitespace: [$€£]\s*(\d{1,3})\s*(,?\d{3}\s*)*(\.\d{2})?.
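To illustrate, here is the whitespace-tolerant pattern next to the strict one on an amount written with spaces (the sample text is made up). The trailing \s* inside the repeated group also captures the space after the last digit group, so trimming each match is a reasonable follow-up step.

```javascript
var strict = /[$€£](\d{1,3})(,?\d{3})*(\.\d{2})?/g;
var loose = /[$€£]\s*(\d{1,3})\s*(,?\d{3}\s*)*(\.\d{2})?/g;

var text = 'Budget: $30 000 000 per year';

// The strict pattern stops at the first space
console.log(text.match(strict)); // [ '$30' ]

// The loose pattern accepts spaces between digit groups;
// trim() removes the whitespace swallowed after the last group
var amounts = (text.match(loose) || []).map(function (s) { return s.trim(); });
console.log(amounts); // [ '$30 000 000' ]
```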
The style object contains only the styles specified directly on the element (its inline style attribute); it does not contain inherited styles or rules from stylesheets. To access the computed styles, including inherited ones, use window.getComputedStyle.
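As a sketch of that advice, the hypothetical helper below (readPriceStyle is an illustrative name) reads the font size and line-through state from the computed style when getComputedStyle is available and falls back to the inline style object otherwise. The window object is passed in as a parameter, an assumption made only so the function can be exercised outside a browser; in a bookmarklet you would simply pass window. It looks for the line-through keyword instead of comparing the whole textDecoration string, because browsers may report the computed value as a shorthand such as "line-through solid rgb(0, 0, 0)".

```javascript
// Hypothetical helper: returns { fontSize, lineThrough } for an element.
// `win` is normally the global window; it is a parameter only for testability.
function readPriceStyle(el, win) {
    var style = win && win.getComputedStyle ? win.getComputedStyle(el) : el.style;
    // Prefer the longhand property; fall back to the shorthand if it is missing
    var decoration = style.textDecorationLine || style.textDecoration || '';
    return {
        fontSize: style.fontSize,
        lineThrough: decoration.indexOf('line-through') !== -1
    };
}

// Stand-in objects that mimic the shape of a DOM element and window
var fakeEl = { style: { fontSize: '', textDecoration: '' } };
var fakeWin = {
    getComputedStyle: function () {
        return { fontSize: '18px', textDecoration: 'line-through solid rgb(0, 0, 0)' };
    }
};

console.log(readPriceStyle(fakeEl, fakeWin)); // { fontSize: '18px', lineThrough: true }
console.log(readPriceStyle(fakeEl, null));    // { fontSize: '', lineThrough: false }
```

In a browser the call would look like readPriceStyle(someElement, window).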
The innerHTML property returns the content of all nested nodes, so your function will also match every ancestor of the element you are looking for. To get the text of the current node only, I use its firstChild property when that child is an instance of Text (but I believe there's a more elegant solution):
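One way to generalize the firstChild check is to concatenate all of an element's direct text-node children (nodes whose nodeType is 3, the value of Node.TEXT_NODE) and ignore text that lives in descendant elements. The sketch below contrasts that with the full descendant text, which is roughly what matching against innerHTML sees, minus the markup. The tiny tree is built from plain objects that mimic the DOM's nodeType/nodeValue/childNodes shape so it runs outside a browser; ownText, allText, text and elem are hypothetical names, and ownText should work unchanged on real DOM elements.

```javascript
var TEXT_NODE = 3; // same value as Node.TEXT_NODE in the DOM

// Text of the element's direct text-node children only
function ownText(el) {
    var parts = [];
    for (var i = 0; i < el.childNodes.length; i++) {
        var child = el.childNodes[i];
        if (child.nodeType === TEXT_NODE) {
            parts.push(child.nodeValue);
        }
    }
    return parts.join('');
}

// Text of the node and all of its descendants
function allText(node) {
    if (node.nodeType === TEXT_NODE) {
        return node.nodeValue;
    }
    var s = '';
    for (var i = 0; i < node.childNodes.length; i++) {
        s += allText(node.childNodes[i]);
    }
    return s;
}

// Plain-object stand-ins for <div>Sale: <span>$19.99</span></div>
function text(value) { return { nodeType: TEXT_NODE, nodeValue: value }; }
function elem(name, children) { return { nodeType: 1, name: name, childNodes: children }; }

var span = elem('span', [text('$19.99')]);
var div = elem('div', [text('Sale: '), span]);

var priceRe = /[$€£]\s*\d{1,3}(,?\d{3})*(\.\d{2})?/;
[div, span].forEach(function (el) {
    // name, matches on descendant text, matches on own text
    console.log(el.name, priceRe.test(allText(el)), priceRe.test(ownText(el)));
});
// div true false
// span true true
```

Only the span, which actually holds the price, matches on its own text; the div matches only when descendant text is included.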
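var ele = document.getElementsByTagName('*');
var results = [];
for (var i = 0; i < ele.length; i++)
{
    var el = ele[i];
    // Only look at the element's own first text node, not the text of its descendants
    if (el.hasChildNodes() && el.firstChild instanceof Text)
    {
        // price is an array of every amount matched in this text node
        var price = el.firstChild.textContent.match(/([$€£]+)\s*(\d{1,3})\s*(,?\d{3}\s*)*(\.\d{2})?/g);
        if (price)
        {
            // Computed style includes stylesheet and inherited values, not just the inline style attribute
            var style = window.getComputedStyle ? window.getComputedStyle(el) : el.style;
            var size = style.fontSize;
            // The computed text-decoration may be a shorthand such as "line-through solid rgb(0, 0, 0)",
            // so test for the keyword instead of comparing the whole string
            var lineThrough = (style.textDecorationLine || style.textDecoration || '').indexOf('line-through') !== -1;
            if (!lineThrough && size)
            {
                results.push({ size: size, price: price });
            }
        }
    }
}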
If you want to find amounts that contain abbreviations, you can extend your regex to: /([$€£]+)\s*(\d{1,3})\s*(,?\d{3}\s*)*(\.\d{2})?(\s*(?:K|MM|M|B|thousand|million|billion)\b)?/g (the abbreviations are listed as alternatives in a group, not inside square brackets, so that only whole words match).
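Square brackets create a character class: [K|M|MM|B|thousand|million|billion] matches any single one of the listed characters (including the | character), not the whole words, so repeating it with * can swallow parts of unrelated words. A minimal comparison, assuming whole words were intended and using a non-capturing alternation group followed by a word boundary instead (amounts is a hypothetical helper; the sample strings are made up):

```javascript
// Character-class version: matches single letters such as d, o, l, a
var classVersion = /([$€£]+)\s*(\d{1,3})\s*(,?\d{3}\s*)*(\.\d{2})?(\s*[K|M|MM|B|thousand|million|billion])*/g;
// Alternation version: matches at most one whole abbreviation or word
var groupVersion = /([$€£]+)\s*(\d{1,3})\s*(,?\d{3}\s*)*(\.\d{2})?(\s*(?:K|MM|M|B|thousand|million|billion)\b)?/g;

// Returns all matches with surrounding whitespace trimmed
function amounts(re, text) {
    return (text.match(re) || []).map(function (s) { return s.trim(); });
}

console.log(amounts(classVersion, 'Only $5 dollars')); // [ '$5 dolla' ]
console.log(amounts(groupVersion, 'Only $5 dollars')); // [ '$5' ]
console.log(amounts(groupVersion, 'Raised $250K, then $5 million')); // [ '$250K', '$5 million' ]
```

The \b after the group keeps a suffix such as M from matching the first letter of a longer word.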