How can you exclude the pseudo element &shy; when collecting text using textContent?
I collect text from an HTML file using the textContent property. I believe that the pseudo element &shy; is copied as well, since I cannot replace words that contain this element. All words that contain &shy; (which is not visible) cannot be replaced with the actual word. I tried to first remove &shy; using
.replace(/&shy;/g, "")
but it still does not work.
Example:
I cannot replace "efter&shy;som" using
.replace(/eftersom/g, "???")
As said, the element is not visible after collecting the text with .textContent, but it seems to be there.
I tried multiple regular expressions like:
.replace(new RegExp(`(\\W)(${firstWord.replace(/&shy;/gi, "")})(\\W)`, "gi"), "$1???$3")
where firstWord is a variable.
Try this out and see if it works. This should strip all the &shy; characters from your page's markup and log the result:
console.log(document.body.innerHTML.replace(/\u00AD/g, ''));
This works by searching for the Unicode character U+00AD (the soft hyphen), which is what &shy; becomes once the browser has parsed the HTML.
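The same idea can be applied directly to a string collected with textContent. This is a minimal sketch, assuming the soft hyphens arrive as the decoded character U+00AD rather than as the literal text "&shy;". The helper name stripSoftHyphens and the sample sentence are hypothetical and only stand in for the real page content.

```javascript
// Hypothetical helper: remove every soft hyphen (U+00AD) from a string.
function stripSoftHyphens(text) {
  return text.replace(/\u00AD/g, "");
}

// Sample text standing in for the value of someElement.textContent.
// The soft hyphen sits invisibly between "efter" and "som".
const collected = "Jag gick hem efter\u00ADsom det regnade.";

// Strip the soft hyphens first, then do the word replacement.
const cleaned = stripSoftHyphens(collected);
const replaced = cleaned.replace(/eftersom/g, "???");

console.log(replaced); // prints "Jag gick hem ??? det regnade."
```

Replacing /&shy;/ on the collected text has no effect because textContent returns the decoded character, not the entity name, so the pattern has nothing to match.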
If the previous answer didn't work, try this one, which covers both the &shy; entity and the decimal form of the soft hyphen (&#173;):
.replace(/(&shy;|&#173;)/gi, "");
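If it is unclear whether the input is raw HTML source (where the entities are still spelled out) or already parsed text (where they have become a single character), one pattern can cover both cases. The sketch below is an assumption-laden combination of the two answers, not something either answer states: the helper name removeSoftHyphens is hypothetical, and the \u00AD alternative is added here to catch the decoded character.

```javascript
// Hypothetical helper: strip soft hyphens whether they appear as the named
// entity, the decimal entity, or the decoded U+00AD character.
function removeSoftHyphens(input) {
  return input.replace(/&shy;|&#173;|\u00AD/gi, "");
}

// Raw HTML source: entities are still literal text.
console.log(removeSoftHyphens("efter&shy;som"));   // prints "eftersom"
console.log(removeSoftHyphens("efter&#173;som"));  // prints "eftersom"

// Parsed text (e.g. from textContent): the entity is now one character.
console.log(removeSoftHyphens("efter\u00ADsom"));  // prints "eftersom"
```

Other spellings, such as the hexadecimal entity, are not handled by this pattern and would need their own alternative.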
This has been answered before in this question: Remove &shy; (soft hyphen) entity from element