
Cannot replace a special combination of characters using JavaScript

I want to remove every &zwj; that is immediately followed by the character "ا" from a paragraph. I use the following method, but the console says that this combination is not found. Please note that this is a Persian word and the character "ا" comes directly after the &zwj;: the text is written right to left, and the tail before the character "ا" shows that the two are connected.

    $(document).ready(function(){
      var htm = $("div").html();
      // search for the entity exactly as it is written in the HTML source
      var shouldRemove = "&zwj;ا";
      if (htm.includes(shouldRemove)) {
        console.log('found');
      } else {
        console.log('not found');
      }
    })

    body{font-size:26pt}

    <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
    <div>&zwj;احترام</div>

One way to do this is to convert &zwj;ا into text using the method below, read the text of the div instead of its HTML, and then compare the two strings:

    $(document).ready(function(){
      // get the text
      var div_txt = $("div").text();
      var shouldRemove = "&zwj;ا";
      // put it as html in a span, then get it as text
      var rem_txt = $("<span>").html(shouldRemove).text();
      if (div_txt.includes(rem_txt)) {
        console.log('found');
      } else {
        console.log('not found');
      }
    })

    body { font-size:26pt }

    <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
    <div>&zwj;احترام</div>
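The same idea, decoding the search string before comparing, can be sketched without a DOM. This is only an illustration: `decodeEntities` and `NAMED_ENTITIES` are hypothetical names, and the table covers a handful of entities, whereas a browser knows the full list.

```javascript
// Hypothetical, deliberately tiny entity table; a real browser knows far more names.
const NAMED_ENTITIES = { zwj: "\u200d", zwnj: "\u200c", amp: "&", lt: "<", gt: ">" };

// Replace named (&zwj;), decimal (&#8205;) and hex (&#x200D;) references with characters.
function decodeEntities(str) {
  return str.replace(/&(#x[0-9a-f]+|#\d+|[a-z]+);/gi, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Out-of-range numbers are left as written instead of throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Names missing from the table are left untouched rather than guessed.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

const remTxt = decodeEntities("&zwj;ا");
console.log(remTxt === "\u200dا"); // true
console.log(remTxt.length);        // 2
```

In a page, the span trick from the answer is simpler because the browser does the decoding; a hand-written table like this one only makes sense where no DOM is available.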

The issue here is that when reading an HTML entity from the DOM, the entity gets parsed, so the character sequence &zwj; turns into the single character ZERO WIDTH JOINER.
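To make the difference concrete, here is a minimal sketch that runs without a DOM. It assumes, as described above, that the string read back from the div already contains the decoded character; `fromDom` is a stand-in for that string and the variable names are illustrative only.

```javascript
// What is written in the HTML source: the five characters & z w j ;
const entityInSource = "&zwj;";
// What the parsed DOM holds instead: one character, U+200D ZERO WIDTH JOINER.
const joiner = "\u200d";

// Assumed stand-in for the string read back from the div after parsing.
const fromDom = joiner + "احترام";

console.log(entityInSource.length);                  // 5
console.log(joiner.length);                          // 1
console.log(fromDom.includes(entityInSource + "ا")); // false
console.log(fromDom.includes(joiner + "ا"));         // true
```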

Your approach would work if you ran the JavaScript on the command line, where nothing parses the entity:

$ node
> s = '<div>&zwj;احترام</div>'
'<div>&zwj;احترام</div>'
> s.includes("&zwj;ا")
true

Even in a browser, if you use the JavaScript console directly, things work as you expected them to:

[Screenshot of the JS console in the browser]

So what's different about reading from the DOM (in your case, with jQuery)? To see what is happening, let's check the actual characters within the string:

    $(document).ready(function(){
      var htm = $("div").text();
      // split both strings into arrays of single characters
      console.log(Array.from(htm));
      console.log(Array.from("&zwj;ا"));
    })

    <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
    <div>&zwj;احترام</div>

This gives:

[Another screenshot]
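For readers who cannot see the console output, the same inspection can be sketched without jQuery. Because a zero-width joiner is invisible when printed, this version lists code points instead of the characters themselves; `codePoints` is a hypothetical helper name.

```javascript
// Hypothetical helper: list each character of a string as a U+XXXX code point.
function codePoints(str) {
  return Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// The decoded form: ZERO WIDTH JOINER followed by ARABIC LETTER ALEF.
console.log(codePoints("\u200dا")); // [ 'U+200D', 'U+0627' ]

// The entity as literal text: six separate characters, none of them U+200D.
console.log(codePoints("&zwj;ا"));
// [ 'U+0026', 'U+007A', 'U+0077', 'U+006A', 'U+003B', 'U+0627' ]
```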

Aha, so the HTML entity has already been parsed by the time jQuery returns the string! Because of this, the text you search for should contain the JavaScript escape for the ZWJ, not the HTML entity. Specify it like this:

    $(document).ready(function(){
      var htm = $("div").html();
      // a single backslash, so that \u{200d} is the ZWJ character itself
      var shouldRemove = "\u{200d}ا";
      if (htm.includes(shouldRemove)) {
        console.log('found');
      } else {
        console.log('not found');
      }
    })

    body{font-size:26pt}

    <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
    <div>&zwj;احترام</div>

Now it outputs found.

So all is well with RTL (text direction)! Turns out it's all just a matter of when HTML entities get parsed. :)

Instead of searching for the entity &zwj; inside the HTML, search for the character itself (code point U+200D) in the text value of the div node (not its HTML):

    // \u200d is the JavaScript escape for ZERO WIDTH JOINER
    console.log("Found?", $("div").text().includes("\u200dا"));

    <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
    <div>&zwj;احترام</div>
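The question asks to remove the joiner, not only to detect it. One possible sketch, assuming the text has already been read from the DOM as a plain string, uses a regular expression with a lookahead; `removeJoinerBeforeAlef` is a hypothetical helper name.

```javascript
// Remove every ZERO WIDTH JOINER (U+200D) that is directly followed by ALEF (U+0627).
// The lookahead (?=...) checks for the alef without consuming it, so the letter stays.
function removeJoinerBeforeAlef(text) {
  return text.replace(/\u200d(?=\u0627)/g, "");
}

const sample = "\u200dاحترام";
const cleaned = removeJoinerBeforeAlef(sample);

console.log(sample.length);              // 7
console.log(cleaned.length);             // 6
console.log(cleaned.includes("\u200d")); // false
```

In the page itself, one way to apply it would be `$("div").text(removeJoinerBeforeAlef($("div").text()))`, assuming the div holds only text; setting the text this way would flatten any nested markup.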

If you log your htm variable to the console, it prints as احترام, which is why searching it for "&zwj;ا" outputs 'not found'. Please try the approach below:

    $(document).ready(function(){
      var htm = $("div").html();
      var shouldRemove = "ا";
      if (htm.includes(shouldRemove)) {
        console.log('found');
      } else {
        console.log('not found');
      }
    })

    body{font-size:26pt}

    <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
    <div>&zwj;احترام</div>
