
Removing a non-printable character

Okay, so I've been bashing my head against the table over this one.

I am importing an XML file that was exported by InDesign. My code parses it and creates a file based on the input. (I'm building a JS application with Node.)

The output file looks good in my PhpStorm IDE. But when I open it in gedit, I see some unwanted newlines here and there.

I've managed to track it down to this character: -> <- (it really is there: copy it somewhere and move your cursor over it using the arrow keys. It's stuck in the middle).

Viewed in a hex editor, the character turns out to be 0x80 0xE2 0xA9

When I tried to remove it using a simple JavaScript replace:

data = data.replace(' ', ''); //There IS a character in the left one. Trust me.

I got the following parse error:

[screenshot of the parse error]

In vim, the character shows up at that position as: ~@

How am I going to remove that from my output? Escaping the character in the JS code made it compile just fine, but the weird character is still there in the output. I'm out of ideas.

You need to use '\u2029' as the search string. The sequence you are trying to replace is a "paragraph separator" Unicode character inserted by InDesign.

So:

string.replace('\u2029', '');

instead of the character itself.

String.replace() doesn't work exactly the way you think. When the search value is a string, as in your code, it only replaces the first occurrence:

> "abc abc abc".replace("a", "x");
'xbc abc abc'

To replace every occurrence you need the g (global) flag, and the only standard way to set it is to pass a regular expression as the match:

> "abc abc abc".replace(/a/g, "x");
'xbc xbc xbc'
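
Putting the two points together (the \u2029 escape and the g flag), a minimal sketch might look like the following. The helper name stripParagraphSeparators and the sample string are made up for illustration; they are not from the question.

```javascript
// Hypothetical helper: removes every PARAGRAPH SEPARATOR (U+2029) from a string.
// The escape sequence keeps the raw character out of the source file, and the
// g flag makes replace() act on all occurrences, not just the first.
function stripParagraphSeparators(text) {
  return text.replace(/\u2029/g, '');
}

// Made-up sample input containing two paragraph separators.
const sample = 'first\u2029second\u2029third';

// String search value: only the first separator is removed.
console.log(JSON.stringify(sample.replace('\u2029', '')));

// Global regular expression: both separators are removed.
console.log(JSON.stringify(stripParagraphSeparators(sample))); // "firstsecondthird"
```

JSON.stringify is used only so that any separator still left in the result shows up as a visible \u2029 escape when printed.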

For further ideas, have a look at the question "Fastest method to replace all instances of a character in a string".


Searching for 0x80 0xE2 0xA9 as UTF-8 shows that no such character exists, but it's probably a mistype of 0xE2 0x80 0xA9, which corresponds to 'PARAGRAPH SEPARATOR' (U+2029), as Goran points out in his answer. You don't normally need to encode exotic characters as JavaScript \u#### escapes as long as your whole tool-set is properly configured to use UTF-8 but, in this case, the JavaScript engine treats the character as a line terminator and raises a syntax error, because unescaped line breaks are not allowed inside JavaScript string literals.
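
The byte order can be checked from Node itself. This is only a sketch, assuming a Node environment where Buffer is available as a global; the variable names are illustrative.

```javascript
// Encode U+2029 (PARAGRAPH SEPARATOR) as UTF-8 and print the bytes as hex.
const bytes = Buffer.from('\u2029', 'utf8');
console.log(bytes.toString('hex')); // e280a9

// Decoding the bytes in the corrected order gives back U+2029.
const decoded = Buffer.from([0xe2, 0x80, 0xa9]).toString('utf8');
console.log(decoded === '\u2029'); // true

// The order reported in the question is not valid UTF-8, so decoding it
// yields replacement characters (U+FFFD) instead of a real character.
const asReported = Buffer.from([0x80, 0xe2, 0xa9]).toString('utf8');
console.log(asReported.includes('\ufffd')); // true
```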

