
Detect unknown character � in logfile.log

Does anyone know what the � character means and what character format it comes in?

I sometimes see this character in my logfile.log file (ASCII format).

For the benefit of search engines, the symbol we are talking about here (�) is a question mark inside a rhombus. (And a rhombus is colloquially known as a diamond.)

� is the Unicode replacement character, U+FFFD. It is a general-purpose placeholder that software substitutes for data it cannot represent. Most commonly a decoder emits it when it encounters bytes that are not valid in the encoding it expects, and some programs also display it for characters they cannot render. Either way, it is there so that we humans can take notice that something is wrong and in need of troubleshooting.

So the bottom line is that, just by looking at that �, we cannot tell which character it originally was.

However, we can find out more by looking deeper into the text file.

The proper way of doing this is to obtain a hex dump of that text file. In the good old days, every programmer had a hex dump utility lying around somewhere, but nowadays, with everyone fumbling data in text format, hex dump utilities have fallen from grace, so you might not have one. Luckily, the Great Interwebz can come to the rescue:

Copy the character you are interested in, and paste it into a tool that converts text to hexadecimal, for example something like https://online-toolz.com/tools/text-hex-convertor.php (ignore the bad English, and the fact that it is written in PHP; programmers come from all over the world, and programmers do make bad life choices.)
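If you would rather not paste log contents into a website, a few lines of Python can produce a hex dump locally. This is only a minimal sketch: the function name `hex_dump` and the sample bytes are made up for illustration and are not part of any standard tool.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Return an offset / hex / printable-ASCII dump of the given bytes."""
    pad = width * 3 - 1  # room for `width` two-digit bytes separated by spaces
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part.ljust(pad)}  {text_part}")
    return "\n".join(lines)


# Hypothetical sample: a log line that contains the bytes ef bf bd.
sample = b"login ok \xef\xbf\xbd user\n"
print(hex_dump(sample))
```

To dump the real file, read it in binary mode, for example `open("logfile.log", "rb").read()`, and pass the resulting bytes to `hex_dump`. Reading in binary mode matters, because text mode would decode the bytes first and hide exactly what you are trying to see.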

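For the particular character that you included in this post, the text-to-hex converter gives efbfbd. These three bytes are the UTF-8 encoding of U+FFFD, the replacement character itself. In other words, by the time the character reached this post it had already been replaced, either by whatever wrote the log or by the viewer you copied it from, so the original bytes cannot be recovered from what you pasted. It probably means that something that is not valid text, such as corrupt data or uninitialized memory, was written into your log file.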
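One way to check what the bytes efbfbd decode to, and to see how such bytes can come into existence, is a short Python experiment. The byte string `broken` below is a hypothetical example of invalid input, not something taken from the log in the question.

```python
# The three bytes reported by the text-to-hex converter.
reported = bytes.fromhex("efbfbd")
decoded = reported.decode("utf-8")
print(f"U+{ord(decoded):04X}")  # code point of the single decoded character

# Hypothetical invalid input: 0xE9 followed by a space is not valid UTF-8.
broken = b"caf\xe9 ok"

# With errors="replace", the decoder substitutes U+FFFD for the bad byte.
replaced = broken.decode("utf-8", errors="replace")
print(replaced)

# Re-encoding the result shows the original 0xE9 is gone for good.
print(replaced.encode("utf-8").hex())
```

The last line is the important one: once a decoder has substituted the replacement character and the text has been saved again, the file contains efbfbd instead of the original byte.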

Also do note that you might be thinking of your log file as being ASCII-encoded, but it may not necessarily be so; it may have a UTF-8 encoding, which is identical to ASCII for the first 128 characters (code points 0 through 127), which are the most commonly used ones. At any rate, the utility that you are using to view your log file is in all likelihood Unicode-aware, so it is probably interpreting your log file as UTF-8. That does make a difference here: as UTF-8, efbfbd is a valid encoding of the replacement character, whereas as ASCII none of the three bytes is valid, because each of them is above 0x7F.
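To test the assumption that a log file is pure ASCII, you can try decoding its bytes and list the offsets of any byte above 0x7F. The sketch below uses two hypothetical helper names, `classify` and `non_ascii_offsets`, and only distinguishes ASCII, valid UTF-8, and everything else; it makes no attempt to guess legacy encodings such as Latin-1.

```python
def classify(data: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'neither' for the given bytes."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "neither"


def non_ascii_offsets(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) pairs for every byte outside ASCII."""
    return [(i, b) for i, b in enumerate(data) if b > 0x7F]


# Hypothetical sample containing the bytes from the question.
sample = b"status \xef\xbf\xbd done"
print(classify(sample))
for offset, value in non_ascii_offsets(sample):
    print(f"offset {offset}: 0x{value:02x}")
```

A result of `utf-8` together with offsets that point at ef bf bd suggests the replacement happened before the file was written. A result of `neither` suggests the raw invalid bytes are still in the file, in which case a hex dump at those offsets will show them.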
