Counting lines with non-printable characters with BSD
I am trying to sort out some bad data in a file on a BSD-style system, which means that I do not have the -P option in grep. I have 7 million lines of data, and a subset has some strange characters.
If you run less on the file, you'll see something like this:
290437430@89
9^@0333465@88
290348389@87
290342818@8^@
The ^@ comes from a bad, non-printable byte (less displays a NUL byte, 0x00, as ^@) that showed up due to noise on the serial line when the characters were sent. These lines are corrupt, and I want to count the number of corrupt data strings.
Any suggestions would be greatly appreciated.
As per Chepner's suggestion, adding the following solution here:
grep -c '\x00' Input_file
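A hedged alternative sketch, in case your grep does not understand the \x00 escape (a grep without -P may not treat it as a NUL byte): first map every NUL byte to another control character with tr, then count the lines that contain any non-printable byte in the C locale. The function name count_bad_lines is hypothetical, and the approach assumes tabs and carriage returns never occur legitimately in the data, because they would be counted as well.

```shell
#!/bin/sh
# count_bad_lines FILE
# Hypothetical helper: prints how many lines of FILE contain at least one
# byte that is not printable ASCII (this includes NUL, which less shows
# as ^@, and any byte >= 0x80).
count_bad_lines() {
    # tr replaces each NUL with \001 so grep never sees a NUL byte and does
    # not treat the stream as binary; \001 is still non-printable, so the
    # affected lines are still matched.
    # LC_ALL=C restricts [:print:] to printable ASCII (0x20-0x7E).
    # Note: tabs and carriage returns also count as non-printable here.
    tr '\000' '\001' < "$1" | LC_ALL=C grep -c '[^[:print:]]'
}
```

Usage: count_bad_lines Input_file. Like grep -c, it reports the number of matching lines, not the number of bad bytes.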
The following two commands only match literal characters (a real @, or the two-character text ^@ in the file), not the non-printable byte that less displays as ^@.
If you only want to count lines containing @, a simple grep can do it:
grep -c "@" Input_file
Or, to count lines containing the literal text ^@ (the caret is escaped so grep does not treat it as a start-of-line anchor):
grep -c "\^@" Input_file
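Both commands above count matching lines, not individual matches. If the total number of @ characters is wanted instead, one possible sketch is to delete every other byte with tr and count what remains with wc -c. The function name count_literal_at is hypothetical.

```shell
#!/bin/sh
# count_literal_at FILE
# Hypothetical helper: prints the total number of '@' characters in FILE,
# counting every occurrence rather than every matching line.
count_literal_at() {
    # LC_ALL=C makes tr work on raw bytes, so stray non-ASCII bytes do not
    # cause "Illegal byte sequence" errors.
    # tr -cd '@' deletes every byte except '@'; wc -c counts what is left;
    # the final tr -d ' ' strips the padding that BSD wc prints.
    LC_ALL=C tr -cd '@' < "$1" | wc -c | tr -d ' '
}
```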