
Counting lines with non-printable characters with BSD

I am trying to sort out some bad data in a file on a BSD-style system, which means that I do not have the -P option in grep. I have 7 million lines of data, and a subset has some strange characters. If you run "less" on the file, you'll see something like this:

290437430@89
9^@0333465@88
290348389@87
290342818@8^@

The ^@ is how less displays a bad, non-printable character that showed up due to noise on the serial line when the characters were sent. These lines are corrupt, and I want to count the number of corrupt data strings.

Any suggestions would be greatly appreciated.

As per Chepner's suggestion, adding the following solution here to count the lines that contain a NUL byte:

grep -c '\x00' Input_file
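Since \x00 is not part of POSIX basic regular expressions, whether the command above matches a NUL byte may depend on the grep build. A minimal sketch of an alternative, assuming only tr and grep are available: rewrite every byte that is neither printable ASCII nor a newline to a single marker byte (0x01), then count the lines that contain the marker. The function name count_corrupt_lines is hypothetical, and note that this also counts lines containing tabs or other control bytes, which may or may not be what you want.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): count lines that contain
# any byte that is neither printable ASCII nor a newline.
count_corrupt_lines() {
    # Marker byte 0x01 stands in for every non-printable byte after tr runs.
    marker=$(printf '\001')
    # LC_ALL=C makes each byte one character, so [:print:] covers 0x20-0x7E.
    # tr -c replaces everything outside "printable or newline" with the marker;
    # grep -c then counts the lines that contain at least one marker.
    LC_ALL=C tr -c '[:print:]\n' '\001' < "$1" | LC_ALL=C grep -c "$marker"
}
```

It would be called as count_corrupt_lines Input_file. Translating the bytes first means grep never sees a NUL, so it does not need to treat the input as binary.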

The next two commands match only literal characters, not the control character itself.

If you only want to count lines containing a literal @, a simple grep will do:

grep -c "@"  Input_file

Or, to count lines containing the literal two-character sequence ^@, the following may help:

grep -c "\^@"  Input_file

