Linux: counting spaces and other characters in a file

Question

Problem:

I need to match the exact format that a mailing machine software program expects. I can count newlines, carriage returns, tabs, etc. using tools like

cat -vte

and

od -c

and

wc -l ( or wc -c )

However, I'd like to know the exact number of leading and trailing spaces between characters and sections of text. Tabs as well.

Question:

How would you go about analyzing and then exactly matching a template using common Unix tools plus Perl or Python? One-liners preferred. Also, what's your advice for matching a DOS-encoded file? Would you translate it to *nix format first and then analyze it, or leave it as is?

UPDATE

Using this to see individual spaces [assumes no '%' chars in file]:

sed 's/ /%/g' filename.000

Plan to build a script that analyzes each line's tab and space content.

Using @shiplu's solution with a nod to the anti-cat crowd:

while read l;do echo $l;echo $((`echo $l |  wc -c` - `echo $l | tr -d ' ' | wc -c`));done<filename.000

Still needs some tweaks for Windows but it's well on its way.

SAMPLE TEXT

Key for reading:

Newlines marked with \n

Carriage returns marked with \r

Unknown space/tab characters marked with [:space:] (need counts on those)

\r\n
\n
[:space:]Institution Anon LLC\r\n
[:space:]123 Blankety St\r\n
[:space:]Greater Abyss, AK  99999\r\n
\n
\n
[:space:]                                10/27/2011\r\n
[:space:]Requested materials are available for pickup:\r\n
[:space:]e__\r[:space:]                     D_ \r[:space:]   _O\r\n
[:space:]Bathtime for BonZo[:space:]       45454545454545[:space:]  10/27/2011\r\n
[:space:]Bathtime for BonZo[:space:]       45454545454545[:space:]  10/27/2011\r\n
\n
\n
\n
\n
\n
\n
[:space:]                             Pantz McManliss\r\n
[:space:]                             Gibberish Ave\r\n
[:space:]                             Northern Mirkwood, ME  99999\r\n
( untold variable amounts of \n chars go here )

UPDATE 2

Using IFS with read gives similar results to the Ruby posted by someone below.

while IFS='' read -r line
 do 
     printf "%s\n" "$line" | sed 's/ /%/g' | grep -o '%' | wc -w
 done < filename.000

Answer 1

perl -nlE'say 0+( () = /\s/g );'

Unlike the currently accepted answer, this doesn't split the input into fields only to discard the result. It also doesn't needlessly create an array just to count the number of values in a list.

Idioms used:

0+( ... ) imposes scalar context like scalar( ... ), but it's clearer because it tells the reader a number is expected.
List assignment in scalar context returns the number of elements returned by its RHS, so 0+( () = /.../g ) gives the number of times /.../g matched.
-l, when used with -n, causes the input to be "chomped", so this removes line feeds from the count.

If you're just interested in spaces (U+0020) and tabs (U+0009), the following is faster and simpler:

perl -nE'say tr/ \t//;'

In both cases, you can pass the input via STDIN or via a file named by an argument.
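Since the question also allows Python, here is a rough equivalent of the two one-liners above. It is only a sketch: the function names count_whitespace and count_blanks are made up for illustration, and it works on bytes so that a DOS carriage return is counted rather than silently translated.

```python
import re

def count_whitespace(line: bytes) -> int:
    # Drop one trailing line feed (roughly what perl -l does), then count
    # every ASCII whitespace byte that is left, including any \r.
    if line.endswith(b"\n"):
        line = line[:-1]
    return len(re.findall(rb"\s", line))

def count_blanks(line: bytes) -> int:
    # Count only spaces (0x20) and tabs (0x09), like tr/ \t//.
    return line.count(b" ") + line.count(b"\t")

sample = b" Institution Anon LLC\r\n"
print(count_whitespace(sample), count_blanks(sample))  # prints: 4 3
```

To run it over a whole file, iterate over a handle opened with open(path, "rb"); in binary mode Python splits lines on \n only, so a lone \r stays inside its line.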

Answer 2

Regular expressions in Perl or Python would be the way to go here.

Yes, it may take an initial time investment to learn "perl, schmerl, zwerl", but once you've gained experience with an extremely powerful tool like regular expressions, it can save you an enormous amount of time down the road.

Answer 3

Counting blanks:

sed 's/[^ ]//g' FILE | tr -d "\n" | wc -c

This counts blanks before, after, and between text. Do you want to count newlines, tabs, etc. in the same pass and sum them up, or as a separate step?
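If separate counts per character type are wanted, a small Python sketch could tally them in one pass over the raw bytes. The name tally_whitespace and the NAMES table are hypothetical; nothing here translates DOS line endings, so CR and LF show up as distinct counts.

```python
from collections import Counter

# Bytes we care about, mapped to readable labels (illustrative choice).
NAMES = {0x20: "space", 0x09: "tab", 0x0D: "CR", 0x0A: "LF"}

def tally_whitespace(data: bytes) -> Counter:
    """Return how many times each tracked whitespace byte occurs in data."""
    return Counter(NAMES[b] for b in data if b in NAMES)

sample = b"\r\n\n Institution Anon LLC\r\n"
counts = tally_whitespace(sample)
print(counts["space"], counts["tab"], counts["CR"], counts["LF"])  # prints: 3 0 2 3
```

For a real file, pass open(path, "rb").read() as the data. Comparing the CR and LF totals is also a quick way to see whether a file mixes \r\n and bare \n endings, as the sample in the question does.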

Answer 4

perl -nwE 'print; for my $s (/([\t ]+)/g) { say "Count: ", length $s }' input.txt

This will count individual groups of tabs or spaces, instead of counting all the whitespace in the entire line. For example:

    foo        bar

Will print

    foo        bar
Count: 4
Count: 8

You may wish to skip single spaces (spaces between words), i.e. not count the spaces in "Bathtime for BonZo". If so, replace + with {2,} or whatever minimum you think is appropriate.
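The same idea can be sketched in Python, with the minimum run length as a parameter. The function name blank_runs and the min_len parameter are illustrative only.

```python
import re

def blank_runs(line: str, min_len: int = 1) -> list:
    """Lengths of each run of spaces/tabs at least min_len long, left to right."""
    pattern = r"[ \t]{%d,}" % min_len
    return [len(run) for run in re.findall(pattern, line)]

print(blank_runs("    foo        bar"))  # prints: [4, 8]

# With min_len=2 the single spaces between words are skipped.
title_line = "Bathtime for BonZo" + " " * 8 + "45454545454545"
print(blank_runs(title_line, min_len=2))  # prints: [8]
```

Note that a run mixing tabs and spaces is reported as one length, exactly as with the [\t ]+ pattern in the Perl version.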

Answer 5

If you want to count the number of spaces in pm.txt, this command will do:

 cat pm.txt | while read l; 
 do echo $((`echo $l |  wc -c` - `echo $l | tr -d ' ' | wc -c`));
 done;

If you want to count the number of spaces, \r, \n, and \t, use this:

cat pm.txt | while read l;
do echo $((`echo $l |  wc -c` - `echo $l | tr -d ' \r\n\t' | wc -c`));
done;

read will strip any leading whitespace. If you don't want that, there is a nasty way. First split your file so that there is only one line per file using

`split -l 1 -d pm.txt`.

After that there will be a bunch of x* files. Now loop through them.

for x in x*; do echo $((`cat $x |  wc -c` - `cat $x | tr -d ' \r\n\t' | wc -c`)); done;

Remove those files with rm x*.
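If the stripping done by read is the main obstacle, a Python sketch can report leading and trailing blanks per line without any temporary files. The name edge_blanks is hypothetical, and the sketch assumes lines end in \n or \r\n.

```python
def edge_blanks(line: bytes) -> tuple:
    """Return (leading, trailing) counts of spaces and tabs for one line."""
    # Remove trailing CR/LF bytes so they are not mistaken for blanks.
    body = line.rstrip(b"\r\n")
    leading = len(body) - len(body.lstrip(b" \t"))
    trailing = len(body) - len(body.rstrip(b" \t"))
    return leading, trailing

print(edge_blanks(b"  foo bar \t\r\n"))  # prints: (2, 2)
```

For a file, call it on each line from open(path, "rb"). A line made up only of blanks is reported with the same number on both sides, since every blank is both leading and trailing.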

Answer 6

In case Ruby counts (it does count :)

ruby -lne 'puts $_.scan(/\s/).size'

and now some Perl (slightly less intuitive IMHO):

perl -lne 'print scalar(@{[/(\s)/g]})'

Answer 7

If you ask me, I'd write a simple C program to do the counting and formatting all in one go. But that's just me. By the time I got finished fiddle-farting around with perl, schmerl, zwerl I'd have wasted half a day.

7 answers

Answer 1
5 2011-12-31 00:42:29

Answer 2
4 2011-12-30 19:59:40

Answer 3
2 2011-12-30 20:06:29

Answer 4
2 accepted 2011-12-31 01:04:06

Answer 5
1 2011-12-30 20:06:00

Answer 6
1 2011-12-30 22:11:55

Answer 7
0 2011-12-30 19:59:59
