Strange problem with cut, colrm, awk and sed: fail to cut characters from a pipe stream

I have created a script to enumerate all files in a directory and below it. I wanted to add some progress feedback using pv, because I usually run it from the root directory.

The problem is that find always includes fractional seconds in its time output (%TT), and I don't want to record that much detail.

If I write the script to do everything in one pass, I get the right output. But if I use intermediate files so that I can show an estimate during a "second" pass, the result changes and I do not see why.

This version gives the right result:

#!/bin/bash

find -printf "%11s %TY-%Tm-%Td %TT %p\n" 2> /dev/null |
# - Remove the fractional seconds from the time
# before:       4096 2011-01-19 22:43:51.0000000000 .
# after :       4096 2011-01-19 22:43:51 .
colrm 32 42 |
pv -ltrbN "Enumerating files..." |
# - Sort every thing by filename
sort -k 4

But the sort can take a long time, so I tried something like this to get a little more feedback:

#!/bin/bash

TMPFILE1=$(mktemp)
TMPFILE2=$(mktemp)

# Erase temporary files before quitting
trap "rm $TMPFILE1 $TMPFILE2" EXIT

find -printf "%11s %TY-%Tm-%Td %TT %p\n" 2> /dev/null |
pv -ltrbN "Enumerating files..." > $TMPFILE1
LINE_COUNT="$(wc -l $TMPFILE1)"

#cat $TMPFILE1 | colrm 32 42 |                   #1
#cat $TMPFILE1 | cut -c1-31,43- |                #2
#cut -c1-31,43- $TMPFILE1 |                      #3
#sed s/.0000000000// $TMPFILE1 |                 #4
awk -F".0000000000" '{print $1 $2}' $TMPFILE1 |  #5
pv -lN "Removing fractional seconds..." -s $LINE_COUNT > $TMPFILE2

echo "Sorting list by filenames..." >&2
cat $TMPFILE2 |
sort -k 4

None of the 5 "solutions" works. The ".0000000000" part is left in the output.

Can someone explain why?

My final solution is to combine the cutting operation with the find and use only one temporary file. Only the sort is done separately.

You can truncate the seconds within the argument to -printf using a field precision specifier (at least using GNU find 4.4.2):

find -printf "%11s %TY-%Tm-%Td %.8TT %p\n"

which keeps only the eight characters of "HH:MM:SS".
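
The effect of a precision on a string field can be previewed with the shell's own printf. This is only an analogy, assuming find's %.8TT truncates the formatted time the same way a %.8s conversion truncates a string; the sample timestamp is copied from the question's comments.

```shell
#!/bin/bash
# Sample value as printed by find's %TT in the question.
full_time='22:43:51.0000000000'

# A precision of 8 on a string conversion keeps at most 8 characters,
# which is exactly the length of "HH:MM:SS".
truncated=$(printf '%.8s' "$full_time")

echo "$truncated"
```

With this approach no separate colrm, cut, sed or awk stage is needed at all.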

The rest of my answer is possibly moot:

The reason your #1-5 don't work is that the output of wc includes the filename (and, crucially, a space). Because $LINE_COUNT is expanded unquoted after -s, the space splits it into two words, so pv sees the filename from the wc output as an input file argument. A file named on the command line takes precedence over stdin. Since that file happens to be the same input file that is being fed through the pipe, the output file looks like an unprocessed copy of the input (because it is one: the pipeline's output is ignored).

To capture only the count without the filename:

LINE_COUNT=$(wc -l < "$TMPFILE1")
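
The difference between the two forms of wc can be checked without pv by counting how many arguments an unquoted expansion produces. This is a minimal sketch: demo_file and the other variable names are hypothetical, and it assumes the temporary path contains no spaces.

```shell
#!/bin/bash
# Hypothetical three-line demo file standing in for $TMPFILE1.
demo_file=$(mktemp)
printf 'a\nb\nc\n' > "$demo_file"

with_name=$(wc -l "$demo_file")     # count followed by the filename
count_only=$(wc -l < "$demo_file")  # count only (wc has no name to print)

# Unquoted expansion is split on whitespace, just like `-s $LINE_COUNT`.
set -- -s $with_name
args_with_name=$#
set -- -s $count_only
args_count_only=$#

echo "with filename: $args_with_name arguments"
echo "count only:    $args_count_only arguments"

rm -f "$demo_file"
```

The extra argument in the first case is the filename, which is what pv ends up treating as its input file.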

Here are some minor improvements:

< $TMPFILE1 colrm 32 42 |                   #1 No need for cat

or

colrm 32 42 < $TMPFILE1 |                   #1

< $TMPFILE1 cut -c1-31,43- |                #2

or

cut -c1-31,43- < $TMPFILE1 |                #2

sed 's/\.0000000000//' $TMPFILE1 |          #4 The dot should be escaped, and the expression quoted so the shell keeps the backslash
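
To see why the escape matters, here is a small sketch on a made-up line (the 10000000000-byte size is a hypothetical value chosen to trigger the problem). An unescaped dot matches any character, so the first match can land in the size column instead of the time. The third command shows that an unquoted backslash is removed by the shell before sed ever sees it.

```shell
#!/bin/bash
# Hypothetical listing line: a size of 1 followed by ten zeros, then the time.
line='size 10000000000 time 22:43:51.0000000000 .'

# Unescaped dot: matches "1" plus ten zeros in the size field first.
unescaped=$(printf '%s\n' "$line" | sed 's/.0000000000//')

# Escaped and quoted: matches only the literal ".0000000000" in the time.
escaped=$(printf '%s\n' "$line" | sed 's/\.0000000000//')

# Escaped but unquoted: the shell strips the backslash, so this behaves
# like the unescaped version.
unquoted=$(printf '%s\n' "$line" | sed s/\.0000000000//)

printf '%s\n' "$unescaped" "$escaped" "$unquoted"
```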

If this is an actual working tool, and not just a toy, then I'd just drop the "progress feedback" altogether... maybe come back to it when it doesn't complicate your life. In the meantime you've probably spent more time trying to figure out how to give feedback than you will ever spend waiting for your script to return.

If you absolutely MUST give some sort of feedback, then just echo "Sorting $(wc -l < $TMPFILE) lines ..."

From experience, you'll get a feeling for how long it takes to sort that many lines.
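
A minimal sketch of that simpler feedback, assuming the listing has already been written to a file. Here list_file is a hypothetical stand-in for $TMPFILE, filled with two made-up lines in the question's format; LC_ALL=C is set only to make the sort order predictable.

```shell
#!/bin/bash
# Hypothetical listing file with two sample lines (size, date, time, path).
list_file=$(mktemp)
printf '%11s %s %s %s\n' 4096 2011-01-19 22:43:51 ./b \
                         10   2011-01-19 22:43:52 ./a > "$list_file"

# Report the line count on stderr so it does not mix with the sorted output.
line_total=$(wc -l < "$list_file")
echo "Sorting $line_total lines ..." >&2

# Sort by the fourth field (the path), as in the original script.
sorted=$(LC_ALL=C sort -k 4 "$list_file")
printf '%s\n' "$sorted"

rm -f "$list_file"
```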

Kiss it, my son, kiss it.
