
Bash looping through file ends prematurely

I am having trouble with a Bash loop over a text file of ~20k lines.

Here is my (minimised) code:

LINE_NB=0
while IFS= read -r LINE; do
    LINE_NB=$((LINE_NB+1))
    CMD=$(sed "s/\([^ ]*\) .*/\1/" <<< "${LINE}")
    echo "[${LINE_NB}] ${LINE}: CMD='${CMD}'"
done <"${FILE}"

The while loop ends prematurely after a few hundred iterations. However, the loop works correctly if I remove the CMD=$(sed...) part. So, evidently, there is some interference I cannot spot.

As I read here, I also tried:

LINE_NB=0
while IFS= read -r -u4 LINE; do
    LINE_NB=$((LINE_NB+1))
    CMD=$(sed "s/\([^ ]*\) .*/\1/" <<< "${LINE}")
    echo "[${LINE_NB}] ${LINE}: CMD='${CMD}'"
done 4<"${FILE}"

but nothing changes. Any explanation for this behaviour, and help on how I can solve it?
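For what it's worth, the classic cause of a read loop ending early - and the reason the -u/extra-fd variant is commonly suggested - is some command inside the loop body consuming the same stdin the loop reads from. That is probably not the bug here, since sed reads from a herestring, but a minimal self-contained demonstration of the effect looks like this:

```shell
# A command that reads stdin inside the loop drains the file the loop
# is iterating over, so the next 'read' hits EOF early.
demo="$(mktemp)"
printf 'a 1\nb 2\nc 3\n' > "$demo"
count=0
while IFS= read -r line; do
  count=$((count+1))
  cat > /dev/null   # swallows the remaining lines from stdin
done < "$demo"
echo "$count"       # 1, not 3
rm -f "$demo"
```

Redirecting the loop's input to a separate descriptor (read -u 4 ... done 4<file) sidesteps exactly this, which is why it was worth trying.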

Thanks!

To clarify the situation for user1934428 (thanks for your interest): I have now created a minimal script and added "set -x". The full script is as follows:

#!/usr/bin/env bash
set -x
FILE="$1"
LINE_NB=0

while IFS= read -u "$file_fd" -r LINE; do
  LINE_NB=$((LINE_NB+1))
  CMD=$(sed "s/\([^ ]*\) .*/\1/" <<< "${LINE}")
  echo "[${LINE_NB}] ${LINE}: CMD='${CMD}'" #, TIME='${TIME}' "

done {file_fd}<"${FILE}"

echo "Done."

The input file is a list of ~20k lines of the form:

S1 0.018206
L1 0.018966
F1 0.006833
S2 0.004212
L2 0.008005
I8R190 18.3791
I4R349 18.5935
...

The while loop ends prematurely at (seemingly) random points. One possible output is:

+ FILE=20k/ir-collapsed.txt
+ LINE_NB=0
+ IFS=
+ read -u 10 -r LINE
+ LINE_NB=1
++ sed 's/\([^ ]*\) .*/\1/'
+ CMD=S1
+ echo '[1] S1 0.018206: CMD='\''S1'\'''
[1] S1 0.018206: CMD='S1'
+ echo '[6510] S1514 0.185504: CMD='\''S1514'\'''
...[snip]...
[6510] S1514 0.185504: CMD='S1514'
+ IFS=
+ read -u 10 -r LINE
+ echo Done.
Done.

As you can see, the loop ends prematurely after line 6510, while the input file is ~20k lines long.

Yes, making a stable file copy is the best first step.
Learning awk and/or perl is still well worth your time. It's not as hard as it looks. :)
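A minimal sketch of that "stable copy" idea, using mktemp stand-ins for the real paths (substitute your actual input file for "$src"):

```shell
# Stand-in for the real input file (e.g. 20k/ir-collapsed.txt).
src="$(mktemp)"
printf 'S1 0.018206\nL1 0.018966\n' > "$src"

# Snapshot first, then loop over the snapshot, so a concurrent writer
# cannot truncate or rewrite the file mid-loop.
snap="$(mktemp)"
cp -- "$src" "$snap"

n=0
while IFS= read -r line; do
  n=$((n+1))
done < "$snap"
echo "$n"   # 2
rm -f "$src" "$snap"
```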

Aside from that, a couple of optimizations - try never to run any external program inside a loop when you can avoid it. For a 20k-line file, that's 20k invocations of sed, which really adds up unnecessarily. Instead, you could just use parameter expansion for this one.

# don't use all caps.
# cmd=$(sed "s/\([^ ]*\) .*/\1/" <<< "${line}") becomes
cmd="${line%% *}" # strip everything from the first space onward
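For example, with one line from the input above, both fields can be extracted with parameter expansion alone - no external process:

```shell
line='S1 0.018206'
cmd="${line%% *}"      # remove longest suffix starting at a space  -> 'S1'
timeval="${line#* }"   # remove shortest prefix through first space -> '0.018206'
echo "$cmd $timeval"   # S1 0.018206
```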

Using read itself to handle that is even better, since you were already using it anyway - just don't spawn another program if you can avoid it. As much as I love it, read is pretty inefficient; it has to do a lot of fiddling to handle all its options.

while read -r -u "$file_fd" cmd timeval; do
  echo "[$((++line_nb))] CMD='${cmd}' TIME='${timeval}'"
done {file_fd}<"${file}"

or

while read -r -u "$file_fd" -a tok; do
  echo "[$((++line_nb))] LINE='${tok[*]}' CMD='${tok[0]}' TIME='${tok[1]}'"
done {file_fd}<"${file}"

(This will sort of rebuild the line, but if there were tabs or extra spaces, etc., it will only pad with the 1st char of $IFS, which is a space by default. Shouldn't matter here.)
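A quick illustration of that rebuild behaviour - array elements in "${tok[*]}" are joined with the first character of $IFS:

```shell
tok=(S1 0.018206)     # as read -a would split 'S1   0.018206'
joined="${tok[*]}"    # joined with 1st char of IFS (space by default)
echo "$joined"        # S1 0.018206 (extra whitespace collapsed)
```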

awk would have made short work of this, though, and been a lot faster, with better tools already built in.

awk '{printf "NR=[%d] LINE=[%s] CMD=[%s] TIME=[%s]\n",NR,$0,$1,$2 }' 20k/ir-collapsed.txt

Run some time comparisons - with and without the sed, with one read vs. two, and then compare each against the awk. :)
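A rough harness for those comparisons, using a throwaway synthetic file (timings will vary by machine; the point is the relative difference):

```shell
# 20k synthetic lines in the same 'CMD TIME' shape as the real input.
f="$(mktemp)"
seq -f 'S%.0f 0.018206' 20000 > "$f"

echo "pure-bash parse:"
time bash -c 'while read -r cmd timeval; do :; done < "$1"' _ "$f"

echo "awk parse:"
time awk '{ cmd=$1; timeval=$2 }' "$f"

n=$(wc -l < "$f")
echo "$n"   # 20000
rm -f "$f"
```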

The more things you have to do with each line, and the more lines there are in the file, the more it will matter. Make it a habit to do even small things as neatly as you can - it will pay off well in the long run.
