How can I buffer and batch-process the output of tail -f?

Question

I need to watch a file and send whatever is written to it to a web service. I am trying to do this with a clean, simple bash script, for example:

#!/bin/bash

# listen for changes on file specified as first argument
tail -F "$1" | while read LINE
do
  curl http://service.com/endpoint --data "${LINE}"
done

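This works well, in that every line appended to the file is POSTed to http://service.com/endpoint. However, I really don't like the fact that if many lines are appended in a short time, I will make just as many HTTP requests and may overload the service.

Is there a clever way to buffer the operation? I can think of something like this: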

buffer = EMPTY
while LINES are read:
  add LINE to buffer
  if buffer has more than X LINES
    send POST
  fi
done

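But with the solution above, if one line is written per hour, I would only get an update every X hours, which is not acceptable. Another, similar solution would be to "time" it inside the while loop: if X seconds have passed then send the buffer, otherwise wait. But the last line of a stream could be held back indefinitely, because the while loop is only triggered when a new line is added to the file.

The goal is to do this with a minimal bash script and without a second process. By a second process I mean: process 1 gets the output from tail -f and stores it, and process 2 periodically checks what is stored and sends a POST if more than X seconds have elapsed.

I'm curious whether this is possible with some clever trick?

Thanks!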

Answer 1

Turning your pseudocode into code:

# add stdbuf -oL if you care
# flush threshold; 10 is only an example value, pick your own
x_lines=10
tail -F "$1" | {
    # buffer = EMPTY
    buffer=
    # while LINES are read:
    while IFS= read -r line; do
      # add LINE to buffer
      buffer+="$line"$'\n'
      # if buffer has more than X LINES
      # note: <<< appends one newline, so wc -l reports one more than the
      # number of buffered lines and this fires at exactly x_lines lines
      # TODO: cache the count of lines in a variable to save cpu
      if [ "$(wc -l <<<"$buffer")" -gt "$x_lines" ]; then
          # send POST
          # TODO: remove additional newline on the end of buffer, if needed
          curl http://service.com/endpoint --data "${buffer}"
          buffer=
      fi
    done
}

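Removing the trailing newline from the buffer, or for example keeping the number of buffered lines in a separate counter to save CPU, is left to others.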
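
For what it's worth, here is a minimal sketch of those two leftover items: a counter instead of wc, and no trailing newline on the batch. The function name batch_by_count and the idea of passing the command as arguments are my own assumptions, not part of the code above. It also flushes the remainder at end of input, which never happens with tail -F but makes it easy to try on a finite input.

```shell
#!/usr/bin/env bash
# batch_by_count N CMD [ARGS...]   (hypothetical helper, for illustration only)
# Reads lines from stdin and runs CMD once per batch of N lines,
# passing the batch (lines joined by newlines) as CMD's last argument.
batch_by_count() {
    local max=$1; shift
    local buffer="" count=0 line
    while IFS= read -r line; do
        # join lines with a newline, but do not leave one at the end
        if (( count > 0 )); then buffer+=$'\n'; fi
        buffer+=$line
        count=$((count + 1))
        if (( count >= max )); then
            "$@" "$buffer"
            buffer="" count=0
        fi
    done
    # input ended: send whatever is still buffered
    if (( count > 0 )); then
        "$@" "$buffer"
    fi
}
```

It could then be used along the lines of tail -F "$1" | batch_by_count 100 curl http://service.com/endpoint --data. This still has the problem from the question that a slow trickle of lines waits for the batch to fill.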

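Notes:

By convention, uppercase variable names are reserved for exported (environment) variables.
while read LINE strips leading and trailing whitespace from each line and, without -r, also interprets backslashes. Use while IFS= read -r line to read the whole line unchanged. More in the BashFAQ entry "How can I read a file line by line".
For one request per line, I believe you can simply use xargs, along the lines of: tail -F "$1" | xargs -d$'\n' -n1 curl http://service.com/endpoint --data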
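
To make the difference between the two read forms concrete, here is a small sketch. The function names read_plain and read_raw are made up for this illustration.

```shell
#!/usr/bin/env bash
# Plain read: trims leading/trailing whitespace and consumes backslashes.
read_plain() {
    local line
    read line
    printf '%s' "$line"
}

# IFS= read -r: keeps the line exactly as it was written.
read_raw() {
    local line
    IFS= read -r line
    printf '%s' "$line"
}

# The input below contains a literal backslash followed by "t", not a tab.
printf '%s\n' '  a\tb  ' | read_plain; echo '|'   # prints: atb|
printf '%s\n' '  a\tb  ' | read_raw;   echo '|'   # prints:   a\tb  |
```

For log lines that are forwarded verbatim to a service, the second form is almost always what you want.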

To buffer by time, read with a timeout: use the bash extension, e.g. read -t 0.1, or put a timeout on the whole read with something like timeout 1 cat.
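
As a rough sketch of how read -t reports what happened: bash returns 0 when a line was read, a status greater than 128 when the timeout expired, and another non-zero status at end of input. The helper name poll_line and its 0/1/2 return codes are my own convention for this example; fractional timeouts need bash 4 or later.

```shell
#!/usr/bin/env bash
# poll_line TIMEOUT   (hypothetical helper)
# Tries to read one line from stdin within TIMEOUT seconds.
# Returns 0 and prints the line if one was read,
#         1 if the timeout expired first,
#         2 on end of input (or another read error).
poll_line() {
    local line ret
    IFS= read -r -t "$1" line && ret=0 || ret=$?
    if (( ret == 0 )); then
        printf '%s\n' "$line"
        return 0
    elif (( ret > 128 )); then
        return 1
    else
        return 2
    fi
}
```

Telling a timeout apart from end of input matters in a loop: a timeout means "flush and keep waiting", while end of input means there is nothing more to wait for.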

To limit by both line count and timeout, I once wrote a badly named script called ratelimit.sh (badly named because it does not limit the rate...) that does exactly this. It reads lines and, when either the line count or the timeout is reached, flushes its buffer followed by an extra output separator. I believe it would be used like this:

tail -F "$1" | ratelimit.sh --timeout=0.5 --lines=5 | while IFS= read -r -d $'\x02' buffer; do curl ... --data "$buffer"; done

It works roughly as follows:

# Written by Kamil Cukrowski (C) 2020
# Licensed jointly under MIT and Beerware license
# config
maxtimeoutns=$((2 * 1000 * 1000 * 1000))
maxlines=5 
input_separator=$'\n'
output_separator=$'\x02'

# the script
timeout_arg=()
while true; do
    chunk=""
    lines=0
    start=$(date +%s%N)
    stop=$((start + maxtimeoutns))

    while true; do

        if [ "$maxtimeoutns" != 0 ]; then
            now=$(date +%s%N)
            if (( now >= stop )); then
                break
            fi
            timeout=$(( stop - now ))
            timeout=$(awk -v a="$timeout" -v b=1000000000 'BEGIN { printf "%f", a/b }')
            timeout_arg=(-t "$timeout")
        fi


        IFS= read -rd "$input_separator" "${timeout_arg[@]}" line && ret=$? || ret=$?

        if (( ret == 0 )); then

            # read succeeded
            chunk+=$line$'\n'

            if (( maxlines != 0 )); then
                lines=$((lines + 1))
                if (( lines >= maxlines )); then
                    break
                fi
            fi

        elif (( ret > 128 )); then
            # read timed out
            break
        fi
    done

    if (( ${#chunk} != 0 )); then
        printf "%s%s" "$chunk" "$output_separator"
    fi

done
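
On the consuming side, the \x02-separated chunks can be read back with read -d. A small sketch, where each_chunk is a hypothetical helper name and the command to run per chunk is passed as arguments:

```shell
#!/usr/bin/env bash
# each_chunk CMD [ARGS...]   (hypothetical helper)
# Reads chunks terminated by the byte \x02 from stdin and runs CMD once
# per chunk, passing the chunk as CMD's last argument.
each_chunk() {
    local chunk
    while IFS= read -r -d $'\x02' chunk; do
        "$@" "$chunk"
    done
}
```

For example: tail -F "$1" | ratelimit.sh --timeout=0.5 --lines=5 | each_chunk curl http://service.com/endpoint --data. Any trailing data that is not followed by a separator is ignored by this loop, which seems acceptable here because the script above always ends a chunk with the separator.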

Answer 2

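Thanks to KamilCuk's answer, I managed to get what I wanted in a fairly simple way, combining a maximum line count with a timeout. The trick was realizing that pipes don't necessarily work line by line, as I had assumed... silly me!

Just for future reference, here is my solution, very specific and stripped to the bone: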

#!/bin/bash
# tails the file given as $1 and POSTs new lines via curl
# every 15 seconds or every 100 lines, whichever comes first
tail -F "$1" | while true; do

    chunk=""
    stop=$(( $(date +%s) + 15 ))
    maxlines=100

    while true; do

        now=$(date +%s)
        if (( now >= stop )); then break; fi

        # wait only for the time left in this window, not a fresh 15 seconds
        IFS= read -r -t $(( stop - now )) line && ret=$? || ret=$?
        if (( ret == 0 )); then

            chunk+=$line$'\n'
            maxlines=$((maxlines - 1))
            if (( maxlines == 0 )); then break; fi

        elif (( ret > 128 )); then break; fi

    done

    if (( ${#chunk} != 0 )); then
        curl http://service.com --data "$chunk"
    fi

done
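
A possible variant of the same idea, sketched as a reusable function: it uses bash's built-in SECONDS counter instead of calling date, and it stops at end of input instead of looping forever, which makes it testable on a finite input. The name batch_lines and its argument order are my own assumptions. A line that is only partially written when the timeout fires is not handled specially here.

```shell
#!/usr/bin/env bash
# batch_lines MAX_LINES MAX_SECONDS CMD [ARGS...]   (hypothetical helper)
# Reads lines from stdin and runs CMD with the buffered lines as its last
# argument whenever MAX_LINES lines are buffered or MAX_SECONDS have passed
# since the current batch was started, whichever comes first.
batch_lines() {
    local max_lines=$1 max_seconds=$2; shift 2
    local chunk count deadline remaining line ret eof=0
    while (( ! eof )); do
        chunk="" count=0
        deadline=$((SECONDS + max_seconds))
        while (( count < max_lines )); do
            remaining=$((deadline - SECONDS))
            (( remaining > 0 )) || break
            IFS= read -r -t "$remaining" line && ret=0 || ret=$?
            if (( ret == 0 )); then
                chunk+=$line$'\n'
                count=$((count + 1))
            elif (( ret > 128 )); then
                break            # timed out: flush what we have
            else
                chunk+=$line     # end of input: keep an unterminated last line
                eof=1
                break
            fi
        done
        # empty batches (nothing arrived in this window) are not sent
        if [[ -n $chunk ]]; then
            "$@" "$chunk"
        fi
    done
}
```

Usage would be something like tail -F "$1" | batch_lines 100 15 curl http://service.com --data. Because SECONDS has one-second resolution, the window length is only approximate.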

Answer 1: score 2, accepted, 2020-01-06 21:57:51

Answer 2: score 0, 2020-01-07 09:43:28