How to buffer and process tail -f output in batches?
I need to monitor a file and send what is written to it to a web service. I'm trying to achieve a clean and simple solution with bash scripting, e.g.:
#!/bin/bash
# listen for changes on file specified as first argument
tail -F "$1" | while read LINE
do
    curl http://service.com/endpoint --data "${LINE}"
done
This works perfectly: every line that is appended will be POSTed to http://service.com/endpoint. However, I don't really like the fact that if many lines are appended in a short time, I will issue as many HTTP requests and possibly overload the service.
Is there a smart way to buffer the operations? I can think of something like:
buffer = EMPTY
while LINES are read:
    add LINE to buffer
    if buffer has more than X LINES
        send POST
    fi
done
But in the above solution, if one line is posted per hour, I will only get updates every X hours, which is not acceptable. Another similar solution would be to keep time within the while loop: if X seconds have passed then send the buffer, otherwise wait.. but then the last line of a stream may be held indefinitely, since the while loop body runs only when a new line is added to the file.
The objective is to do this with minimal bash scripting and without using a second process. By second process I mean: process 1 gets the output from tail -f and stores it, and process 2 periodically checks what is stored and sends a POST if more than X seconds have elapsed.
I am curious if this is possible by some clever trick?
Thanks!
Literally putting your pseudocode to code:
# add stdbuf -oL if you care
tail -F "$1" | {
    # buffer = EMPTY
    buffer=
    # while LINES are read:
    while IFS= read -r line; do
        # add LINE to buffer
        buffer+="$line"$'\n'
        # if buffer has more than X LINES
        # TODO: cache the count of lines in a variable to save cpu
        if [ "$(wc -l <<<"$buffer")" -gt "$x_lines" ]; then
            # send POST
            # TODO: remove additional newline on the end of buffer, if needed
            curl http://service.com/endpoint --data "${buffer}"
            buffer=
        fi
    done
}
Removing the newline at the end of the buffer, or for example keeping the count of lines in a separate counter to save cpu, is left for others.
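For completeness, one possible sketch of that counter-based variant (batch_lines is a hypothetical helper name, not part of any tool): it keeps the line count in a variable instead of re-running wc -l on every iteration, and strips the trailing newline before sending.

```shell
# Batch stdin into groups of at most $1 lines, invoking the remaining
# arguments once per batch with the batch appended as the last argument.
# batch_lines is a hypothetical helper name used only for illustration.
batch_lines() {
    local max=$1; shift
    local buffer= count=0 line
    while IFS= read -r line; do
        buffer+="$line"$'\n'
        count=$((count + 1))
        if (( count >= max )); then
            "$@" "${buffer%$'\n'}"   # strip the trailing newline
            buffer= count=0
        fi
    done
    # flush the remainder when the input ends
    if (( count > 0 )); then
        "$@" "${buffer%$'\n'}"
    fi
}
```

With this, the loop could be written as tail -F "$1" | batch_lines 100 curl http://service.com/endpoint --data - note it still flushes only on line count, not on time.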
Notes:
while read LINE will remove leading and trailing whitespace from the line. Use while IFS= read -r line to read the whole line. More info at the BashFAQ entry on how to read a file line by line.
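A quick way to see the difference between the two read forms:

```shell
# Plain `read` splits on IFS and strips surrounding whitespace;
# `IFS= read -r` preserves the line exactly as it was written.
printf '  padded  \n' | { read LINE; printf '[%s]\n' "$LINE"; }         # prints [padded]
printf '  padded  \n' | { IFS= read -r LINE; printf '[%s]\n' "$LINE"; } # prints [  padded  ]
```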
If the service should be queried once per line, I believe you can just use xargs, like:
tail -F "$1" | xargs -d $'\n' -n1 curl http://service.com/endpoint --data
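A local sketch of the xargs approach, with echo standing in for curl so it can be run without a web service:

```shell
# xargs -d '\n' makes each input line one argument; -n1 runs the
# command once per argument (GNU xargs).
printf 'one\ntwo\n' | xargs -d '\n' -n1 echo line:
```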
To buffer with time, time out the reading - either with the bash extension read -t, e.g. read -t 0.1, or by timing out the whole read, e.g. timeout 1 cat.
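A minimal sketch of the read -t behaviour the buffering relies on: on timeout, read returns a status greater than 128, which is distinguishable from end of input (status 1).

```shell
# read -t waits at most N seconds for a complete line; a status > 128
# means the timeout expired, while 1 means end of input.
if IFS= read -r -t 2 line; then
    echo "got: $line"
elif (( $? > 128 )); then
    echo "timed out"
else
    echo "end of input"
fi
```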
To limit in both ways, by the number of lines and with a timeout, I once wrote a badly named script called ratelimit.sh (badly named, because it does not limit rate...) that does exactly that. It reads lines, and if either the count of lines or the timeout is reached, it flushes its buffer with an additional output separator. I believe it's meant to be used like:
tail -F "$1" | ratelimit.sh --timeout=0.5 --lines=5 | while IFS= read -r -d $'\x02' buffer; do curl ... --data "$buffer"; done
It roughly works like this:
# Written by Kamil Cukrowski (C) 2020
# Licensed jointly under MIT and Beerware license

# config
maxtimeoutns=$((2 * 1000 * 1000 * 1000))
maxlines=5
input_separator=$'\n'
output_separator=$'\x02'

# the script
timeout_arg=()
while true; do
    chunk=""
    lines=0
    start=$(date +%s%N)
    stop=$((start + maxtimeoutns))
    while true; do
        if [ "$maxtimeoutns" != 0 ]; then
            now=$(date +%s%N)
            if (( now >= stop )); then
                break
            fi
            timeout=$((stop - now))
            # convert the nanosecond timeout to fractional seconds for read -t
            timeout=$(awk -v a="$timeout" -v b=1000000000 'BEGIN{printf "%f", a/b}')
            timeout_arg=(-t "$timeout")
        fi
        IFS= read -rd "$input_separator" "${timeout_arg[@]}" line && ret=$? || ret=$?
        if (( ret == 0 )); then
            # read succeeded
            chunk+=$line$'\n'
            if (( maxlines != 0 )); then
                lines=$((lines + 1))
                if (( lines >= maxlines )); then
                    break
                fi
            fi
        elif (( ret > 128 )); then
            # read timed out
            break
        fi
    done
    if (( ${#chunk} != 0 )); then
        printf "%s%s" "$chunk" "$output_separator"
    fi
done
Thanks to KamilCuk's answer, I managed to achieve what I wanted in a rather simple way, combining a max number of lines and timeouts. The trick was to discover that the piping doesn't necessarily work line by line, like I thought it did.. silly me!
Just for future reference, this is my solution, which is very specific and simplified to the bone:
#!/bin/bash
# sends updates to $1 via curl every 15 seconds or every 100 lines
tail -F "$1" | while true; do
    chunk=""
    stop=$(( $(date +%s) + 15 ))
    maxlines=100
    while true; do
        if (( $(date +%s) >= stop )); then break; fi
        IFS= read -r -t 15 line && ret=$? || ret=$?
        if (( ret == 0 )); then
            chunk+=$line$'\n'
            maxlines=$((maxlines - 1))
            if (( maxlines == 0 )); then break; fi
        elif (( ret > 128 )); then break; fi
    done
    if (( ${#chunk} != 0 )); then
        curl http://service.com --data "$chunk"
    fi
done