Simple timeout on I/O for a command on Linux

First, the background to this intriguing challenge. During development and testing, the continuous integration build often fails because of deadlocks, infinite loops, or other issues that leave a test running forever. When that happens, all the mechanisms for notifying us that a build has failed become useless, because the build never finishes.

The solution is to have the build script time out if nothing is written to the build log file for more than 5 minutes. The build routinely writes out the names of unit tests as it proceeds, so a silent log is the best way to tell that it's "frozen".

Okay. Now the nitty-gritty...

The build server uses Hudson to run a simple bash script that invokes the more complex build script based on NAnt and MSBuild (all on Windows).

So far, all the solutions around the net involve a timeout on the total run time of the command. That fails in this case because the tests might hang or freeze in the first 5 minutes, while a total-run-time limit has to be long enough for the whole suite, so the hang would go unnoticed until that limit expires.

What we've thought of so far:

First, here's the high-level bash command that runs the full test suite in Hudson:

build.sh clean free test

That command simply sends all the NAnt and MSBuild build logging to stdout.

Obviously we need to tee that output to a file:

build.sh clean free test 2>&1 | tee build.out

Then, in parallel, another command needs to sleep, check the modification time of the file, and kill the main process if the file hasn't changed for more than 5 minutes. A kill -9 is fine at that point; nothing graceful is needed once the build has frozen.

That's the part you can help with.

In fact, I wrote a script like this over 15 years ago to drop a data phone line connection to Japan after periods of inactivity, but I can't remember how I did it.

Sincerely, Wayne

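# Run the build in the background, copying its output to build.out
build.sh clean free test 2>&1 | tee build.out &
# Wait 5 minutes, then kill the background job unconditionally
sleep 300
kill -KILL %1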
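
The snippet above kills the job after a fixed 300 seconds whether or not it is still producing output. A variant closer to what the question asks for would poll the log's modification time instead. The following is only a sketch: watch_log is a hypothetical helper, the 124 return value is an arbitrary convention, and it assumes GNU stat plus a platform that updates the modification time on every write.

```shell
#!/usr/bin/env bash
# watch_log LOG LIMIT PID [POLL]   (hypothetical helper, not from the original post)
# Send SIGKILL to PID once LOG has gone unmodified for more than LIMIT seconds.
# Assumes GNU stat (-c %Y) and a filesystem that updates mtime on each write.
watch_log() {
    local log=$1 limit=$2 pid=$3 poll=${4:-10}
    local now last
    # Keep polling for as long as the watched process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        sleep "$poll"
        now=$(date +%s)
        last=$(stat -c %Y "$log") || return 1
        if (( now - last > limit )); then
            kill -KILL "$pid" 2>/dev/null
            return 124   # arbitrary choice, mirroring the status used by timeout(1)
        fi
    done
    return 0
}

# Example wiring (commented out so sourcing this file has no side effects):
# build.sh clean free test > build.out 2>&1 &
# watch_log build.out 300 $!
```

Redirecting straight to build.out rather than piping through tee means $! is the build's own PID, which is what gets killed.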

You may be able to use timeout:

timeout 300 command
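
If a limit on total run time is acceptable, a wrapper can tell a timeout apart from an ordinary failure by checking the exit status. This is a minimal sketch that assumes GNU coreutils timeout, which exits with status 124 when the limit expires; run_limited is a hypothetical name.

```shell
#!/usr/bin/env bash
# run_limited SECONDS COMMAND [ARGS...]   (hypothetical wrapper)
# Runs COMMAND under GNU timeout and reports on stderr when the limit fired.
run_limited() {
    local limit=$1
    shift
    local status=0
    timeout "$limit" "$@" || status=$?
    # GNU timeout exits with 124 when it had to stop the command.
    if (( status == 124 )); then
        echo "timed out after ${limit}s" >&2
    fi
    return "$status"
}

# Example (commented out): run_limited 300 build.sh clean free test
```

Note that this limits the whole run, so it does not detect a hang any sooner than the chosen limit.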

I solved this myself by writing a bash script.

It's called iotimeout and takes one parameter, the number of seconds.

You use it like this:

build.sh clean dev test | iotimeout 120

iotimeout has 2 loops.

One is a simple while read line loop that echoes each line, but it also uses the touch command to update the modified time of a tmp file every time it writes a line. Unfortunately, it wasn't possible to monitor the build.out file directly because Windoze doesn't update the file's modified time until you close the file. Oh well.

The other loop runs in the background. It's a forever loop that sleeps 10 seconds and then checks the modified time of the temp file. If the file is ever more than 120 seconds old, that loop forces the entire process group to exit.

The only tricky part was returning the exit code of the original program. Bash gives you a PIPESTATUS array to solve that.
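
As an illustration of the PIPESTATUS idea (not the original script), the sketch below returns the status of the first command in a pipeline instead of the filter's. fake_build and run_filtered are hypothetical names, and cat stands in for iotimeout 120.

```shell
#!/usr/bin/env bash
# fake_build is a hypothetical stand-in for "build.sh clean dev test".
fake_build() {
    echo "test_one"
    return 3
}

# Run a command through a filter and return the command's own exit status
# rather than the filter's. "cat" stands in for "iotimeout 120".
run_filtered() {
    "$@" | cat
    # PIPESTATUS has to be read right after the pipeline;
    # the next command that runs overwrites it.
    return "${PIPESTATUS[0]}"
}

status=0
run_filtered fake_build || status=$?
echo "build exit code: $status"   # prints "build exit code: 3"
```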

Also, figuring out how to kill the entire process group took some research, but it turns out to be easy: just kill 0.
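
The script itself isn't shown here, so the following is only a reconstruction of the two loops as described, written as a bash function. The function body, the optional second argument for the poll interval, the use of SIGKILL, and the reliance on GNU stat and mktemp are all assumptions.

```shell
#!/usr/bin/env bash
# iotimeout SECONDS [POLL]   (reconstruction under assumptions, not the original script)
# Copies stdin to stdout and kills the whole process group if no line
# arrives for more than SECONDS. POLL (default 10) is an added convenience.
iotimeout() {
    local limit=$1 poll=${2:-10}
    local stamp line watchdog
    stamp=$(mktemp) || return 1

    # Loop 2: background watchdog. Every POLL seconds, compare the stamp
    # file's modification time (GNU stat -c %Y) with the current time.
    (
        while sleep "$poll"; do
            now=$(date +%s)
            last=$(stat -c %Y "$stamp") || exit 0
            if (( now - last > limit )); then
                echo "iotimeout: no output for more than ${limit}s" >&2
                kill -KILL 0   # pid 0 means every process in this process group
            fi
        done
    ) >/dev/null &
    watchdog=$!

    # Loop 1: echo each input line and touch the stamp file to record activity.
    while IFS= read -r line || [[ -n $line ]]; do
        printf '%s\n' "$line"
        touch "$stamp"
    done

    # Input ended normally: stop the watchdog and clean up.
    kill "$watchdog" 2>/dev/null
    wait "$watchdog" 2>/dev/null
    rm -f "$stamp"
}
```

Because kill 0 also hits the calling script, nothing runs after a timeout, and in this sketch the temp file is left behind in that case. On a normal run the caller can still read the build's status from PIPESTATUS right after the pipeline.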
