
SGE script: print to file during execution (not just at the end)?

I have an SGE script to execute some python code, submitted to the queue using qsub. In the python script, I have a few print statements (updating me on the progress of the program). When I run the python script from the command line, the print statements are sent to stdout. For the sge script, I use the -o option to redirect the output to a file. However, it seems that the script will only send these to the file after the python script has completed running. This is annoying because (a) I can no longer see real-time updates on the program and (b) if my job does not terminate correctly (for example if my job gets kicked off the queue) none of the updates are printed. How can I make sure that the script is writing to the file each time I want to print something, as opposed to lumping it all together at the end?

I think you are running into an issue with buffered output. Python uses a library to handle its output, and the library knows that it's more efficient to write a block at a time when it's not talking to a tty.

There are a couple of ways to work around this. You can run python with the "-u" option (see the python man page for details), for example, with something like this as the first line of your script:

#! /usr/bin/python -u

but this doesn't work if you are using the "/usr/bin/env" trick, because you don't know where python is installed. (In that case you can get the same effect by setting the PYTHONUNBUFFERED environment variable in your job script.)

Another way is to reopen stdout with something like this:

import sys 
import os 

# reopen stdout file descriptor with write mode
# and 0 as the buffer size (unbuffered)
# (Python 2 only: Python 3 raises ValueError for unbuffered text mode)
sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)

Note the bufsize parameter of os.fdopen being set to 0 to force it to be unbuffered. You can do something similar with sys.stderr.
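(A caveat not in the original answer: on Python 3 this exact recipe fails, because buffering=0 is only allowed in binary mode. The closest equivalent is line buffering, which flushes on every newline. A sketch, demonstrated on a pipe so the effect is visible:)

```python
import os

# Python 3 forbids unbuffered text streams (buffering=0 raises ValueError),
# so reopen the stream line-buffered instead: every '\n' forces a flush.
# In a job script you would apply this to stdout itself, e.g.
#     sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', buffering=1)
# Here a pipe stands in for the redirected output file:
read_fd, write_fd = os.pipe()
writer = os.fdopen(write_fd, 'w', buffering=1)

writer.write('progress update\n')   # the newline triggers an immediate flush
print(os.read(read_fd, 100))        # the data is already in the pipe

writer.close()
os.close(read_fd)
```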

As others mentioned, stdout is not flushed on every write when it is not connected to a tty, for performance reasons.

If you have a specific point at which you want the stdout to be written, you can force that by using

import sys
sys.stdout.flush()

at that point.
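(On Python 3 — the answers here are Python 2 era — print can do the flush itself, so the explicit sys.stdout.flush() can be folded into each call:)

```python
import time

# flush=True pushes each progress line out as it is printed, so it shows
# up in the qsub -o file immediately instead of at program exit.
for step in range(3):
    print('step %d done' % step, flush=True)
    time.sleep(0.1)
```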

I just encountered a similar issue with SGE, and none of the suggested methods to "unbuffer" the file IO worked for me. I had to wait until the end of program execution to see any output.

The workaround I found was to wrap sys.stdout in a custom object that re-implements the "write" method. Instead of actually writing to stdout, this new method opens the file where IO is redirected, appends the desired data, and then closes the file. It's a bit ugly, but I found it solved the problem, since the actual opening/closing of the file forces IO to be interactive.

Here's a minimal example:

import os, sys, time

class RedirIOStream:
  def __init__(self, stream, REDIRPATH):
    self.stream = stream
    self.path = REDIRPATH
  def write(self, data):
    # instead of actually writing, just append to file directly!
    myfile = open( self.path, 'a' )
    myfile.write(data)
    myfile.close()
  def __getattr__(self, attr):
    return getattr(self.stream, attr)


if not sys.stdout.isatty():
  # Detect redirected stdout and std error file locations!
  #  Warning: this will only work on LINUX machines
  STDOUTPATH = os.readlink('/proc/%d/fd/1' % os.getpid())
  STDERRPATH = os.readlink('/proc/%d/fd/2' % os.getpid())
  sys.stdout=RedirIOStream(sys.stdout, STDOUTPATH)
  sys.stderr=RedirIOStream(sys.stderr, STDERRPATH)


# Simple program to print a message every 3 seconds
def main():
  nMsg = 10
  tstart = time.time()
  for x in xrange( nMsg ):
    time.sleep( 3 )
    MSG = '  %d/%d after %.0f sec' % (x, nMsg, time.time()-tstart)
    print MSG

if __name__ == '__main__':
  main()

This is SGE buffering the output of your process; it happens whether it's a python process or any other.

In general you can decrease or disable the buffering in SGE by changing it and recompiling. But it's not a great thing to do: all that data is going to be slowly written to disk, affecting your overall performance.

Why not print to a file instead of stdout?

outFileID = open('output.log', 'w')
print >> outFileID, 'INFO: still working!'
print >> outFileID, 'WARNING: blah blah!'
outFileID.flush()

and use

tail -f output.log

This works for me:

import os, sys

class ForceIOStream:
    def __init__(self, stream):
        self.stream = stream

    def write(self, data):
        self.stream.write(data)
        self.stream.flush()
        if not self.stream.isatty():
            os.fsync(self.stream.fileno())

    def __getattr__(self, attr):
        return getattr(self.stream, attr)


sys.stdout = ForceIOStream(sys.stdout)
sys.stderr = ForceIOStream(sys.stderr)

and the issue has to do with NFS not syncing data back to the master until a file is closed or fsync is called.

I hit this same problem today and solved it by just writing to disk instead of printing:

with open('log-file.txt','w') as out:
  out.write(status_report)
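(If, as in the RedirIOStream answer above, reopening the file on every message is acceptable, a small append-per-line helper gives the same interactive behaviour with far less machinery — the name log_line is mine, not from any answer:)

```python
def log_line(path, msg):
    # Open in append mode, write one line, and close; closing the file
    # forces the data out to disk (and across NFS) on every call.
    with open(path, 'a') as f:
        f.write(msg + '\n')

log_line('log-file.txt', 'INFO: still working!')
log_line('log-file.txt', 'INFO: done')
```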
