Broken pipe during a subprocess stdin.write

I interact with a server that I use to tag sentences. This server is launched locally on port 2020.

For example, if I send "Je mange des pâtes ." on port 2020 through the client used below, the server answers "Je_CL mange_V des_P pâtes_N ._.". The result is always exactly one line, provided my input is not empty.
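To make the expected reply format concrete, here is a minimal sketch (Python 3) of how such a one-line answer could be split into (word, tag) pairs. The helper name parse_tagged_line is hypothetical, and it assumes the tag is whatever follows the last underscore of each token, which is only inferred from the example above.

```python
def parse_tagged_line(line):
    """Split a 'word_TAG word_TAG ...' line into (word, tag) pairs."""
    pairs = []
    for item in line.split():
        # Assumption: the tag follows the LAST underscore, so a word that
        # itself contains '_' is kept intact.
        word, tag = item.rsplit("_", 1)
        pairs.append((word, tag))
    return pairs


pairs = parse_tagged_line("Je_CL mange_V des_P pâtes_N ._.")
print(pairs)
```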

I currently have to tag 9,568 files through this server. The first 9,483 files are tagged as expected. After that, the input stream seems to be closed, full, or otherwise unusable, because I get an IOError, specifically a Broken pipe error, when I try to write to stdin.

When I skip the first 9,483 files, the remaining ones are tagged without any issue, including the one that caused the first error.

My server doesn't produce any error log indicating that something fishy happened. Am I handling something incorrectly? Is it normal for the pipe to fail after some time?

# Python 2
import codecs
from subprocess import Popen, PIPE

# JAR and SUMMARY are defined elsewhere in the script (not shown)
log = codecs.open('stanford-tagger.log', 'w', 'utf-8')
p1 = Popen(["java",
            "-cp", JAR,
            "edu.stanford.nlp.tagger.maxent.MaxentTaggerServer",
            "-client",
            "-port", "2020"],
           stdin=PIPE,
           stdout=PIPE,
           stderr=log)

fhi = codecs.open(SUMMARY, 'r', 'utf-8') # a descriptor of the files to tag

for i, line in enumerate(fhi, 1):
    if i % 500 == 0:  # report progress every 500 documents
        print "Tagged " + str(i) + " documents..."
    tokens = ... # a list of words, can be quite long
    try:
        # one query per line, UTF-8 encoded
        p1.stdin.write(' '.join(tokens).encode('utf-8') + '\n')
    except IOError:
        print 'bouh, I failed ;(('
    # the client answers with exactly one line per query
    result = p1.stdout.readline()
    # Here I do something with result...
fhi.close()

In addition to my comments, I might suggest a few other changes...

for i, line in enumerate(fhi, 1):
    if i % 500 == 0:  # report progress every 500 documents
        print "Tagged " + str(i) + " documents..."
    tokens = ... # a list of words, can be quite long
    try:
        s = ' '.join(tokens).encode('utf-8') + '\n'
        assert s.find('\n') == len(s) - 1       # Make sure the only newline in s is the final one
        p1.stdin.write(s)
        p1.stdin.flush()                        # Push the buffered line out to the pipe now
    except IOError:
        print 'bouh, I failed ;(('
    result = p1.stdout.readline()
    assert result                               # Make sure we got something back
    assert result.find('\n') == len(result) - 1 # Make sure the only newline in result is the final one
    # Here I do something with result...
fhi.close()

...but given that there's also a client/server about which we know nothing, there are a lot of places where it could be going wrong.
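One cheap way to narrow it down: a broken pipe on write means the reading end of the pipe has been closed, which usually means the child process has already exited. The following is only a sketch in Python 3 (where the error surfaces as BrokenPipeError; in Python 2 it is an IOError with errno.EPIPE). The helper name tag_line is made up for illustration, and the child here is a stand-in that upper-cases each line, not the real tagger client.

```python
import subprocess
import sys


def tag_line(proc, text):
    """Send one line to proc and return its one-line reply (hypothetical helper)."""
    data = text.encode("utf-8") + b"\n"
    try:
        proc.stdin.write(data)
        proc.stdin.flush()
    except BrokenPipeError:
        # poll() returns the exit code if the child has terminated, else None.
        raise RuntimeError("child closed its stdin; exit code: %r" % (proc.poll(),))
    reply = proc.stdout.readline()
    if not reply:
        # An empty read means end-of-file: the child closed its stdout.
        raise RuntimeError("child closed its stdout; exit code: %r" % (proc.poll(),))
    return reply.decode("utf-8").rstrip("\n")


# Stand-in child (an assumption, NOT the tagger): echoes each line in upper case.
CHILD_SOURCE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
    "    sys.stdout.flush()\n"
)
echo_cmd = [sys.executable, "-u", "-c", CHILD_SOURCE]

proc = subprocess.Popen(echo_cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
print(tag_line(proc, "je mange des pates ."))
proc.stdin.close()
proc.stdout.close()
proc.wait()
```

If the exit code reported at the moment of failure is not None, the process on the other end of the pipe died on its own, and its stderr log is the next place to look.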

Does it work if you dump all the queries into a single file, and then run it from the command line with something like...

java .... < input > output
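If it is easier to drive that experiment from Python, the same redirection can be sketched as below (Python 3). The function name run_batch and the file names are hypothetical, and the command shown is a stand-in that upper-cases its input; the actual java command line is the one elided above.

```python
import subprocess
import sys


def run_batch(cmd, input_path, output_path):
    """Run cmd with stdin read from input_path and stdout written to output_path."""
    with open(input_path, "rb") as fin, open(output_path, "wb") as fout:
        # Same effect as `cmd < input > output` in a shell; returns the exit code.
        return subprocess.call(cmd, stdin=fin, stdout=fout)


if __name__ == "__main__":
    # Stand-in command (an assumption): copies stdin to stdout in upper case.
    stand_in = [sys.executable, "-c",
                "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    with open("input", "wb") as f:
        f.write(b"je mange des pates .\n")
    print(run_batch(stand_in, "input", "output"))
```

Comparing the number of lines in input and output then shows whether every query got exactly one reply, without any pipe handling on the Python side.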
