
Line-oriented streaming in Ruby (like grep)

By default Ruby opens $stdin and $stdout in buffered mode. This means you can't use Ruby as a grep-like filter that passes text through line by line. Is there any way to force Ruby to use line-oriented mode? I've seen various solutions, including popen3 (which does buffered mode only) and pty (which doesn't handle $stdout and $stderr separately, which I require).

How do I do this? Python seems to have the same limitation.

You can always turn on autoflush on any stream you want:

STDOUT.sync = true

This has the effect of committing any writes immediately.

Most languages have this feature, but each calls it something a little different.
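As a rough illustration of the autoflush approach, here is a minimal sketch of a grep-like filter. The method name filter_lines and the sample input are hypothetical, and the sketch assumes each_line hands over a line as soon as one is available on the input, which is worth verifying on your platform. The demo runs against StringIO objects so it can be tried without a pipe.

```ruby
require "stringio"

# Hypothetical helper: copies matching lines from input to output,
# flushing after every write because sync is turned on.
def filter_lines(input, output, pattern)
  output.sync = true                               # same idea as STDOUT.sync = true
  regex = Regexp.new(pattern, Regexp::IGNORECASE)  # pattern is treated as a regex
  input.each_line do |line|
    output.write(line) if line =~ regex
  end
end

# As a script this would be: filter_lines($stdin, $stdout, ARGV[0].to_s)
demo_out = StringIO.new
filter_lines(StringIO.new("Documents\nMusic\ndoc.txt\n"), demo_out, "doc")
puts demo_out.string
```

With sync enabled, each matching line is handed to the operating system as soon as it is written, so a downstream consumer does not have to wait for Ruby's output buffer to fill.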

It looks like your best bet is to use STDOUT.syswrite and STDIN.sysread. The following seemed to have reasonably good performance, despite being ugly code:

STDIN.sync = true
STDOUT.syswrite "Looking for #{ARGV[0]}\n"

def next_line
  mybuff = @overflow || ""
  until mybuff[/\n/]
    mybuff += STDIN.sysread(8)
  end
  overflow = mybuff.split("\n")
  out, *others = overflow
  @overflow = others.join("\n")
  out
rescue EOFError => e
  false  # NB: There's a bug here, see below
end

line = next_line
while line
  STDOUT.syswrite "#{line}\n" if line =~ /#{ARGV[0]}/i
  line = next_line
end

Note: I'm not sure you need #sync with #sysread, but if so you should probably sync STDOUT too. Also, it reads 8 bytes at a time into mybuff, which is highly inefficient and CPU heavy; you should experiment with this value. Lastly, this code is hacky and needs a refactor, but it works. I tested it using ls -l ~/* | ruby rgrep.rb doc (where 'doc' is the search term).


Second note: Apparently I was so busy trying to get it to perform well that I failed to get it to perform correctly! As Dmitry Shevkoplyas has noted, if there is text in @overflow when EOFError is raised, that text will be lost. I believe that replacing the rescue clause with the following should fix the problem:

rescue EOFError => e
  return false unless @overflow && @overflow.length > 0
  output = @overflow
  @overflow = ""
  output
end

(If you found that helpful, please upvote Dmitry's answer!)
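One case the replacement rescue still appears to miss: it returns only @overflow, so any bytes already appended to the local mybuff during the failing call (for example, a final line longer than one read and with no trailing newline) would be dropped. The sketch below is one possible way around that, not the answerer's code. The class ChunkedLineReader, the helper all_lines, and the 4096-byte default are all hypothetical; the idea is simply to keep unread bytes in an instance buffer so nothing read before EOF can be lost.

```ruby
require "stringio"

# Hypothetical helper (not from the answers above): buffers raw reads in an
# instance variable and hands back one line at a time.
class ChunkedLineReader
  def initialize(io, chunk_size = 4096)
    @io = io
    @chunk_size = chunk_size
    @buffer = String.new   # empty binary string, matching what sysread returns
    @eof = false
  end

  # Returns the next line without its newline, or nil when input is exhausted.
  def next_line
    # Keep reading until the buffer holds a newline or the stream has ended.
    until (idx = @buffer.index("\n")) || @eof
      begin
        @buffer << @io.sysread(@chunk_size)
      rescue EOFError
        @eof = true
      end
    end
    return @buffer.slice!(0..idx).chomp if idx   # a complete line is available
    return nil if @buffer.empty?                 # nothing left at all
    line = @buffer                               # final line with no newline
    @buffer = String.new
    line
  end
end

# Hypothetical driver: collects every line, using 8-byte reads as in the original.
def all_lines(text, chunk_size = 8)
  reader = ChunkedLineReader.new(StringIO.new(text), chunk_size)
  lines = []
  while (line = reader.next_line)
    lines << line
  end
  lines
end

p all_lines("1\n2\n3\n")   # => ["1", "2", "3"]
p all_lines("abcdefghij")  # => ["abcdefghij"]
```

Returning nil rather than false at end of input also lets an empty line (an empty string, which is truthy in Ruby) pass through a while loop without ending it early.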

The accepted answer by user208769 is good, but has one flaw: under certain conditions you will lose the last line. I'll show how to reproduce it and how to fix it below.

To reproduce the "last line lost" bug:

mkdir deleteme
touch deleteme/1 deleteme/2 deleteme/3
ls deleteme/ | ./rgrep.rb ''
Looking for
1
2

As you can see, the "3" file is missing from the rgrep output. Surprisingly, with a different filename length it behaves differently! Look:

rm -fr deleteme/
mkdir deleteme
touch deleteme/11 deleteme/22 deleteme/33
ls deleteme/ | ./rgrep.rb ''
Looking for
11
22
33

Now the third file is present! What a bug! Isn't she a beauty!?
One can only imagine how much damage this random behaviour can cause.

To fix the bug we modify the rescue portion slightly:

#!/usr/bin/env ruby
STDIN.sync = true
STDOUT.syswrite "Looking for #{ARGV[0]}\n"

def next_line
  mybuff = @overflow || ""
  until mybuff[/\n/]
    mybuff += STDIN.sysread(8)
  end
  overflow = mybuff.split("\n")
  out, *others = overflow
  @overflow = others.join("\n")
  out
rescue EOFError => e
  if @overflow.to_s.size > 0
    leftover_line = @overflow
    @overflow = ''
    return leftover_line
  else
    false
  end
end

line = next_line
while line
  STDOUT.syswrite "#{line}\n" if line =~ /#{ARGV[0]}/i
  line = next_line
end

I'll leave the "why" portion out of this post as an exercise for the curious, as otherwise it won't be digested properly (and this post is already way too long for my first post ever ;) heh.
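For anyone who wants to check their answer to that exercise, the sketch below replays the original (unfixed) next_line logic against in-memory strings. The method name lines_seen is hypothetical, and StringIO stands in for the pipe on the assumption that a real pipe would deliver the same 8-byte chunks. The difference between the two runs comes down to byte counts: "1\n2\n3\n" is 6 bytes and arrives in a single read, so split discards the final newline and "3" is left in the overflow with no terminator. The next call has to read again, hits EOF, and gives up. "11\n22\n33\n" is 9 bytes, so the final newline arrives in a second read and "33" gets completed.

```ruby
require "stringio"

# Hypothetical replay of the original next_line loop, reading from a string.
# Returns every line the original logic would have yielded before stopping.
def lines_seen(text)
  io = StringIO.new(text)
  overflow = ""
  seen = []
  loop do
    buf = overflow
    begin
      buf += io.sysread(8) until buf.include?("\n")
    rescue EOFError
      break   # original behaviour: whatever is still in overflow is dropped
    end
    out, *others = buf.split("\n")
    overflow = others.join("\n")
    seen << out
  end
  seen
end

p lines_seen("1\n2\n3\n")     # 6 bytes  => ["1", "2"]
p lines_seen("11\n22\n33\n")  # 9 bytes  => ["11", "22", "33"]
```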

You can call $stdout.flush after printing, and call $stdin.readline to get a single line.
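A minimal sketch of that flush-and-readline approach follows. The method name echo_matches and the sample data are hypothetical; in a real script the arguments would be $stdin, $stdout, and the search term. It relies on readline raising EOFError once the input is exhausted.

```ruby
require "stringio"

# Hypothetical helper: reads one line at a time and flushes after each match.
def echo_matches(input, output, pattern)
  regex = Regexp.new(pattern, Regexp::IGNORECASE)
  loop do
    line = input.readline   # raises EOFError when there is no more input
    next unless line =~ regex
    output.print(line)
    output.flush            # push this line out now rather than when the buffer fills
  end
rescue EOFError
  nil                       # end of input: stop quietly
end

flush_out = StringIO.new
echo_matches(StringIO.new("alpha\nbeta\nALPHABET\n"), flush_out, "alpha")
puts flush_out.string
```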


 