Ruby file parsing with x lines per record

Question

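I have a text file to parse. In this file, each record's content is spread over a variable number of lines; the number of lines per record is not fixed. The file's content looks like this: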

ID\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
ID\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
ID\tcontent\tcontent
\tcontent\tcontent

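I want to slice it wherever the first tab-separated column contains a value (on the continuation lines the ID column is empty, so this way of detecting a new record should work).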

My current code splits it into chunks of five lines and then merges each chunk:

f = File.read(file).each_line
f.each_slice(5) do | slice_to_handle |
  merged_row = slice_to_handle.join.delete("\n").split("\t").collect(&:strip)
  # Dealing with the data here..
end

I need to modify it so that a new slice starts as soon as an ID is set in the first column.

Answer 1

File.read(file)
.split(/^(?!\t)/)
.map{|record| record.split("\t").map(&:strip)}

Result:

[
  [
    "ID",
    "content",
    "content",
    "content",
    "content",
    "content",
    "content",
    "content",
    "content"
  ],
  [
    "ID",
    "content",
    "content",
    "content",
    "content",
    "content",
    "content",
    "content",
    "content",
    "content",
    "content",
    "content",
    "content",
    "content",
    "content",
    "content",
    "content"
  ],
  [
    "ID",
    "content",
    "content",
    "content",
    "content"
  ]
]
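
How the split works: /^(?!\t)/ is a zero-width match at the start of any line whose first character is not a tab, so the string is cut just before each ID line and no characters are consumed. A small sketch on a made-up in-memory string rather than a file (the IDs and values here are hypothetical):

```ruby
# Two records: "A1" spans two lines, "B2" spans one.
data = "A1\tx\ty\n\tz\tw\nB2\tp\tq\n"

# Cut before every line that does not start with a tab.
chunks = data.split(/^(?!\t)/)
# chunks => ["A1\tx\ty\n\tz\tw\n", "B2\tp\tq\n"]

# Split each record on tabs; strip removes the embedded newlines.
records = chunks.map { |record| record.split("\t").map(&:strip) }
# records => [["A1", "x", "y", "z", "w"], ["B2", "p", "q"]]
```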

Answer 2

Ruby's Array mixes in Enumerable, which provides slice_before, and that is your friend here:

text_file = "ID\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
ID\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
ID\tcontent\tcontent
\tcontent\tcontent".split("\n")

text_file.slice_before(/^ID/).map(&:join)

Which looks like:

[
  "ID\tcontent\tcontent\tcontent\tcontent\tcontent\tcontent\tcontent\tcontent",
  "ID\tcontent\tcontent\tcontent\tcontent\tcontent\tcontent\tcontent\tcontent\tcontent\tcontent\tcontent\tcontent\tcontent\tcontent\tcontent\tcontent",
  "ID\tcontent\tcontent\tcontent\tcontent"
]

text_file is an array of lines, similar to what you would get by slurping a file with readlines.
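
For a real file, the same kind of array can be built with File.readlines. A minimal sketch, assuming a Ruby version that supports the chomp: keyword (2.4 or later); the Tempfile and its contents are made up so the example is self-contained:

```ruby
require "tempfile"

# Temporary file standing in for the real input (hypothetical contents).
sample = Tempfile.new("records")
sample.write("ID\tcontent\tcontent\n\tcontent\tcontent\n")
sample.close

# chomp: true drops the trailing newline from every line,
# giving the same shape as splitting a string on "\n".
text_file = File.readlines(sample.path, chomp: true)
# text_file => ["ID\tcontent\tcontent", "\tcontent\tcontent"]
```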

slice_before iterates over the array looking for matches of the /^ID/ pattern, and starts a new sub-array each time it finds one.

map(&:join) walks over the sub-arrays and joins the contents of each one into a single string.
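
To make those two steps concrete, here is a small sketch on a shortened, made-up sample (the IDs and values are hypothetical). It passes slice_before a block instead of the /^ID/ pattern, so that a record starts on any line that does not begin with a tab, which is the rule the question describes. That substitution is an assumption of this sketch, not part of the answer's code.

```ruby
lines = ["A1\tx\ty", "\tz\tw", "B2\tp\tq"]

# Step 1: slice_before starts a new sub-array at each line
# for which the block returns true.
groups = lines.slice_before { |line| !line.start_with?("\t") }.to_a
# groups => [["A1\tx\ty", "\tz\tw"], ["B2\tp\tq"]]

# Step 2: join each sub-array into one string per record.
merged = groups.map(&:join)
# merged => ["A1\tx\ty\tz\tw", "B2\tp\tq"]

# Optional extra step: break each record into its fields.
fields = merged.map { |record| record.split("\t") }
# fields => [["A1", "x", "y", "z", "w"], ["B2", "p", "q"]]
```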

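This isn't very scalable, though. Using it, you rely on slurping the entire file into memory, which can bring a machine to its knees. Instead, it is better to read the content line by line, break out the blocks, and process each one as soon as it is complete.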
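
A minimal sketch of that line-by-line approach, assuming a hypothetical helper named each_record and assuming that a record starts on any line that does not begin with a tab (the rule from the question). File.foreach reads one line at a time, so only the record in progress is held in memory. The Tempfile and its contents are made up to keep the example self-contained.

```ruby
require "tempfile"

# Hypothetical helper: yields each record as an array of stripped fields.
# Assumption: a new record starts on any line that does not begin with a tab.
def each_record(path)
  current = nil
  File.foreach(path) do |line|
    line = line.chomp
    next if line.empty?

    if line.start_with?("\t")
      # Continuation line: append it to the record in progress, if any.
      current << line if current
    else
      # New ID line: hand off the finished record, then start the next one.
      yield current.split("\t").map(&:strip) if current
      current = line.dup
    end
  end
  # Flush the last record, which has no following ID line to trigger it.
  yield current.split("\t").map(&:strip) if current
end

# Temporary file standing in for the real input.
sample = Tempfile.new("records")
sample.write("A1\tx\ty\n\tz\tw\nB2\tp\tq\n")
sample.close

records = []
each_record(sample.path) { |fields| records << fields }
records.each { |fields| p fields }
# prints ["A1", "x", "y", "z", "w"]
# prints ["B2", "p", "q"]
```

In real use the block would process each record directly instead of collecting them into an array, which is what keeps memory use flat.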

Answer 1
0 2013-07-09 12:41:39

Answer 2
0 Accepted 2013-07-09 16:15:14
