Ruby file parsing: x lines per record

Question

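I have a text file to parse. In this file, the content of each record is spread over a variable number of lines; the number of lines per record is not fixed. The file looks like this: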

ID\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
ID\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
ID\tcontent\tcontent
\tcontent\tcontent

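I want to slice it wherever the first tab-separated column has a value (on the continuation lines the ID column is empty, so this way of detecting a new record should work).

My current code splits the file into slices of five lines and then merges each slice: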

f = File.read(file).each_line
f.each_slice(5) do | slice_to_handle |
  merged_row = slice_to_handle.join.delete("\n").split("\t").collect(&:strip)
  # Dealing with the data here..
end

I need to modify it so that a new slice starts whenever an ID is set in the first column.

Answer 1

File.read(file)
.split(/^(?!\t)/)
.map{|record| record.split("\t").map(&:strip)}

Result:

[
  [
    "ID",
    "content",
    "content",
    "content",
    "content",
    "content",
    "content",
    "content",
    "content"
  ],
  [
    "ID",
    "content",
    "content",
    "content",
    "content",
    "content",
    "content",
    "content",
    "content",
    "content",
    "content",
    "content",
    "content",
    "content",
    "content",
    "content",
    "content"
  ],
  [
    "ID",
    "content",
    "content",
    "content",
    "content"
  ]
]
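
As a small illustration of why the split above works: `/^(?!\t)/` is a zero-width pattern that matches at the start of any line that does not begin with a tab, so the string is cut just before each ID line. The sample text and the variable names below are made up for this sketch, and it assumes the file contains no blank lines (a blank line would also start a new chunk).

```ruby
# Hypothetical sample: two records, each with one continuation line.
text = "ID\ta\tb\n\tc\td\nID\te\tf\n\tg\th\n"

# Cut the text at every line start that is NOT followed by a tab.
chunks = text.split(/^(?!\t)/)

# Turn each chunk into a flat array of stripped fields.
records = chunks.map { |chunk| chunk.split("\t").map(&:strip) }

records.each { |fields| p fields }
```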

Answer 2

Ruby's Array includes Enumerable, which provides slice_before, and that is your friend here:

text_file = "ID\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
ID\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
\tcontent\tcontent
ID\tcontent\tcontent
\tcontent\tcontent".split("\n")

text_file.slice_before(/^ID/).map(&:join)

Which looks like:

[
  "ID\tcontent\tcontent\tcontent\tcontent\tcontent\tcontent\tcontent\tcontent",
  "ID\tcontent\tcontent\tcontent\tcontent\tcontent\tcontent\tcontent\tcontent\tcontent\tcontent\tcontent\tcontent\tcontent\tcontent\tcontent\tcontent",
  "ID\tcontent\tcontent\tcontent\tcontent"
]

text_file is an array of lines, similar to what you'd get if you slurped a file using readlines.

slice_before iterates over the array looking for matches of the /^ID/ pattern, and starts a new sub-array each time it finds one.
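
If the real IDs are not literally the text "ID", slice_before also accepts a block. The following is a minimal sketch, assuming a record starts on any line that does not begin with a tab; the sample lines and IDs are invented for illustration.

```ruby
# Hypothetical input: IDs are arbitrary, continuation lines start with a tab.
lines = [
  "A1\tx\ty",
  "\tz\tw",
  "B2\tp\tq",
  "\tr\ts",
  "\tt\tu"
]

# A new slice begins before every line for which the block returns true,
# i.e. every line whose first column is not empty.
records = lines.slice_before { |line| !line.start_with?("\t") }.to_a

records.each { |record| p record }
```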

map(&:join) walks over the sub-arrays and joins the contents of each into a single string.
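
Joining without a separator works here because every continuation line already starts with a tab. To get the flat array of fields the question asks for, the joined string can then be split on tabs. A short sketch with made-up data:

```ruby
# Hypothetical sub-arrays, shaped like the output of slice_before.
records = [
  ["ID\ta\tb", "\tc\td"],
  ["ID\te\tf"]
]

# Each continuation line begins with "\t", so a plain join keeps the
# fields tab-separated.
joined = records.map(&:join)

# Split every joined record into its stripped fields.
rows = joined.map { |record| record.split("\t").map(&:strip) }

rows.each { |row| p row }
```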

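This isn't very scalable, though. It relies on being able to slurp the entire file into memory, which can bring a machine to a halt on large files. It is better to read the content line by line, break it into records as you go, and process each record as soon as it is complete.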
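
One way to do that, sketched under assumptions: File.foreach without a block returns an enumerator over the lines, and slice_before on that enumerator hands over each record as soon as the next one begins, so only the current record is held in memory. The helper name each_record is hypothetical, and the rule "a line that does not start with a tab begins a new record" is an assumption taken from the question.

```ruby
# Hypothetical helper: yields one record at a time as an array of fields.
# Assumes a new record starts on any line that does not begin with a tab.
def each_record(path)
  File.foreach(path)
      .slice_before { |line| !line.start_with?("\t") }
      .each { |lines| yield lines.join.split("\t").map(&:strip) }
end
```

Usage would look like `each_record("data.txt") { |fields| p fields }`, where the file name is a placeholder.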

Answer 1
0 2013-07-09 12:41:39

Answer 2
0 Accepted 2013-07-09 16:15:14