Ignoring multiple header lines in a CSV

Question

I've been using Ruby's CSV module, but I'm having trouble getting it to ignore multiple header lines.

Specifically, here are the first twenty lines of the file I'm trying to parse:

USGS Digital Spectral Library splib06a
Clark and others 2007, USGS, Data Series 231.

For further information on spectrsocopy, see: http://speclab.cr.usgs.gov

ASCII Spectral Data file contents:
line 15 title
line 16 history
line 17 to end:  3-columns of data:
     wavelength    reflectance    standard deviation

(standard deviation of 0.000000 means not measured)
(      -1.23e34  indicates a deleted number)
----------------------------------------------------
Olivine GDS70.a Fo89 165um   W1R1Bb AREF
copy of splib05a r 5038
       0.205100      -1.23e34        0.090781
       0.213100      -1.23e34        0.018820
       0.221100      -1.23e34        0.005416
       0.229100      -1.23e34        0.002928

The actual column header is on line 10, and the data itself starts on line 17.

Here is my code:

require "nyaplot"

# Note that DataFrame basically just inherits from Ruby's CSV module.
class SpectraHelper < Nyaplot::DataFrame
  class << self
    def from_csv filename
      df = super(filename, col_sep: ' ') do |csv|
        csv.convert do |field, info|
          STDERR.puts "Field is #{field}"
        end
      end
    end
  end

  def csv_headers
    [:wavelength, :reflectance, :standard_deviation]
  end
end


def read_asc filename
  f = File.open(filename, "r")
  16.times do
    line = f.gets
    puts "Ignoring #{line}"
  end

  d = SpectraHelper.from_csv(f)
end

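The output shows that my f.gets calls aren't actually skipping those lines, and I don't understand why. Here are the first few lines of the output: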

Field is Clark
Field is and
Field is others
Field is 2007,
Field is USGS,

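I tried to find a tutorial or example that walks through handling more complex CSV files, but didn't have much luck. If someone can point me to a resource that answers this, I'd be grateful (I'd prefer to accept an answer that solves my specific problem, but either would be appreciated).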

Using Ruby 2.1.

Answer 1

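I think you are using ::open (the CSV.open call behind from_csv), which in turn uses IO.open. That method opens the file again by name, starting from the beginning, so the lines you skipped with f.gets on your own file handle have no effect.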

I modified the script a bit:

require 'csv'

class SpectraHelper < CSV
  def self.from_csv(filename)
    df = open(filename, 'r' , col_sep: ' ') do |csv|
      csv.drop(16).each {|c| p c}
    end
  end
end

def read_asc(filename)
  SpectraHelper.from_csv(filename)
end

read_asc "data/csv1.csv"
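The point about reopening can be checked directly: lines consumed with gets are skipped only on that particular IO object, so the CSV reader has to be built on the same, already-advanced object (with CSV.new) rather than from a filename. Below is a minimal sketch, using a StringIO as a stand-in for the real file and a hypothetical helper named rows_after_header. Note that col_sep: ' ' produces nil fields wherever there are runs of spaces; compact removes them here.

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper: skip `skip` lines on an open IO, then let CSV parse the
# rest from the IO's current position (CSV.new does not reopen the file).
def rows_after_header(io, skip)
  skip.times { io.gets }
  # Runs of spaces yield nil fields with col_sep ' ', so drop them from each row.
  CSV.new(io, col_sep: ' ').map(&:compact)
end

# Stand-in for the .asc file: two header lines followed by space-padded data.
sample = StringIO.new("header line one\n" \
                      "header line two\n" \
                      "       0.205100      -1.23e34        0.090781\n" \
                      "       0.213100      -1.23e34        0.018820\n")

p rows_after_header(sample, 2)
# prints [["0.205100", "-1.23e34", "0.090781"], ["0.213100", "-1.23e34", "0.018820"]]
```

With the real file this would be File.open(filename) { |f| rows_after_header(f, 16) }, which keeps the skipping and the parsing on the same handle.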

Answer 2

It turns out the problem wasn't my understanding of CSV, but how Nyaplot::DataFrame handles CSV files.

Basically, Nyaplot doesn't actually store data as CSV; CSV is just an intermediate format. So a simple way to handle the file is to build on @khelli's suggestion:

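require 'csv'
require 'nyaplot'

def read_asc filename
  # Skip the 16 header lines, then turn each remaining row into a hash keyed
  # by the three column names; fields whose header is nil are dropped.
  Nyaplot::DataFrame.new(CSV.open(filename, 'r',
     col_sep: ' ',
     headers: [:wavelength, :reflectance, :standard_deviation],
     converters: :numeric).
   drop(16).
   map do |csv_row|
    csv_row.to_h.delete_if { |k,v| k.nil? }
  end)
end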
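Here Nyaplot::DataFrame.new is given an array of row hashes. One caveat, based on the excerpt in the question rather than on anything the answer states: if the data lines are padded with leading spaces, col_sep: ' ' turns each of those spaces into a separate empty field, so the values may not line up with the three headers. The sketch below builds the same kind of row hashes by splitting on whitespace instead, so the padding does not matter. The names row_hashes and COLUMNS are hypothetical.

```ruby
require 'stringio'

COLUMNS = [:wavelength, :reflectance, :standard_deviation]

# Hypothetical helper: skip the header lines, ignore blank lines, and build one
# hash per data line keyed by column name, with values converted to Float.
def row_hashes(io, skip = 16)
  io.each_line.drop(skip).reject { |line| line.strip.empty? }.map do |line|
    COLUMNS.zip(line.split.map { |s| Float(s) }).to_h
  end
end

sample = StringIO.new("h1\nh2\n       0.205100      -1.23e34        0.090781\n")
p row_hashes(sample, 2)
```

With the real file, something like File.open(filename) { |f| Nyaplot::DataFrame.new(row_hashes(f)) } would follow the answer's approach; the Nyaplot call is taken from the answer above and is not exercised in the sketch.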

Thanks, everyone, for the suggestions.

Answer 3

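Since your file isn't well-formed CSV, I wouldn't use the CSV module at all. The following code reads the file and gives you an array of records: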

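  lines = File.readlines(filename)   # read every line; the file is closed afterwards
  lines.slice!(0, 16)                # drop the 16 header lines
  records = lines.map { |line| line.split }  # split each line on runs of whitespace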

Contents of records:

[["0.205100", "-1.23e34", "0.090781"], ["0.213100", "-1.23e34", "0.018820"], ["0.221100", "-1.23e34", "0.005416"], ["0.229100", "-1.23e34", "0.002928"]]
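The records above are still strings, and the reflectance column still holds -1.23e34, which the file header says "indicates a deleted number". A minimal sketch of converting each record to floats and turning that sentinel (and the "not measured" standard deviation of 0.000000) into nil; whether nil is the right stand-in depends on what you do with the data afterwards, and clean_record is a hypothetical name.

```ruby
# Value the USGS header says "indicates a deleted number".
DELETED_VALUE = -1.23e34

# Hypothetical helper: convert one record of strings to floats, mapping the
# deleted-number sentinel to nil and a 0.0 standard deviation
# ("not measured") to nil.
def clean_record(record)
  values = record.map { |s| Float(s) }
  values = values.map { |v| v == DELETED_VALUE ? nil : v }
  values[2] = nil if values[2] == 0.0
  values
end

records = [["0.205100", "-1.23e34", "0.090781"], ["0.213100", "-1.23e34", "0.018820"]]
p records.map { |r| clean_record(r) }
# prints [[0.2051, nil, 0.090781], [0.2131, nil, 0.01882]]
```

Float raises an ArgumentError on anything that isn't a number, which is a cheap way to notice if a header line slipped through the slice!.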

3 answers

Answer 1: score 1, 2014-10-19 19:49:00
Answer 2: score 0, accepted, 2014-10-21 15:43:55
Answer 3: score -1, 2014-10-19 19:57:38
