
How do I split apart a CSV string in Ruby?

Take this line from a CSV file as an example:

2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"

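I want to split it into an array. The obvious idea is to just split on commas, but some of the strings contain commas, e.g. "Life and Living Processes, Life Processes", and these should stay as single elements of the array. Also note that there are two commas with nothing between them; I want to get those as empty strings.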

In other words, the array I want to get is

[2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes","","",1,0,"endofline"]
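To see the problem concretely, here is a quick check (a sketch that assumes Ruby's bundled csv library, with CSV.parse_line standing in for a real parser). A naive String#split cuts the quoted field at its embedded comma and leaves the double quotes in place:

```ruby
require 'csv'

line = '2412,21,"Which of the following is not found in all cells?","Curriculum",' \
       '"Life and Living Processes, Life Processes",,,1,0,"endofline"'

naive  = line.split(",")        # ignores quoting entirely
parsed = CSV.parse_line(line)   # understands quoted fields

p naive.size   # 11: the quoted field was split at its embedded comma
p parsed.size  # 10
p naive[4]     # "\"Life and Living Processes"
```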

I can think of hacky ways involving eval, but I'm hoping someone can come up with a clean regex to do this...

Cheers, Max

This is not a suitable task for a regex. You need a CSV parser, and Ruby has one built in:

http://ruby-doc.org/stdlib/libdoc/csv/rdoc/classes/CSV.html

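There is also an arguably superior third-party library (since Ruby 1.9, FasterCSV has been the standard library's CSV):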

http://fastercsv.rubyforge.org/

require 'csv' # part of Ruby's standard library

str = <<EOF
2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"
EOF

p CSV.parse(str)
# That's it! However, empty (unquoted) fields come back as nil.
# That makes sense to me, but if you insist on empty strings, map them afterwards
# (field converters are not guaranteed to be called for nil fields):
rows = CSV.parse(str).map { |row| row.map { |field| field.nil? ? "" : field } }
p rows
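The question's target array also has 2412, 21, 1 and 0 as numbers rather than strings. A possible refinement, sketched with the same standard-library CSV class: the built-in :numeric converter turns numeric-looking fields into Integer (or Float), and a final map replaces the nil that CSV returns for empty unquoted fields.

```ruby
require 'csv'

line = '2412,21,"Which of the following is not found in all cells?","Curriculum",' \
       '"Life and Living Processes, Life Processes",,,1,0,"endofline"'

# :numeric converts fields such as "2412" to Integer 2412; other fields stay Strings.
row = CSV.parse_line(line, converters: :numeric)
# Empty unquoted fields come back as nil; swap them for "" as the question asks.
row = row.map { |field| field.nil? ? "" : field }

p row
# [2412, 21, "Which of the following is not found in all cells?", "Curriculum",
#  "Life and Living Processes, Life Processes", "", "", 1, 0, "endofline"]
```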

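Edit: I missed the Ruby tag. The good news is that the guide explains the theory behind building a parser, even though its code is in C rather than Ruby. Sorry.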

Here is an excellent guide to doing this:

http://knab.ws/blog/index.php?/archives/10-CSV-file-parser-and-writer-in-C-Part-2.html

And the CSV writer is here:

http://knab.ws/blog/index.php?/archives/3-CSV-file-parser-and-writer-in-C-Part-1.html

These examples cover quoted literals in the CSV (which may or may not contain commas).
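The linked guide is in C, but the idea carries over. As a rough sketch (not a port of that code), here is a minimal character-by-character parser in Ruby; the method name parse_csv_line is made up for illustration. It tracks whether it is inside a quoted field, so commas there are kept, a doubled quote ("") becomes a literal quote, and consecutive commas produce empty strings.

```ruby
# Minimal character-by-character CSV line parser (illustrative sketch).
# parse_csv_line is a hypothetical helper name, not part of any library.
def parse_csv_line(line)
  line = line.chomp
  fields = []
  field = +""              # unfrozen buffer for the current field
  in_quotes = false
  i = 0
  while i < line.length
    c = line[i]
    if in_quotes
      if c == '"' && line[i + 1] == '"'
        field << '"'       # doubled quote inside a quoted field is a literal "
        i += 1             # skip the second quote
      elsif c == '"'
        in_quotes = false  # closing quote
      else
        field << c         # commas are ordinary characters inside quotes
      end
    elsif c == '"'
      in_quotes = true     # opening quote
    elsif c == ','
      fields << field      # a comma ends the current field (possibly empty)
      field = +""
    else
      field << c
    end
    i += 1
  end
  fields << field          # the last field has no trailing comma
end

line = '2412,21,"Which of the following is not found in all cells?","Curriculum",' \
       '"Life and Living Processes, Life Processes",,,1,0,"endofline"'
p parse_csv_line(line)
```

Unlike the csv library, this sketch works one line at a time, so a quoted field that spans a newline is out of scope.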

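text = <<EOF
2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"
EOF
# Split on double quotes: even chunks lie outside quotes, odd ones inside (no escaped "" assumed).
chunks = text.chomp.split('"', -1)
x = []
chunks.each_with_index do |y, i|
  if i.odd?
    x << y                                    # quoted field: keep as is, commas included
  else
    y = y.delete_prefix(',') if i > 0         # comma that closes the previous quoted field
    y = y.delete_suffix(',') if i < chunks.size - 1  # comma before the next quoted field
    x.concat(y.split(',', -1))                # -1 keeps empty fields
  end
end
p x

Output

$ ruby test.rb
["2412", "21", "Which of the following is not found in all cells?", "Curriculum", "Life and Living Processes, Life Processes", "", "", "1", "0", "endofline"]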

This morning I stumbled upon a CSV table importer project for Ruby on Rails. You may find its code helpful:

GitHub table importer

My preference is @steenstag's solution, but another approach is to use String#scan with the following regex.

r = /(?<![^,])(?:(?!")[^,\n]*(?<!")|"[^"\n]*")(?![^,])/

If the variable str holds the string given in the example, we obtain:

puts str.scan r

which displays

2412
21
"Which of the following is not found in all cells?"
"Curriculum"
"Life and Living Processes, Life Processes"


1
0
"endofline"
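Note that the quoted fields keep their surrounding double quotes, and that the final (?![^,]) lookahead fails in front of a trailing newline, so a line read from a file (or a heredoc, as below) loses its last field unless it is chomped first. A small sketch that handles both, assuming the same regex r:

```ruby
r = /(?<![^,])(?:(?!")[^,\n]*(?<!")|"[^"\n]*")(?![^,])/

str = <<EOF
2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"
EOF

# chomp first: with the trailing "\n" the last field ("endofline") would not match
fields = str.chomp.scan(r).map do |f|
  # drop the surrounding quotes that the regex keeps on quoted fields
  f.start_with?('"') ? f[1..-2] : f
end
p fields
```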

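Start your engine!

See also regex101, which gives a detailed explanation of each token of the regex. (Move the cursor over the regex.)

Ruby's regex engine performs the following operations.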

(?<![^,]) : negative lookbehind asserts current location is not preceded
            by a character other than a comma
(?:       : begin non-capture group
  (?!")   : negative lookahead asserts next char is not a double-quote
  [^,\n]* : match 0+ chars other than a comma and newline
  (?<!")  : negative lookbehind asserts preceding character is not a
            double-quote
  |       : or
  "       : match double-quote
  [^"\n]* : match 0+ chars other than double-quote and newline
  "       : match double-quote
)         : end of non-capture group
(?![^,])  : negative lookahead asserts current location is not followed
            by a character other than a comma
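The same pattern can be written in Ruby's free-spacing (/x) mode, which keeps the explanation above next to each token as comments. This is only a reformatting: whitespace and # comments are ignored in /x mode, so it should match exactly what the one-line version matches.

```ruby
r = /
  (?<![^,])       # not preceded by a character other than a comma
  (?:             # begin non-capture group
    (?!")         #   next character is not a double quote
    [^,\n]*       #   0+ characters other than comma or newline
    (?<!")        #   preceding character is not a double quote
  |               # or
    "[^"\n]*"     #   a double-quoted field (no embedded quotes or newlines)
  )               # end non-capture group
  (?![^,])        # not followed by a character other than a comma
/x

line = '2412,21,"Which of the following is not found in all cells?","Curriculum",' \
       '"Life and Living Processes, Life Processes",,,1,0,"endofline"'
puts line.scan(r)
```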

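Note that (?<![^,]) has the same effect as (?<=,|^), and (?![^,]) the same effect as (?=$|,), as long as the string contains no newlines.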


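Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM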