How do I split apart a CSV string in Ruby?
I have this line from a CSV file as an example:
2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"
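I want to split it into an array. The immediate thought is to just split on commas, but some of the strings have commas in them, e.g. "Life and Living Processes, Life Processes", and these should stay as single elements in the array. Note also that there are two commas with nothing in between; I want to get these as empty strings.
In other words, the array I want to get is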
[2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes","","",1,0,"endofline"]
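To illustrate why splitting on commas alone is not enough, here is a quick sketch. The variable names line and pieces are chosen here for illustration only:

```ruby
line = '2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"'

# Naive approach: split on every comma, ignoring the quoting.
pieces = line.split(",")

puts pieces.length   # one more piece than the row has fields
p pieces[4]          # first half of the quoted field, opening quote still attached
p pieces[5]          # second half of the quoted field, closing quote still attached
```

The quoted field is cut in two at its embedded comma, and the quote characters stay in the data.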
I can think of hacky ways involving eval, but I'm hoping someone can come up with a clean regex to do it...
Cheers, Max
This is not a suitable task for regular expressions. You need a CSV parser, and Ruby has one built in:
http://ruby-doc.org/stdlib/libdoc/csv/rdoc/classes/CSV.html
There is also an arguably superior third-party library.
str=<<EOF
2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"
EOF
require 'csv' # built in
p CSV.parse(str)
# That's it! However, empty fields appear as nil.
# Makes sense to me, but if you insist on empty strings then do something like:
parser = CSV.new(str)
parser.convert{|field| field.nil? ? "" : field}
p parser.readlines
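For a single row, a shorter variant is possible. The sketch below assumes the row is held in a variable named line (a name chosen here for illustration). It uses CSV.parse_line with the built-in :integer converter, then maps nil to an empty string, which should give the array asked for in the question:

```ruby
require 'csv'

line = '2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"'

# parse_line parses one row; the :integer converter turns numeric fields into Integers.
row = CSV.parse_line(line, converters: :integer)

# Unquoted empty fields come back as nil, so replace them with empty strings.
fields = row.map { |field| field.nil? ? "" : field }

p fields
```

Doing the nil replacement in a plain map keeps it separate from the parser configuration.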
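Edit: I failed to read the Ruby tag. The good news is that the guide will explain the theory behind building this, even if the language details aren't right. Sorry.
Here is a great guide to doing this: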
http://knab.ws/blog/index.php?/archives/10-CSV-file-parser-and-writer-in-C-Part-2.html
And the CSV writer is here:
http://knab.ws/blog/index.php?/archives/3-CSV-file-parser-and-writer-in-C-Part-1.html
These examples cover the case of a quoted literal in the CSV (which may or may not contain a comma).
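For readers who want the idea in Ruby, here is a minimal sketch of a hand-written parser that walks the line one character at a time and tracks whether it is inside quotes. It is not taken from the linked guide. The method name split_csv_line is hypothetical, and the sketch assumes a single line with comma separators and double-quote quoting (a doubled quote inside a quoted field stands for one literal quote):

```ruby
# Hypothetical helper: split one CSV line with a small state machine.
def split_csv_line(line)
  fields = []
  current = String.new
  in_quotes = false
  chars = line.chomp.chars
  i = 0
  while i < chars.length
    ch = chars[i]
    if in_quotes
      if ch == '"' && chars[i + 1] == '"'
        current << '"'   # doubled quote inside a quoted field
        i += 1
      elsif ch == '"'
        in_quotes = false # closing quote
      else
        current << ch     # commas are ordinary characters while quoted
      end
    elsif ch == '"'
      in_quotes = true    # opening quote
    elsif ch == ','
      fields << current   # separator ends the current field
      current = String.new
    else
      current << ch
    end
    i += 1
  end
  fields << current       # last field has no trailing separator
end

line = '2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"'
p split_csv_line(line)
```

In practice the built-in CSV library is the safer choice, since it also handles cases such as quoted fields that span several lines.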
text=<<EOF
2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"
EOF
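x=[]
# "\042" is the octal escape for a double quote. After splitting on it,
# odd-indexed pieces were inside quotes and are kept whole; even-indexed
# pieces are the unquoted stretches and are split on commas.
text.chomp.split("\042").each_with_index do |y,i|
  i%2==0 ? x<< y.split(",") : x<<y
end
print x.flatten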
Output
$ ruby test.rb
["2412", "21", "Which of the following is not found in all cells?", "Curriculum", "Life and Living Processes, Life Processes", "", "", "", "1", "0", "endofline"]
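Note that the output above contains three empty strings, while the row has only two empty fields. The unquoted stretch `,,,1,0,` starts with the comma that ends the preceding quoted field, so splitting it on commas yields one empty string too many. The following sketch compares the field count of this approach with that of the built-in parser. The variable names line, pieces and parsed are chosen here for illustration:

```ruby
require 'csv'

line = '2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"'

# Same quote-splitting idea as above, collected into a flat array.
pieces = []
line.split('"').each_with_index do |part, i|
  if i.even?
    pieces.concat(part.split(","))  # unquoted stretch
  else
    pieces << part                  # quoted stretch, kept whole
  end
end

parsed = CSV.parse_line(line)

puts pieces.length  # field count from quote-splitting
puts parsed.length  # field count from the CSV parser
```

The two counts differ by one, so this approach needs extra handling for the separators next to quoted fields before it can be relied on.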
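This morning I stumbled across a CSV table importer project for Ruby on Rails. Eventually you will find the code helpful.
My preference is @steenstag's solution, but an alternative is to use String#scan with the following regular expression.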
r = /(?<![^,])(?:(?!")[^,\n]*(?<!")|"[^"\n]*")(?![^,])/
If the variable str holds the string given in the example, we obtain:
puts str.scan r
which displays
2412
21
"Which of the following is not found in all cells?"
"Curriculum"
"Life and Living Processes, Life Processes"


1
0
"endofline"
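See also regex101, which gives a detailed explanation of each token of the regular expression. (Move the cursor across the regex.)
Ruby's regex engine performs the following operations.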
(?<![^,])     : negative lookbehind asserts current location is not preceded
                by a character other than a comma
(?:           : begin non-capture group
  (?!")       : negative lookahead asserts next char is not a double-quote
  [^,\n]*     : match 0+ chars other than a comma and newline
  (?<!")      : negative lookbehind asserts preceding character is not a
                double-quote
  |           : or
  "           : match double-quote
  [^"\n]*     : match 0+ chars other than double-quote and newline
  "           : match double-quote
)             : end of non-capture group
(?![^,])      : negative lookahead asserts current location is not followed
                by a character other than a comma
Note that, for a single-line string, (?<![^,]) is equivalent to (?<=,|^) and (?![^,]) is equivalent to (?=,|$).
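If the surrounding double quotes are not wanted in the result, they can be stripped from each match. The sketch below assumes str holds the example line without a trailing newline, because the final lookahead rejects a field that is followed by a newline. The variable name fields is chosen here for illustration:

```ruby
r = /(?<![^,])(?:(?!")[^,\n]*(?<!")|"[^"\n]*")(?![^,])/

str = '2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"'

# Scan for fields, then drop one leading and one trailing double quote if present.
fields = str.scan(r).map { |field| field.delete_prefix('"').delete_suffix('"') }

p fields
```

The two empty fields come back as empty strings, as the question asks. All values are strings, so numeric fields would still need converting, for example with to_i.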