MalformedCSVError with rails CSV (FasterCSV)

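I'm having a serious problem parsing some CSV in rails right now. Basically, my app lets users upload a CSV file. The app then converts the file to make sure it is UTF-8, and then tries to parse and process it. Whenever the app tries to parse it, I get a MalformedCSVError stating "Illegal quoting on line 1".

What I don't understand is that if I copy the original file into a new document and save it, I can parse it on the rails console without any problem.

If I try to parse the original file, it complains about characters that are invalid for UTF-8 (the file is not UTF-8, which is why the app converts it).

If I try to parse the file that the app has converted to UTF-8 and whose line endings it has changed to LF, parsing fails.

If I run a file diff between the version the app produced and my copy/paste version (which works), there are 0 differences, so I really can't figure out why one is parseable and the other is not.

Any suggestions? My app processes the file as follows: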

def create
  @survey = Survey.new(params[:survey])

  # Now we need to try and convert this to UTF-8 if it isn't already
  encoded = File.read(@survey.survey_data.current_path)
  encoding = CharlockHolmes::EncodingDetector.detect(encoded)

  # We've got a guess at the encoding,
  # so we can try and convert it but it
  # may still fail so we need to handle
  # that
  begin
    re_encoded = CharlockHolmes::Converter.convert(encoded, encoding[:encoding], 'UTF-8')
    re_encoded = re_encoded.gsub(/\r\n?/, "\n")

    # Now replace the uploaded file
    File.open(@survey.survey_data.current_path, 'w') { |f|
      f.write(re_encoded)
    }
  rescue ArgumentError
    puts "UH OH!!!!!"
  end

  puts "#{@survey.survey_data.current_path}"
  @parsed = CSV.read(@survey.survey_data.current_path)
end

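The file upload gem is CarrierWave, if that makes any difference.

Please can someone help me, as this is driving me mad!

EDIT

The error says it is on line 1. Line 1 (assuming it is not indexed from 0) is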

"Survey","RD","GarrysMDs","NigelsMDs","PaulsMDs","StephensMDs","BrinleyJ","CarolineP","DaveL","GrantR","GregS","Kent","NeilC","NicolaP","AndyC","DarrenS","DeanB","KarenF","PaulR","RichardF","SteveG","BrianG","GordonA","NickD","NickR","NickT","RayL","SimonH","EdmondH","JasonF","MikeS","SamanthaN","TimB","TravisF","AlanS","Q1","Q2","Q3","Q4","Q5","Q6","Q7","Q8PM","Q8N","Q9","Q10","Q11","Q12","Q13","Q14","Q15","Q16PM","Q16N","Q17PM","Q17N","Q18PM","Q18N","Q19","Q20","Q21","Q22","comment","Q23.1","Q23.2","Q23.3","TQ23.1","TQ23.2","VPM","VN","VQ1","VQ2","VQ3","VQ4","VQ5","VQ6","VQ7","VQ8N","VQ8PM","VQ9","VQ10","VQ11","VQ12","VQ13","VQ14","VQ15","VQ16","VQ16N","VQ16PM","VQ17","VQ17N","VQ17PM","VQ18","VQ18N","VQ18PM","VQ19","VQ20","VQ21","VQ22","VQ23.1","VQ23.2","VQ23.3","VRD","XQ16","XQ17","XQ18"

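When line 1 looks fine in an editor, it can help to check the raw bytes at the start of the file, since invisible bytes such as a byte order mark do not show up on screen. Below is a minimal sketch using only the Ruby standard library. The helper names leading_bytes_hex and utf8_bom? are made up for illustration and are not part of the app above, and the temporary file only stands in for an uploaded CSV.

```ruby
require "tempfile"

# Hypothetical helper: the first `count` bytes of a file as uppercase hex pairs.
def leading_bytes_hex(path, count = 4)
  # binread returns nil for an empty file, hence the to_s.
  File.binread(path, count).to_s.bytes.map { |byte| format("%02X", byte) }
end

# Hypothetical helper: true when the file starts with the UTF-8 BOM (EF BB BF).
def utf8_bom?(path)
  leading_bytes_hex(path, 3) == %w[EF BB BF]
end

# Demo on a throwaway file that starts with a BOM followed by a quoted header.
Tempfile.create(["survey", ".csv"]) do |tmp|
  File.binwrite(tmp.path, "\uFEFF\"Survey\",\"RD\"\n")
  puts leading_bytes_hex(tmp.path).join(" ")
  puts utf8_bom?(tmp.path)
end
```

If the first three bytes are EF BB BF, the quote that opens the first field is not actually the first character the parser sees.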
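Well, that was infuriating!

It turns out the file had a BOM, which was causing the CSV parser to break. Loading the file with

CSV.open("path/to/file.csv", "rb:bom|encoding")

allowed it to parse perfectly! So annoyed at how long it took to track this down, but it is working now, and there is no longer any need to convert to UTF-8 either!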
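For reference, here is a small self-contained sketch of that approach. It assumes the file is UTF-8 with a leading BOM, so the encoding part of the mode string is written out as utf-8. The helper name read_csv_skipping_bom and the temporary file are hypothetical and only serve the demonstration.

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: read a UTF-8 CSV file, letting Ruby skip a leading BOM.
# The "bom|utf-8" part of the mode string tells the IO layer to drop the BOM
# (if there is one) before the CSV parser sees the data.
def read_csv_skipping_bom(path)
  CSV.open(path, "rb:bom|utf-8") { |csv| csv.read }
end

# Demo on a throwaway file that starts with a BOM.
Tempfile.create(["survey", ".csv"]) do |tmp|
  File.binwrite(tmp.path, "\uFEFF\"Survey\",\"RD\"\n\"1\",\"2\"\n")
  rows = read_csv_skipping_bom(tmp.path)
  p rows.first
end
```

The same helper also reads files without a BOM, so it should be safe to use for every upload that is already UTF-8.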
