
How to handle comma in a cell in csv file in Spark/Scala

How do I handle the commas inside the address cell when reading the csv file?

"node_id","name","address","country_codes","countries","sourceID","valid_until","note"

"14000008","","""Les Tattes""; Bursinel; Vaud; Switzerland","CHE","Switzerland","Panama Papers","Through 2015",""

"14000014","",""""Whingate"" Tower Hill Dummer, Nr Basingstoke; Hants RG25 2AL","GBR","United Kingdom","Panama Papers","Through 2015",""

"14000015","","#02-01; 14 MOHAMED SULTAN ROAD; SINGAPORE 238963","SGP","Singapore","Panama Papers","Through 2015",""

You can use something fancy like a regex:

Splitter.on(Pattern.compile(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)"))
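
Splitter here comes from Guava, but the same regex also works with plain String.split. A quick sketch in Scala, reusing the line value from the example above; the lookahead matches a comma only when it is followed by an even number of quotes, i.e. when it sits outside a quoted field. Note this sketch does not un-escape doubled quotes inside a field:

// Split only on commas that are not inside a double-quoted field.
val quoteAwareComma = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)"

// -1 keeps trailing empty fields; stripPrefix/stripSuffix drop the
// surrounding quotes from each column value.
val fields = line.split(quoteAwareComma, -1).map(_.stripPrefix("\"").stripSuffix("\""))

fields.length   // 8 -- one element per column
fields(2)       // Tower Hill Dummer, Nr Basingstoke; Hants RG25 2AL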

Or you can try the split() function with a delimiter:

scala> val s = "eggs, milk, butter, Coco Puffs"
s: java.lang.String = eggs, milk, butter, Coco Puffs

scala> s.split(",") //split function
res0: Array[java.lang.String] = Array(eggs, " milk", " butter", " Coco Puffs")
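
In Spark itself, the simplest option is usually to let the built-in CSV reader handle the quoting rather than splitting lines by hand. A minimal sketch, assuming the sample is saved at a hypothetical path addresses.csv; the escape option is set to " because the sample escapes embedded quotes by doubling them:

import org.apache.spark.sql.SparkSession

// Assumption: running standalone; in spark-shell the `spark` session already exists.
val spark = SparkSession.builder().master("local[*]").appName("csv-commas").getOrCreate()

val df = spark.read
  .option("header", "true")   // first row of the sample holds the column names
  .option("quote", "\"")      // fields are wrapped in double quotes
  .option("escape", "\"")     // embedded quotes are doubled ("" -> ")
  .csv("addresses.csv")       // hypothetical path to the file shown above

df.select("node_id", "address", "countries").show(false)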
