
Escape Comma inside a csv file using spark-shell

I have a dataset containing the two rows below:

s.no,name,Country
101,xyz,India,IN
102,abc,UnitedStates,US

I am trying to escape the commas in each column, but not for the last column: I want to keep that field intact and get the output using spark-shell. I tried the code below, but it gave me a different output.

val df = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("delimiter", ",")
  .option("escape", "\"")
  .load("/user/username/data.csv")
  .show()

The output it gave me is:

+----+-----+------------+
|s.no| name|     Country|
+----+-----+------------+
| 101|  xyz|       India|
| 102|  abc|UnitedStates|
+----+-----+------------+

But I am expecting the output to be like below. What am I missing here? Can anyone help me?

s.no  name  Country
101   xyz   India,IN
102   abc   UnitedStates,US

I suggest reading all the fields by providing a schema and ignoring the header present in the data, as below:

import org.apache.spark.sql.Encoders
import org.apache.spark.sql.functions.{concat, lit}
import spark.implicits._

// Four fields: the trailing comma-separated token is read into country1
case class Data(sno: String, name: String, country: String, country1: String)

// Derive the schema from the case class
val schema = Encoders.product[Data].schema

val df = spark.read
  .option("header", true)   // skip the 3-column header line in the file
  .schema(schema)           // read every data row as 4 columns
  .csv("data.csv")
  .withColumn("Country", concat($"country", lit(", "), $"country1"))
  .drop("country1")

df.show(false)

Output:

+---+----+----------------+
|sno|name|Country         |
+---+----+----------------+
|101|xyz |India, IN       |
|102|abc |UnitedStates, US|
+---+----+----------------+
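
If you want the result to match the expected output exactly (India,IN with no space after the comma), a minimal variation is to drop the space from the literal. This sketch assumes the same Data case class, schema, and data.csv as above, run in a spark-shell session; dfNoSpace is just an illustrative name:

// Assumes Data, schema and data.csv defined as above
val dfNoSpace = spark.read
  .option("header", true)
  .schema(schema)
  .csv("data.csv")
  .withColumn("Country", concat($"country", lit(","), $"country1"))
  .drop("country1")

dfNoSpace.show(false)
// Country should now read "India,IN" and "UnitedStates,US"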

Hope this helps!
