Column name with a dot in Spark

Question

I am trying to take the columns of a DataFrame and convert them to an RDD[Vector].

The problem is that some of my column names contain a dot, as in the following dataset:

"col0.1","col1.2","col2.3","col3.4"
1,2,3,4
10,12,15,3
1,12,10,5

This is what I'm doing:

val df = spark.read.format("csv").options(Map("header" -> "true", "inferSchema" -> "true")).load("C:/Users/mhattabi/Desktop/donnee/test.txt")
val column=df.columns.map(c=>s"`${c}`")
val rows = new VectorAssembler().setInputCols(column).setOutputCol("vs")
  .transform(df)
  .select("vs")
  .rdd
val data =rows.map(_.getAs[org.apache.spark.ml.linalg.Vector](0))
  .map(org.apache.spark.mllib.linalg.Vectors.fromML)

val mat: RowMatrix = new RowMatrix(data)
// Compute all singular values (k = number of columns) and the corresponding singular vectors.
val svd: SingularValueDecomposition[RowMatrix, Matrix] = mat.computeSVD(mat.numCols().toInt, computeU = true)
val U: RowMatrix = svd.U  // The U factor is a RowMatrix.
val s: Vector = svd.s  // The singular values are stored in a local dense vector.
val V: Matrix = svd.V  // The V factor is a local dense matrix.

println(V)

Any help getting this to work with columns that have a dot in their names would be appreciated. Thanks.

Answer 1

If your problem is the . (dot) in the column name, you can wrap the column name in ` (backticks).

df.select("`col0.1`")
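
As a small sketch of the same idea applied to several columns at once, the helper below wraps a name in backticks and doubles any backtick already inside it, which is how Spark SQL escapes a backtick within a quoted identifier. The names `ColumnQuoting` and `quote` are hypothetical and not part of the Spark API.

```scala
// Hypothetical helper, not part of the Spark API.
object ColumnQuoting {
  // Wrap a column name in backticks so that a dot is read as part of the
  // name rather than as a nested-field accessor. A backtick that is already
  // inside the name is escaped by doubling it.
  def quote(name: String): String =
    "`" + name.replace("`", "``") + "`"

  def main(args: Array[String]): Unit = {
    // Column names taken from the dataset in the question.
    val columns = Seq("col0.1", "col1.2", "col2.3", "col3.4")
    columns.map(quote).foreach(println)
  }
}
```

With a DataFrame `df`, the quoted names could then be used in a selection, for example `df.select(df.columns.map(c => org.apache.spark.sql.functions.col(ColumnQuoting.quote(c))): _*)`.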

Answer 2

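The problem here is the VectorAssembler implementation, not the columns per se. You can, for example, skip the header (here by treating lines that start with a double quote as comments), so that the columns get default names without dots: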

val df = spark.read.format("csv")
  .options(Map("inferSchema" -> "true", "comment" -> "\""))
  .load(path)

new VectorAssembler()
  .setInputCols(df.columns)
  .setOutputCol("vs")
  .transform(df)

Or rename the columns before passing them to VectorAssembler:

val renamed =  df.toDF(df.columns.map(_.replace(".", "_")): _*)

new VectorAssembler()
  .setInputCols(renamed.columns)
  .setOutputCol("vs")
  .transform(renamed)
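
One thing to watch with the rename is that two different names, such as `a.b` and `a_b`, would collapse into the same column. A minimal sketch of a guard against that, assuming a hypothetical helper `ColumnRenaming.withoutDots` (not part of Spark) that works on the plain list of names:

```scala
// Hypothetical helper, not part of the Spark API.
object ColumnRenaming {
  // Replace every dot with `replacement` and fail fast if two different
  // original names would end up with the same new name.
  def withoutDots(columns: Seq[String], replacement: String = "_"): Seq[String] = {
    val renamed = columns.map(_.replace(".", replacement))
    val clashes = renamed.groupBy(identity).collect {
      case (name, group) if group.size > 1 => name
    }
    require(clashes.isEmpty,
      s"Renaming would create duplicate columns: ${clashes.mkString(", ")}")
    renamed
  }

  def main(args: Array[String]): Unit = {
    // Column names taken from the dataset in the question.
    val columns = Seq("col0.1", "col1.2", "col2.3", "col3.4")
    withoutDots(columns).foreach(println)
  }
}
```

The result could be applied with `df.toDF(ColumnRenaming.withoutDots(df.columns.toSeq): _*)` in place of the inline `map` above.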

Finally, the best approach is to provide the schema explicitly:

import org.apache.spark.sql.types._

val schema = StructType((0 until 4).map(i => StructField(s"_$i", DoubleType)))

val dfExplicit = spark.read.format("csv")
  .options(Map("header" -> "true"))
  .schema(schema)
  .load(path)

new VectorAssembler()
  .setInputCols(dfExplicit.columns)
  .setOutputCol("vs")
  .transform(dfExplicit)

Answer 3

For Spark SQL:

spark.sql("select * from reg_data where reg_data.`createdResource.type` = 'Berlin'")
