
Need to remove the string "_value" from Spark DataFrame column names using Scala

I have the DataFrame shown below. I need to remove the last occurrence of the string "_value" from every column name of my DataFrame.

// Run in spark-shell, where `spark` (the SparkSession) is already defined
import spark.implicits._
import org.apache.spark.sql.functions._
val simpledata = Seq(("file1","name1","101"),
  ("file1","name1","101"),
  ("file1","name1","101"),
  ("file1","name1","101"),
  ("file1","name1","101"))
val df = simpledata.toDF("filename_value","name_value_value","serialNo_value")
df.show()

If I use replaceAll:

  val renamedColumnsDf = df.select(df.columns.map(c => df(c).as(c.replaceAll("_value", ""))): _*)

it removes every "_value", but I only need to remove the last occurrence.
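A minimal sketch of how the replaceAll attempt could be narrowed: replaceAll interprets its first argument as a regular expression, so anchoring it with `$` should make it match only a "_value" that ends the name. The object and method names are hypothetical, and the sketch works on plain strings so it runs without Spark.

```scala
object AnchoredReplace {
  // "_value$" only matches "_value" when it is the last thing in the name,
  // so earlier occurrences (the first one in "name_value_value") survive.
  def dropTrailingValue(name: String): String = name.replaceAll("_value$", "")

  def main(args: Array[String]): Unit = {
    val names = Seq("filename_value", "name_value_value", "serialNo_value")
    // Prints: filename, name_value, serialNo
    println(names.map(dropTrailingValue).mkString(", "))
  }
}
```

On the DataFrame, the same idea would presumably read `df.select(df.columns.map(c => df(c).as(c.replaceAll("_value$", ""))): _*)`.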

I need help removing only the last occurrence of the string from each column name.

My output should be:

      +--------------+----------------+--------------+
      |filename      |name_value      |serialNo      |
      +--------------+----------------+--------------+
      |         file1|           name1|           101|
      |         file1|           name1|           101|
      |         file1|           name1|           101|
      |         file1|           name1|           101|
      |         file1|           name1|           101|
      +--------------+----------------+--------------+
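Taken literally, "remove the last occurrence" also covers names where "_value" is not at the end (for example "a_value_b"). A sketch under that reading, using String.lastIndexOf; the helper name removeLastOccurrence is hypothetical and the code runs on plain strings without Spark.

```scala
object LastOccurrence {
  // Hypothetical helper: removes only the last occurrence of `token` in `name`,
  // wherever it appears; names without the token are returned unchanged.
  def removeLastOccurrence(name: String, token: String): String = {
    val i = name.lastIndexOf(token)
    if (i < 0) name
    else name.substring(0, i) + name.substring(i + token.length)
  }

  def main(args: Array[String]): Unit = {
    val names = Seq("filename_value", "name_value_value", "serialNo_value")
    // Prints: filename, name_value, serialNo
    println(names.map(removeLastOccurrence(_, "_value")).mkString(", "))
  }
}
```

Applied to a DataFrame, something like `df.toDF(df.columns.map(LastOccurrence.removeLastOccurrence(_, "_value")): _*)` should rename all columns positionally; note that if two columns end up with the same name, later references to that name become ambiguous.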

If you wish to remove the _value substring only if it is the suffix of the column name, you can do the following:

  import org.apache.spark.sql.DataFrame

  val simpleDf: DataFrame = simpledata.toDF("filename_value", "name_value_value", "serialNo_value")

  val suffix: String = "_value"
  // Rename each column that ends with the suffix, dropping only that trailing occurrence
  val renamedDf: DataFrame = simpleDf.columns.foldLeft(simpleDf) { (df, c) =>
    if (c.endsWith(suffix)) df.withColumnRenamed(c, c.substring(0, c.length - suffix.length))
    else df
  }
  renamedDf.show()
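The endsWith/substring arithmetic above can probably be shortened with Scala's StringOps.stripSuffix, which removes the suffix once if present and otherwise returns the string unchanged. A plain-string sketch (the object name is hypothetical):

```scala
object StripSuffixRename {
  val suffix: String = "_value"

  // stripSuffix drops `suffix` once when the name ends with it and returns the
  // name unchanged otherwise, matching the endsWith/substring logic above.
  def rename(name: String): String = name.stripSuffix(suffix)

  def main(args: Array[String]): Unit = {
    val names = Array("filename_value", "name_value_value", "serialNo_value", "id")
    // Prints: filename, name_value, serialNo, id
    println(names.map(rename).mkString(", "))
  }
}
```

With Spark, `simpleDf.toDF(simpleDf.columns.map(_.stripSuffix("_value")): _*)` would rename every column in a single call instead of folding withColumnRenamed over them.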

The output will be:

+--------+----------+--------+
|filename|name_value|serialNo|
+--------+----------+--------+
|   file1|     name1|     101|
|   file1|     name1|     101|
|   file1|     name1|     101|
|   file1|     name1|     101|
|   file1|     name1|     101|
+--------+----------+--------+

A simpler option is to pattern-match on each column name inside the map over df.columns (interpolated string patterns require Scala 2.13 or later):

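val renamedDf = df.select(df.columns.map { columnName =>
  val newName = columnName match {
    case s"${something}_value" => something // interpolated pattern, Scala 2.13+
    case other => other
  }
  df(columnName).as(newName)
}: _*)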
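To check how the pattern treats a name with two occurrences, here is a plain-Scala sketch without Spark (Scala 2.13+ or Scala 3; the object name is hypothetical). Because the literal "_value" follows the binder and must end the string, `something` captures everything before the final "_value".

```scala
object PatternRename {
  // Interpolated string patterns are available from Scala 2.13 (and in Scala 3).
  // The trailing literal must match the end of the name, so only the last
  // "_value" is removed; names without that suffix fall through unchanged.
  def newName(columnName: String): String = columnName match {
    case s"${something}_value" => something
    case other                 => other
  }

  def main(args: Array[String]): Unit = {
    val names = Seq("filename_value", "name_value_value", "serialNo_value", "id")
    // Prints: filename, name_value, serialNo, id
    println(names.map(newName).mkString(", "))
  }
}
```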
