Add new rows in the Spark DataFrame using Scala

I have a DataFrame like:

Name_Index  City_Index
  2.0         1.0
  0.0         2.0
  1.0         0.0

I have a new list of values.

List(1.0, 1.0)

I want to add these values as a new row in the DataFrame, with all previous rows dropped.

My code:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .master("local[*]")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .getOrCreate()

import spark.implicits._ // required for .toDF on a local Seq

var data = spark.read.option("header", "true")
  .option("inferSchema", "true")
  .csv("src/main/resources/student.csv")

val someDF = Seq(
  (1.0, 1.0)
).toDF("Name_Index", "City_Index")

// .show() returns Unit, so its result cannot be assigned back to data
data = data.union(someDF)
data.show()

It shows output like:

Name_Index  City_Index
  2.0          1.0
  0.0          2.0
  1.0          0.0
  1.0          1.0

But the output should be like this, so that all the previous rows are dropped and the new values are added:

Name_Index   City_Index
  1.0          1.0

You can achieve this using the limit and union functions. Check below.

scala> val df = Seq((2.0,1.0),(0.0,2.0),(1.0,0.0)).toDF("name_index","city_index")
df: org.apache.spark.sql.DataFrame = [name_index: double, city_index: double]

scala> df.show(false)
+----------+----------+
|name_index|city_index|
+----------+----------+
|2.0       |1.0       |
|0.0       |2.0       |
|1.0       |0.0       |
+----------+----------+


scala> val ndf = Seq((1.0,1.0)).toDF("name_index","city_index")
ndf: org.apache.spark.sql.DataFrame = [name_index: double, city_index: double]

scala> ndf.show
+----------+----------+
|name_index|city_index|
+----------+----------+
|       1.0|       1.0|
+----------+----------+


scala> df.limit(0).union(ndf).show(false) // not a great approach; you could simply call ndf.show
+----------+----------+
|name_index|city_index|
+----------+----------+
|1.0       |1.0       |
+----------+----------+
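
One caveat worth noting: union resolves columns by position, not by name. If the column order of the new DataFrame might differ from the source, unionByName (available since Spark 2.3) is the safer variant. A minimal sketch, reusing the df and ndf defined above:

scala> // unionByName matches columns by name instead of position (Spark 2.3+)
scala> df.limit(0).unionByName(ndf).show(false)
+----------+----------+
|name_index|city_index|
+----------+----------+
|1.0       |1.0       |
+----------+----------+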

Change the last line to:

data = data.except(data).union(someDF)
data.show()
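
For context, data.except(data) returns an empty DataFrame with data's schema (every row also appears in data, so all are removed), and the union then contributes only someDF's row. A minimal sketch of the same idea split into steps, using the data and someDF from the question:

// data.except(data) removes every row that also appears in data,
// leaving an empty DataFrame that keeps data's schema
val emptied = data.except(data)      // 0 rows, same schema as data
val replaced = emptied.union(someDF) // only someDF's row remains
replaced.show()

Note that except actually compares data against itself, so the limit(0) approach above avoids that extra work.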

You could try this approach:

data = data.filter(_ => false).union(someDF)

Output:

+----------+----------+
|Name_Index|City_Index|
+----------+----------+
|1.0       |1.0       |
+----------+----------+
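
As a side note, filter(_ => false) evaluates a Scala lambda against every Row, while an equivalent Column-based predicate can be folded away by the Catalyst optimizer into an empty scan. A hedged sketch of that variant, assuming the same data and someDF as above:

import org.apache.spark.sql.functions.lit

// lit(false) is a constant Column predicate; the optimizer can
// replace the filtered scan with an empty relation without
// evaluating any rows
val replaced = data.where(lit(false)).union(someDF)
replaced.show()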

I hope it gives you some insights.

Regards.

As far as I can see, you only need the list of columns from the source DataFrame.

If your sequence has the same column order as the source DataFrame, you can reuse the schema without actually querying the source DataFrame. Performance-wise, it will be faster.

val srcDf = Seq((2.0,1.0),(0.0,2.0),(1.0,0.0)).toDF("name_index","city_index")

// toDF(srcDf.columns:_*) copies the source column names, in order,
// without ever reading srcDf's data
val dstDf = Seq((1.0, 1.0)).toDF(srcDf.columns:_*)
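
If the new values arrive as a plain runtime List (as in the question's List(1.0, 1.0)) rather than a compile-time tuple, here is a sketch of the same schema-reuse idea via Row and createDataFrame; newValues is a hypothetical name, and both column names and types are taken from srcDf.schema:

import org.apache.spark.sql.Row

val newValues = List(1.0, 1.0) // hypothetical runtime input
val dstDf2 = spark.createDataFrame(
  spark.sparkContext.parallelize(Seq(Row.fromSeq(newValues))),
  srcDf.schema // reuse both column names and types from the source
)
dstDf2.show()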

