
zipWithIndex rdd with initial value

I have an RDD like this:

+----------+--------+
|firstName |lastName|
+----------+--------+
|      john|   smith|
|      anna|  tourde|
+----------+--------+

I would like to create a new column, as we can do with zipWithIndex, but with an initial value of 8.

+----------+--------+-----+
|firstName |lastName|index|
+----------+--------+-----+
|      john|   smith|    8|
|      anna|  tourde|    9|
+----------+--------+-----+

Do you have any idea? Thanks

rdd.zipWithIndex().map { case (v, ind) =>
  (v, ind + 8)
}
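To see what this mapping produces, here is a minimal sketch of the same transformation on a local Scala collection rather than an RDD. The names `ZipWithIndexOffset` and `zipWithIndexFrom` are hypothetical, introduced only for illustration. Note that `Seq.zipWithIndex` yields `Int` indices, whereas `RDD.zipWithIndex` yields `Long` indices.

```scala
object ZipWithIndexOffset {
  // Hypothetical helper: pairs each element with an index that starts at `start`
  // instead of 0, by shifting the zero-based index from zipWithIndex.
  def zipWithIndexFrom[A](xs: Seq[A], start: Long): Seq[(A, Long)] =
    xs.zipWithIndex.map { case (v, i) => (v, i + start) }

  def main(args: Array[String]): Unit = {
    val people = Seq(("john", "smith"), ("anna", "tourde"))
    // Prints:
    // ((john,smith),8)
    // ((anna,tourde),9)
    zipWithIndexFrom(people, 8L).foreach(println)
  }
}
```

The offset is applied after indexing, so the elements keep their original order and only the numbering changes.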

Use zipWithIndex and convert back to a DataFrame, as below:

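import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StructField, StructType}

val df1 = spark.createDataFrame(
  df.rdd.zipWithIndex.map {
    // Append the zero-based index, shifted by the initial value 8
    case (row, index) => Row.fromSeq(row.toSeq :+ (index + 8))
  },
  // Extend the original schema with a non-nullable index column
  StructType(df.schema.fields :+ StructField("index", LongType, false))
)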
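
If the starting value or the column name needs to vary, the same approach could be wrapped in a small function. This is only a sketch, assuming Spark SQL is on the classpath and a local master is acceptable for the demo; `IndexDemo` and `withIndexFrom` are hypothetical names and not part of Spark.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

object IndexDemo {
  // Hypothetical helper: appends a Long column named `colName`
  // whose values start at `start` and follow the RDD's row order.
  def withIndexFrom(df: DataFrame, start: Long, colName: String = "index"): DataFrame = {
    val indexed = df.rdd.zipWithIndex.map {
      case (row, i) => Row.fromSeq(row.toSeq :+ (i + start))
    }
    val schema = StructType(
      df.schema.fields :+ StructField(colName, LongType, nullable = false))
    df.sparkSession.createDataFrame(indexed, schema)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession is created here purely for the demo
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("index-demo")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("john", "smith"), ("anna", "tourde")).toDF("firstName", "lastName")
    withIndexFrom(df, 8L).show()

    spark.stop()
  }
}
```

Going through `df.rdd` and back costs a conversion between DataFrame and RDD, but `zipWithIndex` gives consecutive indices, which is what the question asks for.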
