
Scala: For loop on dataframe, create new column from existing by index

I have a dataframe with two columns:

id (string), date (timestamp)

I would like to loop through the dataframe and add a new column with a URL that includes the id. The algorithm should look something like this:

 add one new column with the following value:
 for each id
       "some url" + the value of the dataframe's id column

I tried to make this work in Scala, but I have problems getting the specific id at index "a":

 val k = df2.count().asInstanceOf[Int]
      // for loop execution with a range
      for( a <- 1 to k){
         // println( "Value of a: " + a );
         val dfWithFileURL = dataframe.withColumn("fileUrl", "https://someURL/" + dataframe("id")[a])

      }

But this

dataframe("id")[a]

is not working in Scala. I could not find a solution yet, so any kind of suggestion is welcome!

You can simply use the withColumn function in Scala, something like this:

import spark.implicits._                             // for toDF and the $"..." column syntax (already in scope in spark-shell)
import org.apache.spark.sql.functions.{concat, lit}

val df = Seq(
  ( 1, "1 Jan 2000" ),
  ( 2, "2 Feb 2014" ),
  ( 3, "3 Apr 2017" )
)
  .toDF("id", "date")

// Add the fileUrl column by concatenating the literal base URL with the id column
val dfNew = df
  .withColumn("fileUrl", concat(lit("https://someURL/"), $"id"))

dfNew.show()

My results:

[screenshot of the show() output]
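For reference, assuming the snippet above is run as-is (for example in spark-shell with a SparkSession named spark), dfNew.show() should print something like:

+---+----------+-----------------+
| id|      date|          fileUrl|
+---+----------+-----------------+
|  1|1 Jan 2000|https://someURL/1|
|  2|2 Feb 2014|https://someURL/2|
|  3|3 Apr 2017|https://someURL/3|
+---+----------+-----------------+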

Not sure if this is what you require, but you can use zipWithIndex for indexing.

data.show()

+---+------------------+
| Id|               Url|
+---+------------------+
|111|http://abc.go.org/|
|222|http://xyz.go.net/|
+---+------------------+

import org.apache.spark.sql._
import org.apache.spark.sql.types.{StructType, StructField, StringType}

// Zip each row with its (0-based) index, then append "<Url><index+1>" as a new fileUrl column
val df = sqlContext.createDataFrame(
  data.rdd.zipWithIndex
    .map { case (r, i) => Row.fromSeq(r.toSeq :+ s"${r.getString(1)}${i + 1}") },
  StructType(data.schema.fields :+ StructField("fileUrl", StringType, false))
)

Output:

df.show(false)

+---+------------------+-------------------+
|Id |Url               |fileUrl            |
+---+------------------+-------------------+
|111|http://abc.go.org/|http://abc.go.org/1|
|222|http://xyz.go.net/|http://xyz.go.net/2|
+---+------------------+-------------------+
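As an alternative sketch (my own assumption, not part of the original answer): on Spark 2.x you can get a similar 1-based index without going through the RDD API by using the row_number window function. A window with no partitionBy moves every row into a single partition, so this is only reasonable for small DataFrames. Here data, Id and Url refer to the example above, and a SparkSession named spark (as in spark-shell) is assumed.

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, concat, row_number}

// Assumption: `data` is the two-column (Id, Url) DataFrame shown above.
// row_number() over an un-partitioned window yields 1, 2, 3, ... in Id order;
// Spark will warn that all rows move to a single partition.
val w = Window.orderBy("Id")

val dfAlt = data.withColumn(
  "fileUrl",
  concat(col("Url"), row_number().over(w).cast("string"))
)

dfAlt.show(false)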
