
How do I Convert Array[Row] to DataFrame

How do I convert this one row to a DataFrame?

val oneRowDF = myDF.first // gives Array[Row]

Thanks

In my answer, df1 is a DataFrame [text: string, y: int], just for testing - val df1 = sc.parallelize(List(("a", 1))).toDF("text", "y").

import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val schema = StructType(
    StructField("text", StringType, false) ::
    StructField("y", IntegerType, false) :: Nil)
val arr = df1.head(3) // Array[Row]
val dfFromArray = sqlContext.createDataFrame(sc.parallelize(arr), schema)

You can also map the parallelized array and cast every row:

val dfFromArray = sc.parallelize(arr)
    .map(row => (row.getString(0), row.getInt(1)))
    .toDF("text", "y")

In case of one row, you can run:

val row = df1.head // a single Row
val dfFromArray = sc.parallelize(Seq(row))
    .map(row => (row.getString(0), row.getInt(1)))
    .toDF("text", "y")

In Spark 2.0 use SparkSession instead of SQLContext.
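A minimal Spark 2.x sketch of the same conversion, assuming an existing SparkSession bound to a variable named spark (the variable name is an assumption; in the shell it exists by default):

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Same schema as in the answer above
val schema = StructType(
  StructField("text", StringType, false) ::
  StructField("y", IntegerType, false) :: Nil)

val arr = df1.head(3)                          // Array[Row]
val rdd = spark.sparkContext.parallelize(arr)  // Array[Row] -> RDD[Row]
val dfFromArray = spark.createDataFrame(rdd, schema)
```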

You do not want to do that:

If you want just a subset of the whole DataFrame, use the limit API.

Example:

scala> val d=sc.parallelize(Seq((1,3),(2,4))).toDF
d: org.apache.spark.sql.DataFrame = [_1: int, _2: int]

scala> d.show
+---+---+
| _1| _2|
+---+---+
|  1|  3|
|  2|  4|
+---+---+


scala> d.limit(1)
res1: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [_1: int, _2: int]

scala> d.limit(1).show
+---+---+
| _1| _2|
+---+---+
|  1|  3|
+---+---+

Still, if you want to explicitly convert an Array[Row] to a DataFrame, you can do something like:

scala> val value=d.take(1)
value: Array[org.apache.spark.sql.Row] = Array([1,3])

scala> val asTuple=value.map(a=>(a.getInt(0),a.getInt(1)))
asTuple: Array[(Int, Int)] = Array((1,3))

scala> sc.parallelize(asTuple).toDF
res6: org.apache.spark.sql.DataFrame = [_1: int, _2: int]

And now you can show it accordingly!

Take a look at the scaladocs - I'd recommend RDD[Row] here, which means you need to get there. Should be easiest with makeRDD. You'll also need a schema corresponding to your Row, which you can pull directly from the source DataFrame.
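A sketch of that approach, assuming myDF from the question and a SparkSession named spark (both names are assumptions). The point is that the schema can be pulled straight off the source DataFrame instead of being rebuilt by hand:

```scala
import org.apache.spark.sql.Row

val arr: Array[Row] = myDF.take(1)  // the Array[Row] to convert back

// makeRDD turns the local array into an RDD[Row];
// myDF.schema supplies the matching StructType for free
val rebuilt = spark.createDataFrame(spark.sparkContext.makeRDD(arr), myDF.schema)
```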

... how did you get Array[Row] in the first place?

If you have a List<Row>, then it can directly be used to create a DataFrame or Dataset<Row> using spark.createDataFrame(List<Row> rows, StructType schema), where spark is a SparkSession in Spark 2.x.
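From Scala, the Array[Row] just needs converting to a java.util.List first; a sketch, again assuming myDF and a spark session:

```scala
import scala.collection.JavaConverters._
import org.apache.spark.sql.Row

val rows: Array[Row] = myDF.take(2)

// createDataFrame(java.util.List[Row], StructType) is the overload in question;
// asJava bridges the Scala collection to java.util.List
val df = spark.createDataFrame(rows.toList.asJava, myDF.schema)
```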

Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license; if you repost, please credit this site or the original source. For any questions contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM