Convert a list of fields to a structType object, which is a SparkR schema
In SparkR, we have the schema of a DataFrame both as a structType and as a list of fields, for example:
schema <- schema(output_count)
str(schema)
#List of 2
# $ jobj :Class 'jobj' <environment: 0x563114ff5900>
# $ fields:function ()
# - attr(*, "class")= chr "structType"
fields <- schema$fields()
fields
#[[1]]
#StructField(name = "word", type = "StringType", nullable = TRUE)
#[[2]]
#StructField(name = "count", type = "StringType", nullable = TRUE)
I found that the SparkR API exposes a method for this: https://spark.apache.org/docs/2.0.0/api/R/
but as a SparkR beginner I am not sure how to use it.
My attempt:
schema <- schema(output_count)
str(schema)
#List of 2
# $ jobj :Class 'jobj' <environment: 0x563114ff5900>
# $ fields:function ()
# - attr(*, "class")= chr "structType"
I tried to get it as a structType.
If I understand correctly, the code below produces at least the kind of output you describe in your question.
df <- SparkR::createDataFrame(iris)
lapply(SparkR::dtypes(df), function(x) SparkR::structField(x[1], x[2]))
The output is:
[[1]]
StructField(name = "Sepal_Length", type = "DoubleType", nullable = TRUE)
[[2]]
StructField(name = "Sepal_Width", type = "DoubleType", nullable = TRUE)
[[3]]
StructField(name = "Petal_Length", type = "DoubleType", nullable = TRUE)
[[4]]
StructField(name = "Petal_Width", type = "DoubleType", nullable = TRUE)
[[5]]
StructField(name = "Species", type = "StringType", nullable = TRUE)
If you additionally use do.call with SparkR::structType,
do.call(SparkR::structType, lapply(SparkR::dtypes(df), function(x) SparkR::structField(x[1], x[2])))
then the output looks like this:
StructType
|-name = "Sepal_Length", type = "DoubleType", nullable = TRUE
|-name = "Sepal_Width", type = "DoubleType", nullable = TRUE
|-name = "Petal_Length", type = "DoubleType", nullable = TRUE
|-name = "Petal_Width", type = "DoubleType", nullable = TRUE
|-name = "Species", type = "StringType", nullable = TRUE
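When the list of fields already comes from schema$fields(), as in the question, the elements are structField objects, so the same do.call pattern should apply to them directly without going through dtypes. This is a minimal sketch, assuming SparkR is installed and can find a local Spark installation. The iris data and the names fields and rebuilt are only illustrative.

```r
library(SparkR)

# Assumption: a local Spark installation is available to SparkR (e.g. via SPARK_HOME).
sparkR.session(master = "local[1]")

df <- createDataFrame(iris)

# schema() returns a structType; its fields() function returns
# a list of structField objects, one per column.
fields <- schema(df)$fields()

# structType() takes structField objects as separate arguments,
# so do.call() spreads the list across them.
rebuilt <- do.call(structType, fields)

# Printing the result shows the StructType with its fields.
print(rebuilt)
```

do.call is used here because structType expects the fields as individual arguments rather than as a single list.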