
Convert a list of fields to a structType object, which is a SparkR schema

In SparkR, I have the schema of a DataFrame as a structType and can also list it as a list of fields, for example:

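# Get the schema of the DataFrame first, then inspect it
schema <- schema(output_count)

str(schema)
#List of 2
# $ jobj  :Class 'jobj' <environment: 0x563114ff5900> 
# $ fields:function ()  
# - attr(*, "class")= chr "structType"

# Extract the schema's fields as a plain R list of structField objects
fields <- schema$fields()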

fields
#[[1]]
#StructField(name = "word", type = "StringType", nullable = TRUE)
#[[2]]
#StructField(name = "count", type = "StringType", nullable = TRUE)

I found that the SparkR API exposes a method for this (https://spark.apache.org/docs/2.0.0/api/R/), but as a SparkR beginner I am not sure how to use it.

My attempt:

schema <- schema(output_count)
str(schema)
#List of 2
# $ jobj  :Class 'jobj' <environment: 0x563114ff5900> 
# $ fields:function ()  
# - attr(*, "class")= chr "structType"

I am trying to get it as a structType.

If I understand correctly, the following code should at least produce the kind of output you describe in the question.

df <- SparkR::createDataFrame(iris)
lapply(SparkR::dtypes(df), function(x) SparkR::structField(x[1], x[2]))

The output is:

[[1]] 
StructField(name = "Sepal_Length", type = "DoubleType", nullable = TRUE)
[[2]] 
StructField(name = "Sepal_Width", type = "DoubleType", nullable = TRUE)
[[3]] 
StructField(name = "Petal_Length", type = "DoubleType", nullable = TRUE)
[[4]] 
StructField(name = "Petal_Width", type = "DoubleType", nullable = TRUE)
[[5]] 
StructField(name = "Species", type = "StringType", nullable = TRUE)

If you additionally use do.call together with SparkR::structType,

do.call(SparkR::structType, lapply(SparkR::dtypes(df), function(x) SparkR::structField(x[1], x[2])))

then the output looks like this:

StructType
|-name = "Sepal_Length", type = "DoubleType", nullable = TRUE
|-name = "Sepal_Width", type = "DoubleType", nullable = TRUE
|-name = "Petal_Length", type = "DoubleType", nullable = TRUE
|-name = "Petal_Width", type = "DoubleType", nullable = TRUE
|-name = "Species", type = "StringType", nullable = TRUE
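
If you already hold the list returned by `schema$fields()`, as in the question, the same `do.call` trick should work on it directly, since `structType` expects the fields as separate arguments rather than as one list. A minimal sketch, assuming a local Spark installation is available to SparkR; the names `fields` and `rebuilt` and the `master = "local[1]"` setting are only illustrative:

```r
library(SparkR)

# Assumption: Spark is installed locally and SparkR can start a session.
sparkR.session(master = "local[1]")

df <- createDataFrame(iris)

# A plain R list of structField objects, as shown in the question.
fields <- schema(df)$fields()

# structType() takes the fields as separate arguments,
# so do.call() is used to spread the list across them.
rebuilt <- do.call(structType, fields)

print(rebuilt)

# Field names of the rebuilt schema, for comparison with columns(df).
rebuilt_names <- sapply(rebuilt$fields(), function(f) f$name())
```

This avoids going through `dtypes()` and rebuilding every `structField` by hand, so each field is passed through unchanged.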
