How to load a dataset from a String in Spark
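From the Spark documentation, I know that I can load a libsvm-formatted dataset from a file.
However, I want to run my code on a remote Spark cluster, so I have hardcoded the iris dataset into my code and want to load it directly from that String object.
But when I looked at the DataFrameReader object, I could not find any API that loads a dataset directly from a String.
I tried it like this: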
val irisData =
"""
|"sepal_length","sepal_width","petal_length","petal_width","label"
|5.1,3.5,1.4,0.2,Iris-setosa
|4.9,3.0,1.4,0.2,Iris-setosa
|4.7,3.2,1.3,0.2,Iris-setosa
|4.6,3.1,1.5,0.2,Iris-setosa
""".stripMargin
println(irisData)

Output:

"sepal_length","sepal_width","petal_length","petal_width","label"
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa

import org.apache.spark.sql.Encoders

// Split the text into lines and wrap them in a Dataset[String]
val stringDS = spark.createDataset(irisData.split("\n"))(Encoders.STRING)
// DataFrameReader.csv also accepts a Dataset[String], not only file paths
val irisDatasetDF = spark.read
  .option("inferSchema", "true")
  .option("header", "true")
  .csv(stringDS)
irisDatasetDF.show(false)

Output:
+------------+-----------+------------+-----------+-----------+
|sepal_length|sepal_width|petal_length|petal_width|label      |
+------------+-----------+------------+-----------+-----------+
|5.1         |3.5        |1.4         |0.2        |Iris-setosa|
|4.9         |3.0        |1.4         |0.2        |Iris-setosa|
|4.7         |3.2        |1.3         |0.2        |Iris-setosa|
|4.6         |3.1        |1.5         |0.2        |Iris-setosa|
+------------+-----------+------------+-----------+-----------+
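
For completeness, the same idea can be wrapped in a small self-contained program. This is only a sketch under a few assumptions: the object name LoadCsvFromString and the helper load are hypothetical names made up for illustration, the session is created with a local master so the example can run on its own (on a remote cluster the master would normally come from spark-submit), and the Spark version is 2.2 or later, where DataFrameReader.csv accepts a Dataset[String].

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object, not part of Spark itself.
object LoadCsvFromString {

  // Parse CSV text held in memory into a DataFrame.
  // The first non-empty line is treated as the header.
  def load(spark: SparkSession, csvText: String): DataFrame = {
    import spark.implicits._ // provides the String encoder and toDS()

    // trim drops the leading/trailing blank lines left by the triple-quoted literal
    val lines = csvText.trim.split("\n").toSeq.toDS()

    spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(lines) // overload taking Dataset[String], assumed Spark 2.2+
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local master for a standalone run only.
    val spark = SparkSession.builder()
      .appName("LoadCsvFromString")
      .master("local[*]")
      .getOrCreate()

    val irisData =
      """
        |"sepal_length","sepal_width","petal_length","petal_width","label"
        |5.1,3.5,1.4,0.2,Iris-setosa
        |4.9,3.0,1.4,0.2,Iris-setosa
        |4.7,3.2,1.3,0.2,Iris-setosa
        |4.6,3.1,1.5,0.2,Iris-setosa
      """.stripMargin

    val irisDF = load(spark, irisData)
    irisDF.printSchema()
    irisDF.show(false)

    spark.stop()
  }
}
```

Because the data travels inside the application itself, nothing has to be copied to the cluster's file system first, which is the situation described in the question.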