
How to load a dataset from a String in Spark

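From the Spark documentation, I know that I can load a libsvm-formatted dataset from a file.

However, I want to run my code on a remote Spark cluster, so I hard-coded the iris dataset into my code and want to load it directly from that String object.

But looking at the DataFrameReader object, I found no API that loads a dataset directly from a String.

I tried it this way: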

 val irisData =                                                                                           
   """                                                                                                    
     |"sepal_length","sepal_width","petal_length","petal_width","label"                                   
     |5.1,3.5,1.4,0.2,Iris-setosa                                                                         
     |4.9,3.0,1.4,0.2,Iris-setosa                                                                         
     |4.7,3.2,1.3,0.2,Iris-setosa                                                                         
     |4.6,3.1,1.5,0.2,Iris-setosa                                                                         
   """.stripMargin                                                                                        

 println(irisData)                                                                                        

   "sepal_length","sepal_width","petal_length","petal_width","label"                                      
   5.1,3.5,1.4,0.2,Iris-setosa                                                                            
   4.9,3.0,1.4,0.2,Iris-setosa                                                                            
   4.7,3.2,1.3,0.2,Iris-setosa                                                                            
   4.6,3.1,1.5,0.2,Iris-setosa                                                                            

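 import org.apache.spark.sql.Encoders

 // Turn each line of the string into one row of a Dataset[String],
 // then parse that dataset with the CSV reader.
 val stringDS = spark.createDataset(irisData.split("\n"))(Encoders.STRING)
 val irisDatasetDF = spark.read
   .option("inferSchema", "true")
   .option("header", "true")
   .csv(stringDS)
 irisDatasetDF.show(false)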

   +------------+-----------+------------+-----------+-----------+                                        
   |sepal_length|sepal_width|petal_length|petal_width|label      |                                        
   +------------+-----------+------------+-----------+-----------+                                        
   |5.1         |3.5        |1.4         |0.2        |Iris-setosa|                                        
   |4.9         |3.0        |1.4         |0.2        |Iris-setosa|                                        
   |4.7         |3.2        |1.3         |0.2        |Iris-setosa|                                        
   |4.6         |3.1        |1.5         |0.2        |Iris-setosa|                                        
   +------------+-----------+------------+-----------+-----------+                                        
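
The approach above relies on the `csv(Dataset[String])` overload of DataFrameReader, which is available from Spark 2.2.0 onward. As a variation, here is a minimal sketch that drops the blank lines before building the dataset and declares the column types explicitly instead of using `inferSchema`. The local SparkSession, the app name, and the names `lines`, `irisSchema` and `irisDF` are assumptions made for illustration; on a cluster you would reuse the session you already have.

 ```scala
 import org.apache.spark.sql.{Encoders, SparkSession}
 import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

 // Assumption: a local session, only so the sketch can run on its own.
 val spark = SparkSession.builder()
   .master("local[*]")
   .appName("csv-from-string")
   .getOrCreate()

 val irisData =
   """
     |"sepal_length","sepal_width","petal_length","petal_width","label"
     |5.1,3.5,1.4,0.2,Iris-setosa
     |4.9,3.0,1.4,0.2,Iris-setosa
   """.stripMargin

 // Remove the leading blank line and the trailing whitespace-only line
 // that the triple-quoted literal introduces.
 val lines = irisData.split("\n").map(_.trim).filter(_.nonEmpty).toSeq
 val stringDS = spark.createDataset(lines)(Encoders.STRING)

 // Explicit schema: four numeric features and a string label.
 val irisSchema = StructType(Seq(
   StructField("sepal_length", DoubleType),
   StructField("sepal_width", DoubleType),
   StructField("petal_length", DoubleType),
   StructField("petal_width", DoubleType),
   StructField("label", StringType)
 ))

 val irisDF = spark.read
   .schema(irisSchema)
   .option("header", "true") // the first line is a header, not data
   .csv(stringDS)

 irisDF.printSchema()
 irisDF.show(false)
 ```

Supplying the schema up front means the column types do not depend on what inference finds in the sample, which may be preferable when the hard-coded data is small.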


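Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.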
