Is there any way to convert a Seq[Row] into a DataFrame in Scala? I have a DataFrame and a list of strings that holds the weight of each row in the input DataFrame. I want to build a DataFrame that includes all rows with unique weights (the first row for each distinct weight). I was able to filter the unique rows and append them to a Seq[Row], but I want to build a DataFrame from it. This is my code. Thanks in advance.
def dataGenerator(input: DataFrame, weights: List[String]): Dataset[Row] = {
  // the list was originally named "val", which is a reserved word in Scala
  val valitr = weights.iterator
  var testdata = Seq[Row]()
  var valset = HashSet[String]() // weights seen so far
  if (valitr != null) {
    input.collect().foreach { r =>
      val valnxt = valitr.next() // weight for the current row
      if (!valset.contains(valnxt)) {
        valset += valnxt
        testdata = testdata :+ r // keep the first row seen for each weight
      }
    }
  }
  // logic to convert testdata as DataFrame and return
}
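For the literal question of turning a Seq[Row] back into a DataFrame, one option is to parallelize the rows and reuse the schema of the DataFrame they came from. This is a minimal sketch, not necessarily the best fit for the use case: the object name SeqRowToDataFrame, the helper rowsToDataFrame and the sample data are made up for illustration, and it assumes Spark SQL is on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object SeqRowToDataFrame {
  // Rebuilds a DataFrame from already-collected rows.
  // The rows must match the schema of `template` (here: the DataFrame they were collected from).
  def rowsToDataFrame(rows: Seq[Row], template: DataFrame): DataFrame = {
    val spark = template.sparkSession
    spark.createDataFrame(spark.sparkContext.parallelize(rows), template.schema)
  }

  def main(args: Array[String]): Unit = {
    // local session, for demonstration only
    val spark = SparkSession.builder().master("local[*]").appName("seq-row-to-df").getOrCreate()
    import spark.implicits._

    // illustrative sample data
    val input = Seq(("item 1", "w1"), ("item 2", "w2"), ("item 3", "w2")).toDF("item", "weight")
    val kept: Seq[Row] = input.collect().toSeq.take(2)

    rowsToDataFrame(kept, input).show()
    spark.stop()
  }
}
```

In dataGenerator this would replace the final comment with a call such as rowsToDataFrame(testdata, input). It still relies on collect(), so it only makes sense when the data fits in driver memory.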
You said that 'val is calculated using fields from inputdf itself'. If that is the case, you should be able to build a new dataframe with an extra column for 'val' (the weight), like this:
+------+------+
|item |weight|
+------+------+
|item 1|w1 |
|item 2|w2 |
|item 3|w2 |
|item 4|w3 |
|item 5|w4 |
+------+------+
This is the key thing. Then you will be able to work on the dataframe instead of doing a collect.
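As a rough sketch of that step: if the weight can be written as a column expression, withColumn adds it without leaving the DataFrame API. The real weight formula is not given in the question, so the expression below (and the "category" column, the object name and the helper name) is purely hypothetical and only stands in for it.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat, lit}

object AddWeightColumn {
  // Adds a "weight" column computed from the other columns of the same DataFrame.
  def withWeight(input: DataFrame, weightExpr: Column): DataFrame =
    input.withColumn("weight", weightExpr)

  def main(args: Array[String]): Unit = {
    // local session, for demonstration only
    val spark = SparkSession.builder().master("local[*]").appName("add-weight").getOrCreate()
    import spark.implicits._

    // illustrative sample data; "category" is a made-up source field
    val input = Seq(("item 1", "1"), ("item 2", "2"), ("item 3", "2")).toDF("item", "category")

    // hypothetical rule: weight = "w" + category; substitute the real calculation here
    withWeight(input, concat(lit("w"), col("category"))).show()
    spark.stop()
  }
}
```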
What is bad about doing a collect? There is no point in going to the trouble and overhead of using a distributed big data processing framework just to pull all the data into the memory of a single machine. See also: "Spark dataframe: collect() vs select()".
Once the input dataframe looks the way you want it, as above, you can get the result. Here is one way that works: it groups the data by the weight column and picks the first item in each group.
import spark.implicits._ // needed for .toDF on an RDD; assumes a SparkSession named spark

val result = input
  .rdd                                // get underlying rdd
  .groupBy(r => r.get(1))             // group by "weight" field
  .map(x => x._2.head.getString(0))   // get the first "item" for each weight
  .toDF("item")                       // back to a dataframe
Then you get only the first item wherever a weight is duplicated:
+------+
|item |
+------+
|item 1|
|item 2|
|item 4|
|item 5|
+------+
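An alternative that avoids dropping down to the RDD, not part of the answer above, is dropDuplicates on the weight column. This is a sketch under the assumption that the dataframe has the "item" and "weight" columns shown earlier; the object and method names are made up. Note that Spark does not promise which row is kept for a duplicated weight, so this fits when any one item per weight will do.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object UniqueWeights {
  // Keeps one row per distinct value of "weight"; which row survives is not guaranteed.
  def onePerWeight(input: DataFrame): DataFrame =
    input.dropDuplicates("weight")

  def main(args: Array[String]): Unit = {
    // local session, for demonstration only
    val spark = SparkSession.builder().master("local[*]").appName("unique-weights").getOrCreate()
    import spark.implicits._

    // same sample data as the table in the answer
    val input = Seq(
      ("item 1", "w1"), ("item 2", "w2"), ("item 3", "w2"),
      ("item 4", "w3"), ("item 5", "w4")
    ).toDF("item", "weight")

    onePerWeight(input).select("item").show()
    spark.stop()
  }
}
```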