
Convert Spark DataFrame to Array[String]

Can anyone tell me how to convert a Spark DataFrame into an Array[String] in Scala?

I have tried the following:

val x = df.select(columns.head, columns.tail: _*).collect()

The above snippet gives me an Array[Row], not the Array[String] I need.
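For reference, here is a minimal, self-contained setup that reproduces this; the sample data and names are made up for illustration:

import org.apache.spark.sql.{Row, SparkSession}

val spark = SparkSession.builder().appName("example").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample data standing in for the real DataFrame
val df = Seq(("alice", "30"), ("bob", "25")).toDF("name", "age")
val columns = df.columns.toSeq

// collect() materializes the rows on the driver as Array[Row], not Array[String]
val x: Array[Row] = df.select(columns.head, columns.tail: _*).collect()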

This should do the trick:

df.select(columns.head, columns.tail: _*).collect.map(_.toSeq)

DataFrame to Array[String]:

data.collect.map(_.toSeq).flatten
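Note that Row.toSeq yields a Seq[Any], so the flattened result above is an Array[Any]; to get the Array[String] the question asks for, stringify each value. A minimal sketch, assuming the SparkSession and implicits from the setup in the question:

// Assumes `spark` and `import spark.implicits._` from the sketch above
val data = Seq(("alice", "30"), ("bob", "25")).toDF("name", "age")

// Row.toSeq gives Seq[Any]; call toString on each cell to get Array[String]
val flat: Array[String] = data.collect().map(_.toSeq).flatten.map(_.toString)
// flat: Array(alice, 30, bob, 25)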

You can also use the following:

data.collect.map(row => row.getString(0))

If you have a lot of data, it is better to use the last approach, which extracts the strings on the executors before collecting:

data.rdd.map(row => row.getString(0)).collect
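As a usage note, getString(0) assumes the first column really is string-typed and throws a ClassCastException otherwise; row.get(0).toString is a safer variant for mixed types. A small sketch under the same assumed setup, with made-up data:

// Assumes the SparkSession and implicits from the sketch in the question
val mixed = Seq(("alice", 30), ("bob", 25)).toDF("name", "age")

// getString(0) works here because "name" is string-typed
val names: Array[String] = mixed.rdd.map(row => row.getString(0)).collect()

// "age" is an Int column, so fetch the value and stringify it instead
val ages: Array[String] = mixed.rdd.map(row => row.get(1).toString).collect()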

If you plan to read the dataset line by line, you can use an iterator over the dataset:

Dataset<Row> csv = session.read().format("csv")
        .option("sep", ",")
        .option("inferSchema", true)
        .option("escape", "\"")
        .option("header", true)
        .option("multiline", true)
        .load("users/abc/....");  // path truncated in the original post

for (Iterator<Row> iter = csv.toLocalIterator(); iter.hasNext();) {
    String[] item = iter.next().toString().split(",");
}
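A Scala counterpart of the same loop, as a sketch (the truncated path is kept from the original post); mapping over row.toSeq avoids splitting Row.toString, which breaks on values that contain commas:

import scala.collection.JavaConverters._

val csvDf = spark.read.format("csv")
  .option("sep", ",")
  .option("inferSchema", "true")
  .option("escape", "\"")
  .option("header", "true")
  .option("multiline", "true")
  .load("users/abc/....")  // path truncated in the original post

// toLocalIterator streams rows to the driver one partition at a time
for (row <- csvDf.toLocalIterator().asScala) {
  val item: Array[String] = row.toSeq.map(_.toString).toArray
}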

This answer was provided by cricket_007. You can use the following to convert the Array[Row] to an Array[String]:

val x = df.select(columns.head, columns.tail: _*).collect().map { row => row.toString() }
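One caveat: Row.toString wraps the values in square brackets, e.g. "[alice,30]". If you want just the comma-joined values, Row.mkString is an alternative:

// mkString joins the raw cell values without the surrounding brackets
val y: Array[String] =
  df.select(columns.head, columns.tail: _*).collect().map(_.mkString(","))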

Thanks, Bharath
