How to access elements in a Row RDD in Scala

Question

My row RDD looks like this:

Array[org.apache.spark.sql.Row] = Array([1,[example1,WrappedArray([Standford,Organisation,NNP], [is,O,VP], [good,LOCATION,ADP])]])

I got this by converting a DataFrame to an RDD. The DataFrame schema was:

root
 |-- article_id: long (nullable = true)
 |-- sentence: struct (nullable = true)
 |    |-- sentence: string (nullable = true)
 |    |-- attributes: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- tokens: string (nullable = true)
 |    |    |    |-- ner: string (nullable = true)
 |    |    |    |-- pos: string (nullable = true)

How do I access elements in the Row RDD? With the DataFrame I can use df.select("sentence"). I would like to access nested elements such as Standford and the other nested values.

Answer 1

As @SarveshKumarSingh wrote in a comment, you can access the rows in an RDD[Row] like you would access any other element in an RDD. The fields within a row can be accessed in a couple of ways. Either simply call get like this:

rowRDD.map(row => row.get(2).asInstanceOf[MyType])

or, if it is a built-in type, you can avoid the type cast:

rowRDD.map(row => row.getList(4))

or you might want to simply use pattern matching, like:

rowRDD.map{case Row(field1: Long, field2: MyType) => field2}
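
Applied to the schema in the question, a minimal sketch might look like the following. The sample row is built by hand to mirror the printed data, so it carries no schema and all access is positional (assuming article_id is at index 0 and the sentence struct at index 1). The names RowAccess, sample, tokensOf and nerTags are made up for illustration.

```scala
import org.apache.spark.sql.Row

object RowAccess {
  // Hand-built stand-in for one row of the question's data (assumption:
  // article_id at index 0, the sentence struct at index 1).
  val sample: Row = Row(
    1L,
    Row("example1", Seq(
      Row("Standford", "Organisation", "NNP"),
      Row("is", "O", "VP"),
      Row("good", "LOCATION", "ADP"))))

  // Positional access: struct -> array of structs -> first field of each.
  def tokensOf(row: Row): scala.collection.Seq[String] =
    row.getStruct(1).getSeq[Row](1).map(_.getString(0))

  // The same idea with nested pattern matching; rows of any other shape
  // would raise a MatchError here.
  def nerTags(row: Row): scala.collection.Seq[String] = row match {
    case Row(_: Long, Row(_: String, attrs: scala.collection.Seq[_])) =>
      attrs.collect { case Row(_, ner: String, _) => ner }
  }

  def main(args: Array[String]): Unit = {
    println(tokensOf(sample).mkString(", ")) // prints: Standford, is, good
    println(nerTags(sample).mkString(", "))  // prints: Organisation, O, LOCATION
  }
}
```

On an actual RDD[Row] the same functions could be passed straight to map, for example rowRDD.map(RowAccess.tokensOf).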

I hope this helps :)

solution1
9 ACCEPTED 2016-08-18 07:19:02