Extract information from a `org.apache.spark.sql.Row`
I have an Array[org.apache.spark.sql.Row] returned by sqc.sql(sqlcmd).collect():
Array([10479,6,10], [8975,149,640], ...)
I can get the individual values:
scala> pixels(0)(0)
res34: Any = 10479
but they are Any, not Int. How do I extract them as Int?
The most obvious solution did not work:
scala> pixels(0).getInt(0)
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Int
PS. I can do pixels(0)(0).toString.toInt or pixels(0).getString(0).toInt, but they feel wrong...
Using getInt should work. Here is a contrived example as a proof of concept:
import org.apache.spark.sql._
sc.parallelize(Array(1,2,3)).map(Row(_)).collect()(0).getInt(0)
This returns 1. However,
sc.parallelize(Array("1","2","3")).map(Row(_)).collect()(0).getInt(0)
fails. So it looks like the value is coming in as a string, and you will have to convert it to an int manually:
sc.parallelize(Array("1","2","3")).map(Row(_)).collect()(0).getString(0).toInt
The documentation states that getInt:
"Returns the value of column i as an int. This function will throw an exception if the value at i is not an integer, or if it is null."
So it seems it will not try to convert the value for you; it only returns a value that is already stored as an int.
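The difference between casting and converting can be reproduced without Spark. The following is only a minimal sketch: the row is modeled as a plain Seq[Any], which is a hypothetical stand-in and not how Row is implemented, and the names CastVsConvert, viaCast and viaParse are made up for illustration.

```scala
import scala.util.Try

object CastVsConvert {
  // Hypothetical stand-in for one collected row whose columns arrived as strings.
  val row: Seq[Any] = Seq("10479", "6", "10")

  // A cast only changes the static type. The runtime value is still a String,
  // so unboxing it as an Int throws a ClassCastException, captured here by Try.
  val viaCast: Try[Int] = Try(row(0).asInstanceOf[Int])

  // A conversion parses the characters of the string into a number.
  val viaParse: Try[Int] = Try(row(0).asInstanceOf[String].toInt)

  def main(args: Array[String]): Unit = {
    println(s"cast failed:  ${viaCast.isFailure}")
    println(s"parse result: $viaParse")
  }
}
```

This mirrors the behaviour seen in the question: getInt(0) fails on a string column, while getString(0).toInt parses it.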
The Row class (also see https://spark.apache.org/docs/1.1.0/api/scala/index.html#org.apache.spark.sql.package ) has methods getInt(i: Int), getDouble(i: Int), etc.
Also note that a SchemaRDD is an RDD[Row] plus a schema that tells you which column has which data type. If you call .collect() you only get an Array[Row], which does not carry that information. So unless you know for sure what your data looks like, get the schema from the SchemaRDD first, then collect the rows, and then access each field using the correct type information.
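If only the collected rows are at hand and the schema is no longer available, one fallback is to inspect the runtime type of each value instead. This is a different technique from reading the schema, sketched here in plain Scala under the assumption that a row can be treated as a sequence of Any values; SafeInt and toIntOption are hypothetical helper names, not part of Spark.

```scala
import scala.util.Try

object SafeInt {
  // Returns Some(n) when the value can be read as an Int, None otherwise.
  def toIntOption(value: Any): Option[Int] = value match {
    case i: Int                  => Some(i)                 // already an Int
    case l: Long if l.isValidInt => Some(l.toInt)           // Long that fits in an Int
    case s: String               => Try(s.trim.toInt).toOption // parse, None if malformed
    case _                       => None                    // null or any other type
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical row mixing string and numeric columns.
    val row: Seq[Any] = Seq("10479", 6, 10L, "n/a", null)
    println(row.map(toIntOption))
  }
}
```

Returning an Option keeps malformed or null fields from throwing, at the cost of having to handle None at the call site.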
The answer is relevant. You do not need to use collect; instead, call the methods getInt and getString on each row, and use getAs when the data type is complex:
val popularHashTags = sqlContext.sql("SELECT hashtags, usersMentioned, Url FROM tweets")
var hashTagsList = popularHashTags.flatMap ( x => x.getAs[Seq[String]](0))