Apache Spark isEmpty() returns false but collection is empty
I am having a problem with Apache Spark: when I call isEmpty() on a JavaRDD collection, it returns false even though the collection is empty.
Here's sample code (modified, because it's from my final-year project and I'm not allowed to publish any of its code):
sampleRdd = inputRdd.filter(someFilterFunction);
if (sampleRdd.isEmpty()) {
    return inputRdd.first();
} else {
    return sampleRdd.first(); // the stack trace points to this line
}
The problem is that sometimes the condition is false: sampleRdd.isEmpty() returns false, meaning the RDD is not empty, so execution proceeds to the else branch, which tries to retrieve the first() element of that collection, but it throws this exception:
Exception in thread "main" java.lang.UnsupportedOperationException: empty collection
at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1314)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
at org.apache.spark.rdd.RDD.first(RDD.scala:1311)
at org.apache.spark.api.java.JavaRDDLike$class.first(JavaRDDLike.scala:510)
at org.apache.spark.api.java.AbstractJavaRDDLike.first(JavaRDDLike.scala:47)
...
Is there something I am missing? I'm currently running it on my local machine, as it's still not fully developed.
Thanks
EDIT: To add more information: when I get this error, the stack trace points to the line sampleRdd.first(), so the initial inputRdd is not empty.
EDIT2: I added some extra lines that print the size of inputRdd before the filter and the size of sampleRdd after the filter, like this:
System.out.println(inputRdd.count());  // returns a nonzero positive int, e.g. 100
sampleRdd = inputRdd.filter(someFilterFunction);
System.out.println(sampleRdd.count()); // returns an int, e.g. 1
System.out.println(sampleRdd.count()); // sometimes returns a different int than the first call, e.g. 3
if (sampleRdd.isEmpty()) {
    return inputRdd.first();
} else {
    return sampleRdd.first(); // the stack trace points to this line
}
I observed some very interesting behaviour: sometimes inputRdd.count() returns 100, but the first sampleRdd.count() returns 1 and the second sampleRdd.count() returns 3, or generally a different number from the first call.
So it looks like the size of sampleRdd changes between two calls, and I assume it can sometimes become empty after the isEmpty() condition has been checked, so the following call to first() throws the error.
Any idea what might be causing that?
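One possible explanation, offered as an assumption because someFilterFunction is not shown: an RDD that is not cached is recomputed from its lineage on every action, so each count(), isEmpty() and first() runs the filter again. If the filter is not deterministic (for example, it samples randomly or reads mutable state), each action can see a different result. The plain-Java sketch below models this without Spark: a Supplier<Stream<Integer>> stands in for an uncached RDD, and the hypothetical statefulFilter stands in for someFilterFunction.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class RecomputeDemo {

    // Hypothetical stand-in for someFilterFunction. Its result depends on how
    // many times it has been called, so it is not deterministic across runs.
    static Predicate<Integer> statefulFilter() {
        AtomicInteger calls = new AtomicInteger();
        return x -> (x + calls.getAndIncrement()) % 3 == 0;
    }

    // Models an uncached RDD: every get() rebuilds the pipeline, so every
    // "action" runs the filter again over the whole input.
    static Supplier<Stream<Integer>> lazyFiltered(List<Integer> input, Predicate<Integer> filter) {
        return () -> input.stream().filter(filter);
    }

    public static void main(String[] args) {
        List<Integer> input = IntStream.range(0, 10).boxed().collect(Collectors.toList());

        // Two "actions" on the same uncached pipeline disagree.
        Supplier<Stream<Integer>> sample = lazyFiltered(input, statefulFilter());
        long first = sample.get().count();
        long second = sample.get().count();
        System.out.println("uncached: " + first + " then " + second); // uncached: 4 then 3

        // Models cache(): evaluate once, then every later action reads the same data.
        List<Integer> cached = lazyFiltered(input, statefulFilter()).get().collect(Collectors.toList());
        System.out.println("cached: " + cached.size() + " then " + cached.size()); // cached: 4 then 4
    }
}
```

In Spark itself, the counterpart of materialising the list would be calling sampleRdd.cache() (or persist()) before the first action. Making the filter deterministic is the more robust fix, since cached partitions that get evicted are recomputed from the lineage.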
What if inputRdd is originally empty? In that case, sampleRdd is also empty. Therefore sampleRdd.isEmpty() evaluates to true and inputRdd.first() throws the exception.
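Following that reasoning, a version that never calls first() on a possibly empty RDD could use take(1), which in the Java API returns a java.util.List and simply comes back empty instead of throwing "empty collection". The helper below is a hypothetical sketch (firstOrFallback is not a Spark method). It takes two Suppliers so it can be called as firstOrFallback(() -> sampleRdd.take(1), () -> inputRdd.take(1)), evaluates sampleRdd with a single action instead of isEmpty() followed by first(), and only evaluates inputRdd when the sample is empty.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

public class FirstOrFallback {

    // Hypothetical helper, not part of Spark. Intended call site:
    //   firstOrFallback(() -> sampleRdd.take(1), () -> inputRdd.take(1))
    // take(1) returns an empty list for an empty RDD instead of throwing.
    // Assumes the RDD elements are non-null (Optional.of rejects null).
    static <T> Optional<T> firstOrFallback(Supplier<List<T>> sampleHead,
                                           Supplier<List<T>> inputHead) {
        List<T> sample = sampleHead.get();   // one action on sampleRdd
        if (!sample.isEmpty()) {
            return Optional.of(sample.get(0));
        }
        List<T> input = inputHead.get();     // only runs when the sample is empty
        if (!input.isEmpty()) {
            return Optional.of(input.get(0));
        }
        return Optional.empty();             // inputRdd itself was empty
    }

    public static void main(String[] args) {
        // Plain lists stand in for the results of take(1) so this runs without Spark.
        System.out.println(firstOrFallback(() -> Arrays.asList(7), () -> Arrays.asList(1)));
        System.out.println(firstOrFallback(() -> Collections.<Integer>emptyList(), () -> Arrays.asList(1)));
        System.out.println(firstOrFallback(() -> Collections.<Integer>emptyList(),
                                           () -> Collections.<Integer>emptyList()));
        // prints Optional[7], Optional[1], Optional.empty
    }
}
```

Returning Optional leaves it to the caller to decide what should happen when inputRdd itself is empty, which is the case the answer points out.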