
Apache Spark isEmpty false but collection is empty

I am having a problem with Apache Spark: when I call isEmpty() on a JavaRDD, it returns false even though the collection is empty.

Here's sample code (modified, since it comes from my final-year project and I'm not allowed to publish the actual code):

sampleRdd = inputRdd.filter(someFilterFunction);
if (sampleRdd.isEmpty()) {
    return inputRdd.first();
} else {
    return sampleRdd.first(); // the JVM reports the error on this line
}

The problem is that sometimes sampleRdd.isEmpty() returns false, meaning the RDD is not empty, so execution proceeds to the return statement in the else branch. There it tries to retrieve the first() element of that collection, but it throws an exception:

Exception in thread "main" java.lang.UnsupportedOperationException: empty collection
at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1314)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
at org.apache.spark.rdd.RDD.first(RDD.scala:1311)
at org.apache.spark.api.java.JavaRDDLike$class.first(JavaRDDLike.scala:510)
at org.apache.spark.api.java.AbstractJavaRDDLike.first(JavaRDDLike.scala:47)
.
.
.

Is there something I am missing? I'm currently running it on my local machine, as it's still not fully developed.

Thanks

EDIT: To add more info, the JVM points to the line with sampleRdd.first() when I get this error, so the initial inputRdd is not empty.

EDIT2: I added some extra lines that print the size of inputRdd before the filter and of sampleRdd after the filter, like this:

System.out.println(inputRdd.count());  // returns a nonzero positive count, e.g. 100
sampleRdd = inputRdd.filter(someFilterFunction);
System.out.println(sampleRdd.count()); // returns a count, e.g. 1
System.out.println(sampleRdd.count()); // sometimes returns a different count than the first call, e.g. 3
if (sampleRdd.isEmpty()) {
    return inputRdd.first();
} else {
    return sampleRdd.first(); // the JVM reports the error on this line
}

I observed some very interesting behaviour: sometimes inputRdd.count() returns 100, but the first sampleRdd.count() returns 1 and the second sampleRdd.count() returns 3, or some other number that differs from the first call. So it looks like the size of sampleRdd changes between two calls. I therefore assume that it sometimes becomes empty after the isEmpty() check has passed, so the subsequent call to first() throws the error.

Any idea what might be causing that?

What if inputRdd is originally empty?

In that case, sampleRdd is also empty. Therefore sampleRdd.isEmpty() evaluates to true, and inputRdd.first() throws the exception.
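The changing counts in EDIT2 point to a second possibility. Spark recomputes an uncached RDD for every action, so count(), isEmpty() and first() each re-run the filter. If someFilterFunction is not deterministic (an assumption, since the real function is not shown), the check and the retrieval can see different contents. The following is a minimal plain-Java sketch of that effect, not Spark code: LazyFilterSketch, lazyFilter and firstOrFallback are hypothetical names, and a Supplier stands in for the lazily evaluated RDD. It evaluates the filter once and decides from that single snapshot, which also covers the empty-input case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.function.Supplier;

class LazyFilterSketch {

    // Stand-in for an uncached, lazily evaluated collection:
    // every call to get() re-runs the predicate over the whole input.
    static <T> Supplier<List<T>> lazyFilter(List<T> input, Predicate<T> predicate) {
        return () -> {
            List<T> out = new ArrayList<>();
            for (T item : input) {
                if (predicate.test(item)) {
                    out.add(item);
                }
            }
            return out;
        };
    }

    // Evaluate the filter exactly once, then decide from that snapshot.
    // Falls back to the first input element, or to empty if the input is empty.
    static <T> Optional<T> firstOrFallback(List<T> input, Supplier<List<T>> filtered) {
        List<T> snapshot = filtered.get(); // single evaluation
        if (!snapshot.isEmpty()) {
            return Optional.of(snapshot.get(0));
        }
        if (!input.isEmpty()) {
            return Optional.of(input.get(0));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(10, 20, 30);

        // Deliberately unstable predicate: true only for the second element it ever tests.
        AtomicInteger tested = new AtomicInteger();
        Supplier<List<Integer>> filtered =
                lazyFilter(input, x -> tested.incrementAndGet() == 2);

        System.out.println(filtered.get().isEmpty()); // false: this evaluation yields [20]
        System.out.println(filtered.get().isEmpty()); // true: the re-evaluation yields []

        // Safe variant: one evaluation, with a fallback to the input.
        System.out.println(firstOrFallback(input, filtered)); // Optional[10]
    }
}
```

In Spark itself the same idea would be to call sampleRdd.take(1) once and inspect the returned list instead of calling isEmpty() and then first(). Calling sampleRdd.cache() before the check may also help, but cached partitions can still be recomputed, so making the filter deterministic is the more robust fix.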

