
How to extract RDD's data to Java ArrayList?

The obvious idea is to add the elements one at a time.

ArrayList<String> myvalues = new ArrayList<String>();

myRdd.foreach(new VoidFunction<org.apache.spark.sql.api.java.Row>() {
    @Override
    public void call(org.apache.spark.sql.api.java.Row row) throws Exception {
        myvalues.add(row.getString(0)); // Say I need only first element
    }
});

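This and the other alternatives I tried all throw org.apache.spark.SparkException: Task not serializable. I simplified the function further, and apparently I am doing something illogical: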

LOG.info("Let's see..");
queryRdd.foreach(new VoidFunction<org.apache.spark.sql.api.java.Row>() {
  @Override
  public void call(org.apache.spark.sql.api.java.Row row) throws Exception {
      LOG.info("Value is : "+row.getString(0));
  }
});

There must be a simple way. Here is the stack trace for reference:

2015-10-08 10:16:48 INFO  UpdateStatementTemplateImpl:141 - Lets see.. 
2015-10-08 10:16:48 WARN  GenericExceptionMapper:20 - Error while executing service
org.apache.spark.SparkException: Task not serializable
        at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
        at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
        at org.apache.spark.SparkContext.clean(SparkContext.scala:1476)
        at org.apache.spark.rdd.RDD.foreach(RDD.scala:781)
        at org.apache.spark.api.java.JavaRDDLike$class.foreach(JavaRDDLike.scala:313)
        at org.apache.spark.sql.api.java.JavaSchemaRDD.foreach(JavaSchemaRDD.scala:42)
        at com.simility.cassandra.template.DeviceIDTemplateImpl.test(DeviceIDTemplateImpl.java:144)
        at com.kumbay.service.admin.BusinessEntityService.testSignal(BusinessEntityService.java:1801)
        at com.kumbay.service.admin.BusinessEntityService$$FastClassByCGLIB$$157ddd50.invoke(<generated>)
        at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
        at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:701)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
        at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:96)
        at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:260)
        at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:94)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
        at org.springframework.security.access.intercept.aopalliance.MethodSecurityInterceptor.invoke(MethodSecurityInterceptor.java:64)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
        at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:634)

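I assume that LOG and myvalues live in an enclosing class. Spark therefore tries to serialize the whole class (as part of the "capture" of call), and that is not possible.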
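The capture problem can be reproduced without Spark. The sketch below is only an illustration under stated assumptions: SerializableTask is a hypothetical stand-in for Spark's VoidFunction (which is also Serializable), EnclosingService is a hypothetical non-serializable class like the service in the stack trace, and plain Java serialization stands in for what Spark does when it ships a task.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Spark's VoidFunction; it is Serializable too.
interface SerializableTask extends Serializable {
    void call(String value);
}

// Hypothetical enclosing class. Like a typical service bean, it is NOT Serializable.
class EnclosingService {
    private final List<String> myvalues = new ArrayList<String>();

    // The anonymous class reads an instance field, so it keeps a hidden
    // reference to the EnclosingService instance that created it.
    SerializableTask capturingTask() {
        return new SerializableTask() {
            @Override
            public void call(String value) {
                myvalues.add(value);
            }
        };
    }

    // A static nested class holds no reference to any EnclosingService instance.
    static final class StandaloneTask implements SerializableTask {
        @Override
        public void call(String value) {
            System.out.println("Value is : " + value);
        }
    }
}

class ClosureCaptureDemo {
    // Returns true if Java serialization of the object succeeds.
    static boolean isSerializable(Object candidate) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(candidate);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The capturing task drags the non-serializable service along and fails.
        System.out.println(isSerializable(new EnclosingService().capturingTask()));
        // The standalone task carries nothing extra and succeeds.
        System.out.println(isSerializable(new EnclosingService.StandaloneTask()));
    }
}
```

Moving the function into a static nested (or top-level) class is one way to avoid capturing the enclosing instance; whether it is enough in the asker's code depends on what else the function touches.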

First, replace LOG with a simple System.out.println and see whether that works.

Second, create copies of the members you use in call:

public void call(...) {
    Log log = LOG; // or
    ArrayList<String> inside = myvalues;
    inside.add(...);
}

Third, never use foreach to fill an ArrayList. The function runs on different nodes, and each node sees its own copy of the ArrayList, so you will never get the result you expect.

Instead, use rdd.collect(...) to collect the results!
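Why a list filled inside foreach stays empty on the driver can also be shown without a cluster. This is a rough sketch, not Spark's actual mechanism: ListFillingTask and ShippedCopyDemo are hypothetical names, and a serialize/deserialize round trip only imitates sending a task to another node.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

// Hypothetical task that appends to a list it was given on the driver.
class ListFillingTask implements Serializable {
    private final ArrayList<String> target;

    ListFillingTask(ArrayList<String> target) {
        this.target = target;
    }

    void call(String value) {
        target.add(value);
    }

    int size() {
        return target.size();
    }
}

class ShippedCopyDemo {
    // Imitates shipping a task to a worker: the worker receives a deserialized copy.
    static ListFillingTask shipToWorker(ListFillingTask task) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(task);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ListFillingTask) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns { size of the driver's list, size of the worker's copy } after one add.
    static int[] sizesAfterRemoteAdd() {
        ArrayList<String> myvalues = new ArrayList<String>();
        ListFillingTask onWorker = shipToWorker(new ListFillingTask(myvalues));
        onWorker.call("row-0"); // mutates the worker's copy only
        return new int[] { myvalues.size(), onWorker.size() };
    }

    public static void main(String[] args) {
        int[] sizes = sizesAfterRemoteAdd();
        System.out.println("driver list: " + sizes[0] + ", worker copy: " + sizes[1]);
    }
}
```

The add lands in the deserialized copy, which is why the results have to be brought back to the driver explicitly, for example with collect.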

