
Accumulo scan/write not running in standalone Java main program in AWS EC2 master using Cloudera CDH 5.8.2

We are trying to run a simple write/scan against Accumulo (client jar 1.5.0) from a standalone Java main program (a Maven shade executable), launched via PuTTY on the AWS EC2 master described below:

    public class AccumuloQueryApp {

      private static final Logger logger = LoggerFactory.getLogger(AccumuloQueryApp.class);

      public static final String INSTANCE = "accumulo"; // miniInstance
      public static final String ZOOKEEPERS = "ip-x-x-x-100:2181"; //localhost:28076

      private static Connector conn;

      static {
        // Accumulo
        Instance instance = new ZooKeeperInstance(INSTANCE, ZOOKEEPERS);
        try {
          conn = instance.getConnector("root", new PasswordToken("xxx"));
        } catch (Exception e) {
          logger.error("Connection", e);
        }
      }

      public static void main(String[] args) throws TableNotFoundException, AccumuloException, AccumuloSecurityException, TableExistsException {
        System.out.println("connection with : " + conn.whoami());

        BatchWriter writer = conn.createBatchWriter("test", ofBatchWriter());

        for (int i = 0; i < 10; i++) {
          Mutation m1 = new Mutation(String.valueOf(i));
          m1.put("personal_info", "first_name", String.valueOf(i));
          m1.put("personal_info", "last_name", String.valueOf(i));
          m1.put("personal_info", "phone", "983065281" + i % 2);
          m1.put("personal_info", "email", String.valueOf(i));
          m1.put("personal_info", "date_of_birth", String.valueOf(i));
          m1.put("department_info", "id", String.valueOf(i));
          m1.put("department_info", "short_name", String.valueOf(i));
          m1.put("department_info", "full_name", String.valueOf(i));
          m1.put("organization_info", "id", String.valueOf(i));
          m1.put("organization_info", "short_name", String.valueOf(i));
          m1.put("organization_info", "full_name", String.valueOf(i));

          writer.addMutation(m1);
        }
        writer.close();

        System.out.println("Writing complete ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`");

        Scanner scanner = conn.createScanner("test", new Authorizations());
        System.out.println("Step 1 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`");
        scanner.setRange(new Range("3", "7"));
        System.out.println("Step 2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`");
        scanner.forEach(e -> System.out.println("Key: " + e.getKey() + ", Value: " + e.getValue()));
        System.out.println("Step 3 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`");
        scanner.close();
      }

      public static BatchWriterConfig ofBatchWriter() {
        //Batch Writer Properties
        final int MAX_LATENCY  = 1;
        final int MAX_MEMORY = 10000000;
        final int MAX_WRITE_THREADS = 10;
        final int TIMEOUT = 10;

        BatchWriterConfig config = new BatchWriterConfig();   
        config.setMaxLatency(MAX_LATENCY, TimeUnit.MINUTES);
        config.setMaxMemory(MAX_MEMORY);
        config.setMaxWriteThreads(MAX_WRITE_THREADS);
        config.setTimeout(TIMEOUT, TimeUnit.MINUTES);

        return config;
      }
    }

The connection is established correctly, but creating the BatchWriter fails, and the client keeps retrying in a loop with the same error:

[impl.ThriftScanner] DEBUG: Error getting transport to ip-x-x-x-100:10011 : NotServingTabletException(extent:TKeyExtent(table:21 30, endRow:21 30 3C, prevEndRow:null))

When we run the same code (writing to and reading from Accumulo) inside a Spark job submitted to the YARN cluster, it runs perfectly. We are struggling to figure this out but have no clue. Please see the environment as described below.

Cloudera CDH 5.8.2 on AWS (4 EC2 instances: one master and 3 child nodes).

Consider the private IPs to be:

  1. Master: xxx100
  2. Child1: xxx101
  3. Child2: xxx102
  4. Child3: xxx103

We have the following installation in CDH:

Cluster (CDH 5.8.2)

  1. Accumulo 1.6 (Tracer not installed; Garbage Collector on Child2, Master on Master, Monitor on Child3, Tablet Server on Master)
  2. HBase
  3. HDFS (master as NameNode, all 3 children as DataNodes)
  4. Kafka
  5. Spark
  6. YARN (MR2 included)
  7. ZooKeeper
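One detail worth flagging in the setup above: the standalone app is built against the 1.5.0 client jar while the cluster runs Accumulo 1.6. If the shaded jar is supposed to match the server, a Maven dependency along these lines would align them (a sketch only; the exact 1.6.x version should be taken from the CDH parcel actually deployed):

```xml
<!-- Hypothetical pom.xml fragment: pin the client to the server's 1.6 line -->
<dependency>
  <groupId>org.apache.accumulo</groupId>
  <artifactId>accumulo-core</artifactId>
  <version>1.6.0</version>
</dependency>
```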

Hrm, that's very curious that it runs with Spark-on-YARN but not as a regular Java application. Usually, it's the other way around :)

I would verify that the JARs on the classpath of the standalone Java app match the JARs used by the Spark-on-YARN job as well as the Accumulo server classpath.
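One quick way to compare the two environments is to print where the JVM actually loaded a class from. This is a generic sketch (not Accumulo-specific); in the real app you would probe classes such as `org.apache.accumulo.core.client.Connector` or the Thrift transport classes in both the standalone app and the Spark executor, and compare the jar paths that come back:

```java
// Sketch: report which jar (CodeSource) a class was loaded from,
// to spot classpath/version mismatches between two runtimes.
public class ClasspathCheck {

    static String locate(String className) {
        try {
            Class<?> c = Class.forName(className);
            java.security.CodeSource src = c.getProtectionDomain().getCodeSource();
            // Bootstrap-loaded classes (e.g. java.lang.*) have no CodeSource.
            return src == null ? "bootstrap classloader" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "NOT FOUND";
        }
    }

    public static void main(String[] args) {
        // Pass the fully qualified class names to probe as arguments, e.g.
        //   org.apache.accumulo.core.client.Connector
        //   org.apache.thrift.transport.TTransport
        for (String name : args) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Running this inside the shaded jar and inside the Spark job should show whether both pick up the same Accumulo/Thrift jars.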

If that doesn't help, try increasing the log4j level to DEBUG or TRACE and see if anything jumps out at you. If you have a hard time understanding what the logging is saying, feel free to send an email to user@accumulo.apache.org and you'll definitely have more eyes on the problem.
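For the logging suggestion, a client-side `log4j.properties` along these lines would surface the `ThriftScanner` retry details (a sketch assuming log4j 1.x, which Accumulo 1.5/1.6 clients use; adjust logger names to taste):

```properties
# Hypothetical log4j.properties for the standalone client
log4j.rootLogger=INFO, console
# Turn up the Accumulo client internals, including ThriftScanner
log4j.logger.org.apache.accumulo.core.client.impl=TRACE
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d %-5p [%c] %m%n
```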

