
Flink SQL writes no data to ES cluster?

Problem description: I want to query data from my MySQL or Hive through SQL and write it to my ES cluster. The program runs successfully, but no data shows up in ES.

  1. Software versions:
  • flink: 1.11
  • es: 6.2.2
  • hive: 1.2.1
  • mysql: 5.7
  2. Below is my code:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class HiveExample {
    public static void main(String[] args) throws Exception {
        // Batch mode on the Blink planner.
        EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .useBlinkPlanner()
                .inBatchMode()
                .build();
        TableEnvironment tabEnv = TableEnvironment.create(settings);


        // (Unused in this run) the original query against the Hive table dragonfly.web_page.
        String sql =
                "insert into user_action_es_sink " +
                        "select 100123,5,11,1,'a','b','111','bbb',cast(11111 as bigint),cast('2020-11-11' as date) from dragonfly.web_page limit 10";


        // JDBC source table backed by the MySQL table spore.spore_user_auth.
        String sporeUserAuthCreateTableSQL = "CREATE TABLE users (\n" +
                "  `id` INT,\n" +
                "  `userid` INT,\n" +
                "  `type` INT,\n" +
                "   PRIMARY KEY (id) NOT ENFORCED" +
                ") WITH (\n" +
                "  'connector' = 'jdbc',\n" +
                "  'url' = 'jdbc:mysql://localhost:3306/spore',\n" +
                "  'table-name' = 'spore_user_auth',\n" +
                "  'driver' = 'com.mysql.jdbc.Driver',\n" +
                "  'username' = 'xxxx',\n" +
                "  'password'  = 'xxxx'\n" +
                ")";

        tabEnv.executeSql(sporeUserAuthCreateTableSQL);

        // Elasticsearch 6 sink table: index 'mytest', document type 'user_action'.
        String esTable = "CREATE TABLE user_action_es_sink (\n" +
                "  uid INT,\n" +
                "  appid INT,\n" +
                "  prepage_id INT,\n" +
                "  page_id INT,\n" +
                "  action_id STRING,\n" +
                "  page_name STRING,\n" +
                "  action_name STRING,\n" +
                "  prepage_name STRING,\n" +
                "  stat_time BIGINT,\n" +
                "  dt DATE\n" +
//                "  PRIMARY KEY (uid,dt) NOT ENFORCED\n" +
                ") WITH (\n" +
                "  'connector' = 'elasticsearch-6',\n" +
                "  'hosts' = 'http://localhost:9200',\n" +
                "  'index' = 'mytest',\n" +
                "  'document-type' = 'user_action'\n" +
//                "  'sink.bulk-flush.max-size' = '0',\n" +
//                "  'sink.bulk-flush.max-actions' = '0',\n" +
//                "  'sink.bulk-flush.interval' = '0'\n"+
//                "  'format' = 'json',\n" +
//                "  'json.fail-on-missing-field' = 'false',\n"+
//                "  'json.ignore-parse-errors' = 'true'\n" +
                ")";

        tabEnv.executeSql(esTable);

        tabEnv.executeSql("insert into user_action_es_sink select 100123,5,11,1,'a','b','111','bbb',cast(11111 as bigint),cast('2020-11-11' as date) from users limit 10").print();


    }
}

My pom file:

<dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
            <!--<scope>provided</scope>-->
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
            <!--<scope>provided</scope>-->
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-api-java-bridge_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
            <!--<scope>provided</scope>-->
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner-blink_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
            <!--<scope>provided</scope>-->
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-scala_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
            <!--<scope>provided</scope>-->
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-hive_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
            <!--<scope>provided</scope>-->
        </dependency>

        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>${hive.version}</version>
            <!--<scope>provided</scope>-->
        </dependency>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-jdbc_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>${mysql.version}</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>${junit.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-orc-nohive_2.12</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-elasticsearch6_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>


        <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-json -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-json</artifactId>
            <version>${flink.version}</version>
            <!--<scope>test</scope>-->
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.httpcomponents/httpcore -->
        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpcore</artifactId>
            <version>4.4.13</version>
        </dependency>
    </dependencies>

The code does not report any exception, but no data is written, and it is not clear what is causing the problem.
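
A quick way to confirm whether any documents actually reached the index is to query Elasticsearch's _count endpoint. Below is a minimal sketch of such a check, assuming the low-level REST client that the elasticsearch6 connector pulls in transitively, and the localhost:9200 host and mytest index from the DDL above:

import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class EsCountCheck {
    public static void main(String[] args) throws Exception {
        // Same host the Flink sink writes to.
        RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build();
        // GET /mytest/_count returns {"count":N,...} for the sink index.
        Response response = client.performRequest("GET", "/mytest/_count");
        System.out.println(EntityUtils.toString(response.getEntity()));
        client.close();
    }
}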

Thank you for your help :)

The executeSql function works in async mode. If you're testing in your IDE, the main function will exit right after executeSql is called, and the underlying mini-cluster shuts down once the main function finishes. This only happens in local tests; a production cluster is always alive, and the submitted job will execute normally.

If you want to wait for the job to finish in the IDE, you can use the following method (getJobExecutionResult returns a CompletableFuture, so the final get() blocks until the job completes):

tabEnv.executeSql("insert into user_action_es_sink select xxx ")
      .getJobClient().get()          
      .getJobExecutionResult(Thread.currentThread().getContextClassLoader()).get();

And in Flink 1.12, there is a simpler way to do this:

tabEnv.executeSql("insert into user_action_es_sink select xxx ")
      .await();
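
For completeness, here is a minimal end-to-end sketch for Flink 1.11 that blocks until the job finishes. This is a sketch, not the original poster's code: the class name is illustrative, the two CREATE TABLE statements are elided, and the INSERT reuses the values from the question.

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class EsSinkBlockingExample {
    public static void main(String[] args) throws Exception {
        TableEnvironment tabEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build());

        // ... register the users source and user_action_es_sink sink as in the question ...

        // getJobClient() returns an Optional<JobClient>, and getJobExecutionResult(...)
        // returns a CompletableFuture, so the final get() blocks until the job is done
        // and the Elasticsearch bulk flush has actually happened.
        tabEnv.executeSql(
                "insert into user_action_es_sink " +
                "select 100123,5,11,1,'a','b','111','bbb'," +
                "cast(11111 as bigint),cast('2020-11-11' as date) from users limit 10")
              .getJobClient().get()
              .getJobExecutionResult(Thread.currentThread().getContextClassLoader())
              .get();
    }
}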
