
Apache Storm - Accessing database from SPOUT - connection pooling

I have a spout which, on each tick, goes to a Postgres database and reads one additional row. The spout code looks as follows:

import java.io.Serializable;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

class RawDataLevelSpout extends BaseRichSpout implements Serializable {

private int counter;
private SpoutOutputCollector collector;

@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("col1", "col2"));
}

@Override
public void open(Map map, TopologyContext context, SpoutOutputCollector spoutOutputCollector) {
    collector = spoutOutputCollector;
}

private Connection initializeDatabaseConnection() {

    try {
        Class.forName("org.postgresql.Driver");
        Connection connection = DriverManager.getConnection(
                DATABASE_URI, "root", "root");
        return connection;
    } catch (ClassNotFoundException e) {
        e.printStackTrace();
    } catch (SQLException e) {
        e.printStackTrace();
    }
    return null;
}

@Override
public void close() {

}

@Override
public void nextTuple() {
    List<String> values = new ArrayList<>();

    PreparedStatement statement = null;
    try {
        Connection connection = initializeDatabaseConnection();
        statement = connection.prepareStatement("SELECT * FROM table1 ORDER BY col1 LIMIT 1 OFFSET ?");
        statement.setInt(1, counter++);
        ResultSet resultSet = statement.executeQuery();
        resultSet.next();
        ResultSetMetaData resultSetMetaData = resultSet.getMetaData();
        int totalColumns = resultSetMetaData.getColumnCount();
        for (int i = 1; i <= totalColumns; i++) {
            String value = resultSet.getString(i);
            values.add(value);
        }


        connection.close();
    } catch (SQLException e) {
        e.printStackTrace();
    }
    collector.emit(new Values(values.stream().toArray(String[]::new)));
}

}

What is the standard way to approach connection pooling in spouts in Apache Storm? Furthermore, is it possible to somehow synchronize the counter variable across multiple running spout instances within the cluster topology?

Regarding connection pooling: you could pool connections via a static variable if you wanted to, but since you aren't guaranteed to have all spout instances running in the same JVM, I don't think there's much point.
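If you do want to pool connections within a single worker anyway, a minimal sketch might look like the following. This is an illustration only, not your code: it assumes HikariCP on the classpath (any JDBC pool would do), and reuses the `DATABASE_URI` and credentials from your snippet. Each worker JVM gets its own pool; spout instances on other workers cannot share it.

```java
import java.sql.Connection;
import java.sql.SQLException;

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

// Sketch: one HikariCP pool per worker JVM, shared by all spout/bolt
// instances running in that worker via a static field.
final class PooledConnections {

    private static HikariDataSource dataSource;

    // Call from open()/prepare(), and borrow a connection per query
    // instead of calling DriverManager.getConnection() on every tuple.
    static synchronized Connection get(String jdbcUrl) throws SQLException {
        if (dataSource == null) {
            HikariConfig config = new HikariConfig();
            config.setJdbcUrl(jdbcUrl);          // e.g. your DATABASE_URI
            config.setUsername("root");
            config.setPassword("root");
            config.setMaximumPoolSize(4);        // arbitrary example value
            dataSource = new HikariDataSource(config);
        }
        return dataSource.getConnection();       // close() returns it to the pool
    }

    private PooledConnections() {}
}
```

Note that `connection.close()` on a pooled connection returns it to the pool rather than tearing it down, so the shape of your `nextTuple()` would barely change.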

No, there is no way to synchronize the counter. The spout instances may be running on different JVMs, and you don't want them all blocking while the spouts agree on the counter value. I don't think your spout implementation makes sense, though. If you want to read just one row at a time, why not run a single spout instance instead of trying to synchronize multiple spouts?
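Running a single instance is just a parallelism hint of 1 when wiring the topology. A sketch (the component name here is a placeholder):

```java
import org.apache.storm.topology.TopologyBuilder;

public class SingleSpoutTopology {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        // Parallelism hint 1 -> exactly one RawDataLevelSpout executor,
        // so the counter lives in one place and needs no synchronization.
        builder.setSpout("raw-data-spout", new RawDataLevelSpout(), 1);
        // ... attach bolts and submit the topology as usual ...
    }
}
```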

You seem to be trying to use your relational database as a queue system, which is probably a bad fit. Consider e.g. Kafka instead. You should be able to use either one of https://www.confluent.io/product/connectors/ or http://debezium.io/ to stream data from your Postgres database into Kafka.
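With the rows flowing into Kafka, the hand-written database spout disappears entirely. A rough sketch using Storm's storm-kafka-client module (the broker address and topic name "postgres-rows" are assumptions for illustration):

```java
import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;
import org.apache.storm.topology.TopologyBuilder;

public class KafkaBackedTopology {
    public static void main(String[] args) {
        // Consume the topic your CDC tool (e.g. Debezium) writes to.
        KafkaSpoutConfig<String, String> spoutConfig =
                KafkaSpoutConfig.builder("localhost:9092", "postgres-rows")
                        .build();

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout<>(spoutConfig), 1);
        // ... attach bolts and submit the topology as usual ...
    }
}
```

This also sidesteps the counter problem: Kafka tracks consumer offsets for you, and the KafkaSpout handles acking and replay.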
