Real-time pipeline for processing data and inserting it into PostgreSQL

I have files coming in daily that I would like to process as they arrive and insert into existing SQL tables (PostgreSQL). What is the best way to build an automated pipeline for this?

I have already written the file-processing scripts in Python; they return the data in the format needed to append it to the SQL tables. What is the best way to make this pipeline real-time? That is, I want the pipeline to process each file automatically as it is sent to me and then add the data to the SQL table. At the moment I do this manually in batches, but I want to fully automate the process. The key missing step is having the script process each file automatically. I have read that Apache Kafka can help, but I'm still a novice here.
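For comparison, the missing "process each file automatically" step can be covered without Kafka by a small polling loop. The sketch below is an assumption-laden illustration, not a recommended production design: `process_file` stands in for your existing parsing script and `insert_rows` for whatever writes to PostgreSQL (both hypothetical names), and the `incoming`/`done` directory layout is invented for the example. Each file is moved to `done` only after its rows are inserted, so a crash does not silently skip a file.

```python
import time
from pathlib import Path
from typing import Callable, Iterable, List


def process_new_files(
    incoming: Path,
    done: Path,
    process_file: Callable[[Path], Iterable[tuple]],  # hypothetical: your parsing script
    insert_rows: Callable[[List[tuple]], None],       # hypothetical: writes rows to PostgreSQL
) -> List[str]:
    """Run one pass over `incoming`; return the names of the files handled."""
    done.mkdir(parents=True, exist_ok=True)
    handled = []
    for path in sorted(p for p in incoming.iterdir() if p.is_file()):
        rows = list(process_file(path))
        insert_rows(rows)
        # Move the file only after a successful insert so it is never processed twice.
        path.rename(done / path.name)
        handled.append(path.name)
    return handled


def watch(incoming: Path, done: Path, process_file, insert_rows, interval: float = 30.0) -> None:
    """Poll forever; intended to run under a process supervisor such as systemd."""
    while True:
        process_new_files(incoming, done, process_file, insert_rows)
        time.sleep(interval)
```

In practice `insert_rows` could wrap psycopg2's `cursor.executemany` with `%s` placeholders. One caveat of polling: a file that is still being written may be picked up early, so it helps if the sender uploads under a temporary name and renames it when complete.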

Any help is much appreciated!

Alternatively, you can use the JDBC Sink Connector. It delivers data from Kafka to PostgreSQL in real time. It can achieve idempotent writes by using upserts, and it also supports auto-creation of tables and limited auto-evolution of their schemas. Further reading: connector description; configuration options.
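The idempotency comes from upsert semantics: writing the same key twice overwrites the row instead of duplicating it. In the connector this is enabled with `insert.mode` set to `upsert` plus `pk.mode`/`pk.fields`, and on PostgreSQL it relies on `INSERT ... ON CONFLICT ... DO UPDATE`. The sketch below only illustrates the shape of such a statement; it is not the connector's actual code, `build_upsert` is a hypothetical helper, and the table and column names are assumed to be trusted identifiers (they are not quoted or escaped).

```python
from typing import Sequence


def build_upsert(table: str, columns: Sequence[str], key_columns: Sequence[str]) -> str:
    """Build a PostgreSQL upsert with psycopg2-style %s placeholders.

    Re-running the statement with the same key updates the existing row,
    which is what makes repeated deliveries of the same record harmless.
    """
    cols = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    keys = ", ".join(key_columns)
    # Non-key columns take the value from the row that failed to insert (EXCLUDED).
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in columns if c not in key_columns)
    sql = f"INSERT INTO {table} ({cols}) VALUES ({placeholders}) ON CONFLICT ({keys})"
    # If every column is part of the key there is nothing to update.
    return sql + (f" DO UPDATE SET {updates}" if updates else " DO NOTHING")
```

For example, `build_upsert("products", ["id", "name", "price"], ["id"])` yields a statement that updates `name` and `price` when a row with the same `id` already exists. `ON CONFLICT (id)` requires a unique index or primary key on `id`.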

In practice, the connector subscribes to a Kafka topic and writes the topic's messages into the database table as they stream in. It can also subscribe to multiple topics.
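Your Python scripts would then be the producers that put the processed rows onto that topic. The JDBC sink needs records that carry a schema. The configuration example uses Avro for this, which requires a Schema Registry. A commonly used alternative is Kafka Connect's `JsonConverter` with `value.converter.schemas.enable` set to `true`, where every message is a `{"schema": ..., "payload": ...}` envelope. The sketch below builds such an envelope for a flat dict. The Python-to-Connect type mapping is a deliberately small, assumed subset, and `to_connect_json` is a hypothetical helper.

```python
import json
from typing import Any, Dict

# Assumed minimal mapping from Python types to Kafka Connect schema types.
_CONNECT_TYPES = {bool: "boolean", int: "int64", float: "float64", str: "string"}


def to_connect_json(record: Dict[str, Any], name: str) -> bytes:
    """Wrap a flat record in the schema/payload envelope read by JsonConverter."""
    fields = []
    for field, value in record.items():
        # Exact type lookup, so bool is not mistaken for int.
        connect_type = _CONNECT_TYPES.get(type(value))
        if connect_type is None:
            raise TypeError(f"unsupported type for field {field!r}: {type(value).__name__}")
        fields.append({"field": field, "type": connect_type, "optional": False})
    envelope = {
        "schema": {"type": "struct", "name": name, "optional": False, "fields": fields},
        "payload": record,
    }
    return json.dumps(envelope).encode("utf-8")
```

With the `confluent_kafka` client, each row could then be sent with `producer.produce("ozon-products", value=to_connect_json(row, "ozon_products"))` followed by `producer.flush()`. The connector would then use `"value.converter": "org.apache.kafka.connect.json.JsonConverter"` instead of the Avro converter.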

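Minimal configuration example (the Avro converter additionally needs `value.converter.schema.registry.url` pointing at your Schema Registry):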

{
  "name": "name_connector",
  "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
  "value.converter": "io.confluent.connect.avro.AvroConverter",
  "topics": "ozon-products",
  "connection.url": "jdbc:postgresql://hostname:5432/db",
  "connection.user": "log",
  "connection.password": "pass",
  "dialect.name": "PostgreSqlDatabaseDialect",
  "insert.mode": "insert",
  "batch.size": "10",
  "table.types": "TABLE",
  "auto.create": "true"
}
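A flat configuration like this can be submitted to the Kafka Connect REST API with `PUT /connectors/<name>/config`, which creates the connector or updates it if it already exists. The sketch below uses only the standard library. The worker URL `http://localhost:8083` (Connect's default port) is an assumption, and `build_put_config_request`/`register` are hypothetical helper names. Building the request is separated from sending it so the request can be inspected first.

```python
import json
import urllib.request
from typing import Dict


def build_put_config_request(connect_url: str, name: str, config: Dict[str, str]) -> urllib.request.Request:
    """Prepare a PUT /connectors/<name>/config request (create or update)."""
    return urllib.request.Request(
        f"{connect_url.rstrip('/')}/connectors/{name}/config",
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def register(req: urllib.request.Request) -> str:
    """Send the request to the Connect worker and return its JSON response body."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8")
```

Usage would be something like `register(build_put_config_request("http://localhost:8083", "name_connector", config))`, where `config` is the dict shown above. Because `PUT` is idempotent, re-running a deployment script is safe, unlike `POST /connectors`, which fails if the connector already exists.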
