
Real-time pipeline for processing data and inserting into PostgreSQL

I have files coming in daily that I would like to process as they arrive and insert into existing SQL tables (using Postgres). What is the best way to create an automated pipeline?

I have already written the file-processing scripts in Python, which return the data in the format to be appended to the SQL tables. What is the best way to make this pipeline real-time? That is, have the pipeline automatically process each file as it is sent to me and then add the data to the SQL table. At the moment I am doing this manually in batches, but I want to fully automate the process. The key missing step is having the file automatically processed by the script. I have read that Apache Kafka can help, but I'm still a novice here.

Any help is much appreciated!

Alternatively, you can use the JDBC Sink Connector, which delivers data from Kafka to PostgreSQL in real time. It can achieve idempotent writes via upserts, and it also supports auto-creation of tables and limited auto-evolution. See the connector description and the config options.

The connector subscribes to a Kafka topic and streams each message into the target database table. It can also subscribe to multiple topics.
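What the connector does not cover is getting your processed rows into Kafka in the first place. Below is a minimal sketch of that producing side, assuming the confluent-kafka Python package; the broker and schema-registry addresses, the directory path, and the record schema are placeholders, and process_file() stands in for your existing script. The JDBC sink needs schema-carrying records (hence the Avro serializer, matching the Avro converter in the config further down):

import time
from pathlib import Path

from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import MessageField, SerializationContext

TOPIC = "ozon-products"            # the topic the sink connector subscribes to
INCOMING = Path("/data/incoming")  # where the daily files land (placeholder)

# Illustrative schema: one Avro record per table row.
SCHEMA = """
{"type": "record", "name": "Row",
 "fields": [{"name": "id", "type": "long"},
            {"name": "name", "type": "string"}]}
"""

def process_file(path):
    """Replace with your existing script: yield one dict per table row."""
    yield from []  # placeholder

registry = SchemaRegistryClient({"url": "http://localhost:8081"})
serialize = AvroSerializer(registry, SCHEMA)
producer = Producer({"bootstrap.servers": "localhost:9092"})

seen = set()
while True:  # simple polling loop; watchdog/inotify would also work
    for path in sorted(INCOMING.glob("*")):
        if path in seen:
            continue
        for row in process_file(path):
            producer.produce(
                TOPIC,
                value=serialize(row, SerializationContext(TOPIC, MessageField.VALUE)),
            )
        producer.flush()  # deliver the file's rows before marking it as seen
        seen.add(path)
    time.sleep(5)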

A minimal configuration example, written as a POST body for the Kafka Connect REST API (the connection credentials, host name, and schema-registry URL are placeholders):

{
  "name": "name_connector",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://localhost:8081",
    "topics": "ozon-products",
    "connection.url": "jdbc:postgresql://hostname:5432/db",
    "connection.user": "log",
    "connection.password": "pass",
    "dialect.name": "PostgreSqlDatabaseDialect",
    "insert.mode": "insert",
    "batch.size": "10",
    "table.types": "TABLE",
    "auto.create": "true"
  }
}
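This example uses plain inserts. For the idempotent upserts mentioned above, the connector's insert.mode, pk.mode, and pk.fields options control the conflict key; a sketch of the relevant fragment, assuming the Kafka record key carries an id field:

    "insert.mode": "upsert",
    "pk.mode": "record_key",
    "pk.fields": "id"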

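Once Kafka Connect is running, the connector is created by POSTing the JSON above to the Connect REST API. A minimal sketch, assuming the REST endpoint is on localhost:8083 and the config is saved as sink.json:

import json
import requests

# Load the connector definition shown above (name + config).
with open("sink.json") as f:
    payload = json.load(f)

# POST /connectors registers a new connector instance with Kafka Connect.
resp = requests.post("http://localhost:8083/connectors", json=payload)
resp.raise_for_status()
print(resp.json())  # Connect echoes back the created connector definition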