
How can we use streaming in Spark from multiple sources? E.g., first take data from HDFS and then consume a stream from Kafka

Original 2022-08-27 08:41:22 6 1 mysql/ apache-spark/ hadoop/ apache-kafka/ spark-streaming

The problem arises because I already have a system and want to add Spark Streaming on top of it. I have 50 million rows of transactional data in MySQL and want to run reports on that data, so I thought of dumping it into HDFS. New data also arrives in the database every day, and I am adding Kafka for the new data.

I want to know how I can combine data from multiple sources, run analytics in near real time (a 1-2 minute delay is OK), and save the results, because processing future data needs the previous results.

1 answer

Joins are possible in Spark SQL, but what happens when you need to update data in MySQL? Your HDFS copy then goes stale very quickly (within a few minutes, for sure). Tip: Spark can read from MySQL over JDBC, so it does not need HDFS exports.
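A minimal PySpark sketch of the JDBC tip, assuming MySQL Connector/J is on Spark's classpath. The host, database, table, column, and credential values are placeholders rather than details from the question, and the three helper functions are hypothetical names.

```python
def mysql_jdbc_options(host, database, table, user, password, port=3306):
    """Build options for Spark's built-in JDBC data source (all values are placeholders)."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.mysql.cj.jdbc.Driver",  # driver class in MySQL Connector/J 8.x
    }


def partitioned(options, column, lower, upper, num_partitions):
    """Add the options that split the read into parallel range queries on a numeric column."""
    return {
        **options,
        "partitionColumn": column,
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
    }


def read_mysql_table(spark, options):
    """Load the table as a DataFrame; `spark` is an existing SparkSession."""
    return spark.read.format("jdbc").options(**options).load()


opts = partitioned(
    mysql_jdbc_options("db.example.internal", "shop", "transactions", "report", "secret"),
    column="id", lower=1, upper=50_000_000, num_partitions=16,
)
# transactions = read_mysql_table(spark, opts)  # run where a SparkSession exists
```

The partitioning options matter for a table of this size: without them Spark reads the whole table through a single connection.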

Without knowing more about your systems, I'd say keep the MySQL database running, as something else is probably actively using it. Kafka is a continuous feed of data, but HDFS and MySQL are not. Combining remote batch lookups with streams will be slow (possibly more than a few minutes).
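For reference, the combination cautioned against here would be a stream-static join in Structured Streaming. The sketch below assumes JSON messages with hypothetical `customer_id` and `amount` fields, a placeholder broker address and topic, and the Spark Kafka connector package on the classpath. The static side may be read again for each micro-batch unless it is cached, which is one reason this pattern can be slow.

```python
def kafka_source_options(bootstrap_servers, topic):
    """Options for Spark's Kafka source (broker address and topic are placeholders)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",  # only consume records that arrive after startup
    }


def join_stream_with_static(spark, kafka_options, static_df, key="customer_id"):
    """Join a Kafka stream against a static DataFrame (e.g. one loaded from MySQL or HDFS)."""
    events = (
        spark.readStream.format("kafka").options(**kafka_options).load()
        # Kafka values arrive as bytes; the message schema here is an assumption
        .selectExpr(
            "from_json(CAST(value AS STRING), 'customer_id INT, amount DOUBLE') AS tx"
        )
        .select("tx.*")
    )
    return events.join(static_df, on=key, how="inner")


kafka_opts = kafka_source_options("broker.example.internal:9092", "transactions")
# joined = join_stream_with_static(spark, kafka_opts, static_df)
# joined.writeStream.trigger(processingTime="1 minute").format("console").start()
```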

However, if you use Debezium to get data from MySQL into Kafka, you have the data centralized in one location and can then ingest from Kafka into an indexable store such as Druid, Apache Pinot, or ClickHouse, or maybe use ksqlDB.
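As a rough illustration of the Debezium step, the sketch below builds a registration payload for a MySQL source connector. The property names follow Debezium 2.x and differ in older releases (for example, `topic.prefix` replaced `database.server.name`). Hostnames, credentials, table names, and the server ID are placeholders, and the function name is hypothetical.

```python
import json


def debezium_mysql_connector(name, host, user, password, tables, kafka_servers):
    """Build a Kafka Connect payload for a Debezium MySQL source connector."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.mysql.MySqlConnector",
            "database.hostname": host,
            "database.port": "3306",
            "database.user": user,
            "database.password": password,
            # placeholder; must be unique among the MySQL server's replication clients
            "database.server.id": "184054",
            "topic.prefix": name,  # change topics are named <prefix>.<database>.<table>
            "table.include.list": ",".join(tables),
            "schema.history.internal.kafka.bootstrap.servers": kafka_servers,
            "schema.history.internal.kafka.topic": f"schema-history.{name}",
        },
    }


payload = debezium_mysql_connector(
    "shop", "db.example.internal", "debezium", "secret",
    ["shop.transactions"], "broker.example.internal:9092",
)
body = json.dumps(payload, indent=2)  # JSON body for the registration request
```

The resulting `body` would be sent as a POST request to the `/connectors` endpoint of the Kafka Connect REST API.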

Query from those, as they are purpose-built for that use case, and you don't need Spark. Pick one or more, as each supports different use cases and query patterns.

Spark structured streaming from kafka spark 2.4.5 not saving into mysql db

Stream reading from database using spark streaming

How can I consume message from kafka in order?

Reading from HDFS into Spark

Laravel make:auth — How do I send 2 queries from 1 form? e.g: email to one table & username to another table

we need to retrieve all the rows from mysql. And table has only a single column e.g name.Which approach is better?

How to INSERT Streaming JSON data from Console to a MySQL database

postgresql, convert unsigned number (from mysql) to signed (e.g 65535 to -32768 for smallint)

MySQL query to show a persons last 8 sporting results (e.g W, L, or D) from a table

Consume a big data by Kafka and Spark


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.


solution1 0 2022-08-27 11:51:17
