
Spark Structured Streaming performance for Scala vs Python

Hi, I'm going to develop a micro-batch program with Kafka + Spark Structured Streaming, but I'm not sure whether to use Python or Scala, and which of the two is faster. It would help if there were any benchmark results comparing Spark Structured Streaming performance between Scala and Python.

Not really an issue.

The only things to note are that 1) Scala is faster, but with the amount of data in each micro-batch the difference may have less of an impact, and 2) Scala has typed Dataset support, PySpark does not.

Most people use Scala; PySpark is used more for data science.

That said, real-time machine learning may well be better with PySpark. See for example: https://towardsdatascience.com/building-a-real-time-prediction-pipeline-using-spark-structured-streaming-and-microservices-626dc20899eb
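For what it's worth, here is a minimal PySpark sketch of the kind of Kafka micro-batch job being asked about. It is only an illustration, not a benchmark: the broker address `localhost:9092`, the topic name `events`, the window sizes and the 10-second trigger are all assumptions made up for the example. It sticks to built-in DataFrame functions, which Spark plans and executes on the JVM whichever language builds the query, so this style of job is where the Scala/Python gap tends to matter least; Python UDFs are where the extra cost usually shows up.

```python
# Hypothetical broker address and topic name; replace with your own.
KAFKA_OPTIONS = {
    "kafka.bootstrap.servers": "localhost:9092",
    "subscribe": "events",
    "startingOffsets": "latest",
}


def build_counts(spark, options=KAFKA_OPTIONS):
    """Return a streaming DataFrame with per-minute record counts per Kafka key."""
    # Imported here so this file can be loaded without a Spark installation.
    from pyspark.sql import functions as F

    # The Kafka source exposes key, value, topic, partition, offset, timestamp.
    raw = spark.readStream.format("kafka").options(**options).load()

    events = raw.select(
        F.col("key").cast("string").alias("key"),
        F.col("timestamp"),
    )

    # Built-in functions only: no Python UDFs, so rows stay inside the JVM.
    return (
        events.withWatermark("timestamp", "2 minutes")
        .groupBy(F.window("timestamp", "1 minute"), "key")
        .count()
    )


def main():
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-microbatch-sketch").getOrCreate()

    query = (
        build_counts(spark)
        .writeStream.outputMode("update")
        .format("console")
        .trigger(processingTime="10 seconds")  # one micro-batch every 10 s (assumed)
        .start()
    )
    query.awaitTermination()
```

To try it, call `main()` from a script submitted with `spark-submit`, with the `spark-sql-kafka-0-10` package matching your Spark version on the classpath. The Scala version of the same query would look almost identical, which is part of why the choice is mostly about team preference and whether you need typed Datasets.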
