
Is MySQL more efficient than Apache Spark in query optimization and overall efficiency?

I find that Apache Spark is much slower than a MySQL server for the same query on the same table, with the query run against a Spark DataFrame.

So where would Spark be more efficient than MySQL?

Note: tried on a table with 1 million rows and 10 columns, all of type text.

The size of the table in JSON is about 10 GB.

Using a standalone PySpark notebook on a 16-core Xeon with 64 GB RAM, with MySQL running on the same server.

In general, I would like guidelines on when to use Spark vs. a SQL server, in terms of the size of the target data, to get really snappy results from analytic queries.
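For reference, here is a minimal sketch of the kind of side-by-side comparison described above. The connection details, file path, table name, and query are all hypothetical stand-ins, not taken from the original post:

```python
# Hypothetical benchmark: the same aggregate query run against MySQL
# directly and against a Spark DataFrame loaded from a 10 GB JSON dump.
import time
import pymysql
from pyspark.sql import SparkSession

# MySQL side: query the table on the server (credentials are placeholders).
conn = pymysql.connect(host="localhost", user="root",
                       password="secret", database="mydb")
start = time.time()
with conn.cursor() as cur:
    cur.execute("SELECT col1, COUNT(*) FROM mytable GROUP BY col1")
    cur.fetchall()
print("MySQL:", time.time() - start, "s")

# Spark side: load the JSON into a DataFrame and run the equivalent
# query through Spark SQL on all 16 local cores.
spark = SparkSession.builder.master("local[16]").getOrCreate()
df = spark.read.json("/data/mytable.json")   # hypothetical path
df.createOrReplaceTempView("mytable")
start = time.time()
spark.sql("SELECT col1, COUNT(*) FROM mytable GROUP BY col1").collect()
print("Spark:", time.time() - start, "s")
```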

OK, so I'm going to try to help here even though it's still very difficult to answer this without knowing more. Assuming there is no contention for resources, there are a number of things going on here. If you're running on YARN and your JSON is stored in HDFS, it is likely split into many blocks, and those blocks are then processed in different partitions. Since JSON doesn't split very well, you'd lose a lot of the parallel capability. Also, Spark isn't really meant for super-low-latency queries the way a tuned RDBMS is. Where you benefit from Spark is heavy data processing on large amounts of data (TB or PB). If you are looking for low-latency queries, you should use Impala or Hive with Tez. You should also consider changing your file format to Avro, Parquet, or ORC.
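A minimal PySpark sketch of that last suggestion: convert the JSON table to Parquet once, then run all subsequent queries against the Parquet copy. The file paths, view name, and sample query are assumptions for illustration:

```python
# One-time conversion from JSON to Parquet. JSON must be re-parsed row
# by row on every read, while Parquet is columnar, compressed, and
# splittable, so repeated analytic queries get much cheaper.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("json-to-parquet").getOrCreate()

df = spark.read.json("/data/mytable.json")                 # hypothetical path
df.write.mode("overwrite").parquet("/data/mytable.parquet")

# Subsequent queries read only the columns they actually touch.
parquet_df = spark.read.parquet("/data/mytable.parquet")
parquet_df.createOrReplaceTempView("mytable")
spark.sql("SELECT col1, COUNT(*) FROM mytable GROUP BY col1").show()
```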
