
How to check the performance of an Apache Spark job

I have installed Apache Spark 2.3.1 and need to determine which of the following scripts is more efficient.

Questions:

1. How do I monitor the execution of an Apache Spark script?

2. Which of these two scripts is more efficient?

rdd = sc.textFile("Readme.txt")

1:

rdd.flatMap(lambda x: x.split(" ")).countByValue()

2:

words = rdd.flatMap(lambda x: x.split(" "))
result = words.map(lambda x: (x, 1)).reduceByKey(lambda x, y: x + y)

Use the Spark web UI. It contains the information you need to monitor performance, including timing, executor statistics, stage statistics, task statistics, resource statistics, and more.
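While an application is running, the web UI is normally served at http://localhost:4040, and PySpark exposes the address via sc.uiWebUrl. To compare the two variants directly, you can also time each action from the driver and then inspect the corresponding jobs and stages in the UI. A minimal sketch, assuming a local SparkContext and that Readme.txt exists in the working directory:

import time
from pyspark import SparkContext

sc = SparkContext("local[*]", "word-count-comparison")
print("Spark web UI:", sc.uiWebUrl)  # open this URL to see jobs, stages and tasks

rdd = sc.textFile("Readme.txt")

# Variant 1: countByValue() runs the job and returns all counts to the driver
t0 = time.perf_counter()
counts1 = rdd.flatMap(lambda x: x.split(" ")).countByValue()
print("countByValue: %.3f s" % (time.perf_counter() - t0))

# Variant 2: reduceByKey() aggregates on the executors; transformations are
# lazy, so count() is called here to force execution for a fair comparison
t0 = time.perf_counter()
words = rdd.flatMap(lambda x: x.split(" "))
result = words.map(lambda x: (x, 1)).reduceByKey(lambda x, y: x + y)
result.count()
print("reduceByKey: %.3f s" % (time.perf_counter() - t0))

For large inputs, the reduceByKey variant generally scales better, because the aggregation happens on the executors and the result stays distributed, whereas countByValue collects the full map of counts into driver memory.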
