
Spark Performance Monitoring

I need to show management/the client that the executor memory, number of cores, default parallelism, number of shuffle partitions, and other configuration properties used to run our Spark jobs are not excessive or higher than required. I'm looking for a monitoring tool (with visualization) that lets me justify the memory usage of a Spark job. It should also surface information such as memory not being used effectively, or a particular job needing more memory.

Please suggest a suitable application or tool.
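For reference, these are the kinds of settings I mean. Here is a minimal sketch of how they are typically set when building a Spark session, e.g. in spark-shell (the values are hypothetical placeholders, not recommendations):

```scala
import org.apache.spark.sql.SparkSession

// The configuration knobs in question; values below are placeholders.
val spark = SparkSession.builder()
  .appName("tuning-example")
  .config("spark.executor.memory", "4g")          // per-executor heap size
  .config("spark.executor.cores", "2")            // cores per executor
  .config("spark.default.parallelism", "200")     // parallelism for RDD operations
  .config("spark.sql.shuffle.partitions", "200")  // partitions for SQL/DataFrame shuffles
  .getOrCreate()
```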

LinkedIn has created a tool that sounds very similar to what you're looking for.

For an overview of the product, see this presentation: https://youtu.be/7KjnjwgZN7A?t=480

The LinkedIn team has open-sourced Dr. Elephant here: https://github.com/linkedin/dr-elephant

Give it a try. Note that the initial integration may require some manual tweaking of the Spark History Server so that Dr. Elephant can collect the information it requires.
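As background for that tweaking: Dr. Elephant pulls its Spark metrics via the Spark History Server, which in turn only has data if event logging is enabled on your jobs (spark.eventLog.enabled=true, with spark.eventLog.dir pointing at a shared location). Once that is in place, you can sanity-check the same raw data yourself through the History Server's REST API. A minimal sketch follows; the host, port, and application ID are hypothetical placeholders, and JSON parsing is left out:

```scala
import scala.io.Source

// Fetch per-executor metrics from the Spark History Server REST API.
// Each entry in the returned JSON includes fields such as memoryUsed,
// maxMemory, totalCores, and totalGCTime: the raw numbers behind any
// "is this job over-provisioned?" argument.
object ExecutorMemoryCheck extends App {
  val historyServer = "http://history-server-host:18080" // hypothetical host; 18080 is the default port
  val appId = "application_1234_0001"                    // hypothetical application id
  val url = s"$historyServer/api/v1/applications/$appId/executors"
  println(Source.fromURL(url).mkString)
}
```

Comparing memoryUsed against maxMemory per executor over several runs is essentially the evidence you would present to show the current allocation is (or is not) justified.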
