Throughput falls from 4k to 9 messages per second with tuned-adm changes
I have a network client and server application. The dataflow is such that the client sends a message to the server and the server responds with an acknowledgment. Only on receipt of the acknowledgment does the client send the next message.
The client application, written in C++, has 3 threads: a network thread (responsible for sending messages via socket), a main thread (responsible for building request messages), and a timer thread (fires every second). The server application has 2 threads: a main thread and a network thread.
I run RHEL 6.3, 2.6.32-279 kernel.
Configuration 1
Throughput: 4500 messages per second
Configuration 2
Throughput: 9-15 messages per second
Configuration 3
Throughput: 1100 messages per second
The machine has negligible load. Can someone explain the drop from 4k to 9 messages per second when the profile was switched from latency-performance to throughput-performance?
Here's the basic rundown of differences between the RHEL tuned-adm profiles:
latency-performance shifts the I/O elevator to deadline and changes the CPU governor to the "performance" setting.
throughput-performance is optimized for network and disk performance. See the specifics below...
Your workload appears to be latency-sensitive.
Here's the setup for throughput-performance with comments. latency-performance does not modify any of these.
# ktune sysctl settings for rhel6 servers, maximizing i/o throughput
#
# Minimal preemption granularity for CPU-bound tasks:
# (default: 1 msec * (1 + ilog(ncpus)), units: nanoseconds)
kernel.sched_min_granularity_ns = 10000000
# SCHED_OTHER wake-up granularity.
# (default: 1 msec * (1 + ilog(ncpus)), units: nanoseconds)
#
# This option delays the preemption effects of decoupled workloads
# and reduces their over-scheduling. Synchronous workloads will still
# have immediate wakeup/sleep latencies.
kernel.sched_wakeup_granularity_ns = 15000000
# If a workload mostly uses anonymous memory and it hits this limit, the entire
# working set is buffered for I/O, and any more write buffering would require
# swapping, so it's time to throttle writes until I/O can catch up. Workloads
# that mostly use file mappings may be able to use even higher values.
#
vm.dirty_ratio = 40