
Why can't I connect more than 8000 clients to MQTT brokers via HAProxy?

I am trying to establish 10k client connections (potentially 100k) to my 2 MQTT brokers, using HAProxy as a load balancer.

I have a working simulator (using the Java Paho library) that can simulate 10k clients. On the same machine I run 2 MQTT brokers in Docker. For the load balancer (LB) I'm using another machine running an Ubuntu 16.04 virtual image.

When I connect directly to an MQTT broker, the connections are established without a problem. When I go through HAProxy, however, I only get around 8.8k connections, while the rest throw: Error at client{insert number here}: Connection lost (32109) - java.net.SocketException: Connection reset. When I connect the simulator directly to a broker (same machine), about 20k TCP connections open, but when I use the load balancer only 17k do. This leaves me thinking that the LB is causing the problem.

It is important to add that whenever I run the simulator I'm unable to use the browser (it cannot connect to the internet). I haven't tested whether this affects only the browser, but could it mean that I am actually running out of ports or something similar, and that the real issue is not in the LB?
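
One way to sanity-check the port-exhaustion suspicion is to count the ephemeral ports the client machine can use. The sketch below is only an illustration: it assumes a Linux host, where the range is exposed in /proc/sys/net/ipv4/ip_local_port_range, and the helper name ephemeral_port_count is made up for this example. Each outgoing TCP connection to the same destination address and port needs its own local port, so the count is a rough upper bound on simultaneous connections from one source IP to one listener.

```python
from pathlib import Path

# Linux exposes the ephemeral (local) port range here; other OSes differ.
PORT_RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")


def ephemeral_port_count(range_text: str) -> int:
    """Return how many ports are in a 'low high' range string (inclusive)."""
    low, high = (int(part) for part in range_text.split())
    if low > high:
        raise ValueError(f"invalid port range: {range_text!r}")
    return high - low + 1


if __name__ == "__main__":
    if PORT_RANGE_FILE.exists():
        text = PORT_RANGE_FILE.read_text()
        print("ephemeral ports available:", ephemeral_port_count(text))
    else:
        print("no ip_local_port_range file; not a Linux host?")
```

With the common Linux default range of 32768 to 60999 this gives 28232 ports. If the simulator machine runs a different OS, the range is configured elsewhere and this file will not exist.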

Here is my HAProxy configuration:

global
    log /dev/log local0
    log /dev/log local1 notice
    maxconn 500000
    ulimit-n 500000 
    maxpipes 500000

defaults
    log global
    mode http
    timeout connect 3h  
    timeout client 3h
    timeout server 3h
    
listen mqtt
    bind *:8080
    mode tcp
    option tcplog
    option clitcpka
    balance leastconn
    server broker_1 address:1883 check
    server broker_2 address:1884 check

listen stats 
    bind 0.0.0.0:1936
    mode http
    stats enable
    stats hide-version
    stats realm Haproxy\ Statistics
    stats uri /
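
As a rough check on the limits in this configuration: a TCP proxy holds two sockets per proxied client, one facing the client and one facing the broker. The sketch below estimates the file descriptors that implies. The function name and the overhead allowance for listeners, logging, and health checks are assumptions for illustration, not HAProxy's exact internal formula.

```python
def proxy_fd_estimate(clients: int, overhead: int = 100) -> int:
    """Estimate file descriptors a TCP proxy needs for `clients` connections.

    Each proxied connection uses one client-side and one server-side socket.
    `overhead` is an assumed allowance for listeners, logs and health checks.
    """
    if clients < 0 or overhead < 0:
        raise ValueError("clients and overhead must be non-negative")
    return 2 * clients + overhead


if __name__ == "__main__":
    for clients in (10_000, 100_000):
        print(clients, "clients ->", proxy_fd_estimate(clients), "fds")
```

Under these assumptions even 100k clients need roughly 200k descriptors, which is below the ulimit-n 500000 set in the global section, so the descriptor limit on the LB is probably not the first thing to run out.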

This is what the MQTT broker logs for a successful and an unsuccessful connection:

...
//Successful connection
1613382861: New connection from xxx:32850 on port 1883.
1613382861: New client connected from xxx:60974 as 356 (p2, c1, k1200, u'admin').
...
//Unsuccessful connection
1613382699: New connection from xxx:42861 on port 1883.
1613382699: Client <unknown> closed its connection.
...

And this is what ulimit -a shows on the LB machine:

core file size (blocks)         (-c) 0
data seg size (kb)              (-d) unlimited
scheduling priority             (-e) 0
file size (blocks)              (-f) unlimited
pending signals                 (-i) 102355
max locked memory (kb)          (-l) 82000
max memory size (kb)            (-m) unlimited
open files                      (-n) 500000
POSIX message queues (bytes)    (-q) 819200
real-time priority              (-r) 0
stack size (kb)                 (-s) 8192
cpu time (seconds)              (-t) unlimited
max user processes              (-u) 500000
virtual memory (kb)             (-v) unlimited
file locks                      (-x) unlimited

Note: The LB process has the same limits.

I followed various tutorials and increased the open file limit as well as the port limit, TCP header size, etc. The number of connected users increased from 2.8k to about 8.5-9k (which is still far lower than the 300k the author of the tutorial reached). The ss -s command shows about 17,000 TCP and inet connections.

Any pointers would greatly help! Thanks!

You can't do normal load balancing of MQTT traffic, because you can't "pin" a connection based on the MQTT topic. If one client sends a SUBSCRIBE to Broker1 for the topic "test/blatt/#", but the next client PUBLISHes "test/blatt/foo" to Broker2, then unless the two brokers are bridged, the first subscriber will never get that message.
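
The subscribe/publish scenario can be illustrated with a toy in-memory model. This is a sketch, not a real MQTT client or broker: ToyBroker, its bridges list, and topic_matches are hypothetical names invented for this example, and the wildcard matching is simplified (it ignores special cases such as topics starting with "$").

```python
from collections import defaultdict


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Match a topic against a filter with '+' and '#' wildcards (simplified)."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True  # multi-level wildcard matches everything below
        if i >= len(topic_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


class ToyBroker:
    """In-memory stand-in for a broker; not a real MQTT implementation."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.subscriptions = defaultdict(list)  # filter -> subscriber inboxes
        self.bridges = []  # peer brokers that receive forwarded messages

    def subscribe(self, topic_filter: str) -> list:
        """Register a subscriber and return the list that collects its messages."""
        inbox = []
        self.subscriptions[topic_filter].append(inbox)
        return inbox

    def publish(self, topic: str, payload: str, forwarded: bool = False) -> None:
        """Deliver to local subscribers, then forward once to bridged peers."""
        for topic_filter, inboxes in self.subscriptions.items():
            if topic_matches(topic_filter, topic):
                for inbox in inboxes:
                    inbox.append((topic, payload))
        if not forwarded:  # do not re-forward, to avoid bridge loops
            for peer in self.bridges:
                peer.publish(topic, payload, forwarded=True)


if __name__ == "__main__":
    broker1, broker2 = ToyBroker("Broker1"), ToyBroker("Broker2")
    inbox = broker1.subscribe("test/blatt/#")

    broker2.publish("test/blatt/foo", "hello")
    print("unbridged:", inbox)

    broker2.bridges.append(broker1)
    broker2.publish("test/blatt/foo", "hello again")
    print("bridged:", inbox)
```

In this model the subscriber on Broker1 sees nothing from the first publish and receives only the message sent after Broker2 is bridged to Broker1.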

If your clients are terminating the TCP connection sometime after the CONNECT, or HAProxy is round-robining the packets between the two brokers, you will get errors like this. You need to persist the connections somehow, and I don't know how you do that with HAProxy. Non-free LBs like A10 Thunder or F5 LTM can persist TCP connections, but you still need the MQTT brokers bridged for it all to work.

Turns out I was running out of resources on my computer.

I moved the simulator to another machine and managed to get 15k connections running. Due to resource limits I can't get more than that. The computer running the server side uses 20 of its 32 GB of RAM, and the computer running the simulator used all 32 of its 32 GB for approximately 15k devices. Now I see why running both on the same computer is not an option.
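
For a sense of scale, the memory figures above can be turned into a per-client estimate. This is back-of-the-envelope arithmetic, not a measurement: it assumes the whole 32 GB is attributable to the simulator (so it overestimates), treats GB as GiB, and assumes memory grows linearly with client count. The function names are made up for this example.

```python
def memory_per_client_mib(total_gib: float, clients: int) -> float:
    """Average memory per simulated client in MiB."""
    if clients <= 0:
        raise ValueError("clients must be positive")
    return total_gib * 1024 / clients


def projected_memory_gib(per_client_mib: float, clients: int) -> float:
    """Memory needed for `clients` at a fixed per-client cost, in GiB."""
    return per_client_mib * clients / 1024


if __name__ == "__main__":
    per_client = memory_per_client_mib(32, 15_000)  # figures from the post
    print(f"about {per_client:.2f} MiB per simulated client")
    print(f"about {projected_memory_gib(per_client, 100_000):.0f} GiB for 100k clients")
```

That works out to roughly 2.2 MiB per simulated client, or on the order of 213 GiB for 100k clients on the simulator side, so reaching 100k would likely mean spreading the simulator across several machines or reducing its per-client footprint.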
