
Kafka synchronization: "java.io.IOException: Too many open files"

We are having a problem with Kafka. Sometimes, suddenly and without warning, we go out of synchronization and start to get exceptions when emitting events.

The exception we are getting is:

java.io.IOException: Too many open files

It seems this is a generic exception thrown by Kafka in many cases. We investigated a little, and we think the root cause is that emitting events to some topic fails because Kafka doesn't have a leader partition for that topic.

Can someone help?

I assume that you are on Linux. If that is the case, then what's happening is that you are running out of open file descriptors. The real question is why this is happening.

Linux by default generally keeps this number fairly low. You can check the actual value via ulimit:

ulimit -a | grep "open files"

You can then raise that value, again via ulimit. Note that ulimit is a shell builtin, so it affects the current shell (and processes started from it) rather than being something you run under sudo; raising the limit above the hard limit requires root or a change in /etc/security/limits.conf:

ulimit -n 4096

That said, unless the Kafka host in question has lots of topics / partitions it is unusual to hit that limit. What's probably happening is that some other process is keeping files or connections open. In order to figure out which process you're going to have to do some detective work with lsof.
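As a starting point for that detective work, one quick way on Linux is to count the entries under each process's /proc/&lt;pid&gt;/fd directory (a sketch reading /proc directly; once you have a suspect PID, `lsof -p <pid>` shows exactly which files and sockets it holds):

```shell
# List the 10 processes with the most open file descriptors.
# Run as root to see file descriptors of processes owned by other users.
for pid in /proc/[0-9]*; do
  n=$(ls "$pid/fd" 2>/dev/null | wc -l)
  name=$(cat "$pid/comm" 2>/dev/null)
  printf '%6d %s %s\n' "$n" "${pid#/proc/}" "$name"
done | sort -rn | head -10
```

If the top entry is the Kafka broker itself, the partition count explanation below likely applies; if it is some other process, that process is the one leaking descriptors or connections.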

One case where this happens is when you have a large number of partitions, because each partition maps to a directory in the broker's file system containing two files: one for the index and one for the data. The broker opens both files, so the more partitions there are, the more open files. As Doomy said, you can increase the open-file limit on Linux, but that setting is not permanent; it disappears when you close the session. On the next login, if you check with this command

ulimit -a | grep "open files"

you will see the old number again. But you can make the change permanent as follows.

Open this file:

sudo nano /etc/pam.d/common-session

and add this line:

session required pam_limits.so

After that, you can set the limit in /etc/security/limits.conf:

sudo nano /etc/security/limits.conf

and then set the limit in this file, for example:

* soft nofile 80000

or a corresponding hard limit. After that, close your session, log in again, and re-check the open-file limit.
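To confirm the new limit actually applies to the broker process (and not just to your login shell), you can read the process's effective limits from /proc. A minimal sketch; `$$` (the current shell) is used for illustration, and the pgrep pattern in the comment is an assumption to adapt to your broker:

```shell
# Show the effective open-file limit of a running process.
# For the Kafka broker, use something like: pid=$(pgrep -f kafka.Kafka)
pid=$$
grep "Max open files" "/proc/$pid/limits"
```

If the "Soft Limit" column still shows the old value after re-login, the pam_limits step above was not picked up for the session type that starts the broker (e.g. a systemd service uses its own LimitNOFILE setting instead).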

I had a similar "java.io.IOException: Too many open files" issue on Linux/CentOS. In my case, after checking the open fds with lsof, it turned out that kafka-web-console was opening too many connections. Stopping it solved my problem.

In our case, our Kafka topics were accidentally configured with "segment.ms" = 20000 and were generating a new log segment every 20 seconds, while the default is 604800000 (1 week).

We are using Amazon's MSK, so we didn't have the ability to run the commands ourselves, but Amazon support was able to monitor it for us. That caused this issue, and then some of the nodes were not recovering.

We took two steps:

1) Force Compaction

We set the retention and dirty ratio low to force cleanup:

"delete.retention.ms" = 100
"min.cleanable.dirty.ratio" = "0.01"
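On a cluster where you can run the Kafka CLI tools yourself, the same override could be applied with kafka-configs. A sketch only: the bootstrap server and topic name are placeholders, and on MSK (as above) these values had to be changed through AWS instead:

```shell
# Lower delete.retention.ms and min.cleanable.dirty.ratio on one topic
# so the log cleaner considers it dirty and compacts it aggressively.
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-topic \
  --alter --add-config delete.retention.ms=100,min.cleanable.dirty.ratio=0.01
```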

One of the nodes was able to recover, but another never reached the point where Kafka would actually run compaction; it appeared to be the leader for one of the largest topics.

2) Free up space

We decided to destroy the large topic in the hope it would unblock the node. Eventually the compaction seemed to run on all the nodes.

Later we recreated the topic we had destroyed with the new segment settings, and it has been running fine since.
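A hypothetical sketch of the recreation step with segment.ms back at its default; the server address, topic name, partition count, and replication factor are all placeholders:

```shell
# Recreate the topic with segment.ms at the one-week default (604800000 ms),
# so new log segments roll weekly instead of every 20 seconds.
kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic my-topic \
  --partitions 12 --replication-factor 3 \
  --config segment.ms=604800000
```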
