PHP session lock issue with MemcacheD

Nginx returns 502 Bad Gateway when session_start() is called from within an included PHP script.

PHP session storage is handled by memcached.

# nginx -v
nginx version: nginx/1.4.6 (Ubuntu)

# php5-fpm -v
PHP 5.5.9-1ubuntu4.14 (fpm-fcgi) (built: Oct 28 2015 01:38:24)

# memcached -h
memcached 1.4.14

# pecl list
Installed packages, channel pecl.php.net:
Package   Version State
memcached 2.1.0   stable

# php -c /etc/php5/fpm/php.ini -i | grep session
session.auto_start => Off => Off
session.cache_expire => 180 => 180
session.cache_limiter => nocache => nocache
session.cookie_domain => no value => no value
session.cookie_httponly => Off => Off
session.cookie_lifetime => 0 => 0
session.cookie_path => / => /
session.cookie_secure => Off => Off
session.entropy_file => /dev/urandom => /dev/urandom
session.entropy_length => 32 => 32
session.gc_divisor => 1000 => 1000
session.gc_maxlifetime => 1440 => 1440
session.gc_probability => 0 => 0
session.hash_bits_per_character => 5 => 5
session.hash_function => 0 => 0
session.name => PHPSESSID => PHPSESSID
session.referer_check => no value => no value
session.save_handler => memcached => memcached
session.save_path => =>
session.serialize_handler => php => php
session.upload_progress.cleanup => On => On
session.upload_progress.enabled => On => On
session.upload_progress.freq => 1% => 1%
session.upload_progress.min_freq => 1 => 1
session.upload_progress.prefix => upload_progress_ => upload_progress_
session.use_cookies => On => On
session.use_only_cookies => On => On
session.use_strict_mode => Off => Off
session.use_trans_sid => 0 => 0

# php -c /etc/php5/fpm/php.ini -i | grep memcached
memcached support => enabled
libmemcached version => 1.0.8
memcached.compression_factor => 1.3 => 1.3
memcached.compression_threshold => 2000 => 2000
memcached.compression_type => fastlz => fastlz
memcached.serializer => php => php
memcached.sess_binary => no value => no value
memcached.sess_lock_wait => 150000 => 150000
memcached.sess_locking => 1 => 1
memcached.sess_prefix => memc.sess.key. => memc.sess.key.
Registered save handlers => files user memcached
session.save_handler => memcached => memcached

While digging through the system calls I found a probable cause for the Bad Gateway.

When I strace the php5-fpm process, I get tons of these:

# strace -p 12927 -ff -tt
Process 12927 attached
11:13:01.205991 restart_syscall(<... resuming interrupted call ...>) = 0
11:13:01.309243 sendto(6, "add memc.sess.key.lock.oqr9vso3a"..., 69, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 69
11:13:01.309411 recvfrom(6, "NOT_STORED\r\n", 8196, MSG_DONTWAIT, NULL, NULL) = 12
11:13:01.309535 nanosleep({0, 150000000}, NULL) = 0
11:13:01.459913 sendto(6, "add memc.sess.key.lock.oqr9vso3a"..., 69, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 69
11:13:01.460049 recvfrom(6, "NOT_STORED\r\n", 8196, MSG_DONTWAIT, NULL, NULL) = 12
11:13:01.460118 nanosleep({0, 150000000}, NULL) = 0
11:13:01.610353 sendto(6, "add memc.sess.key.lock.oqr9vso3a"..., 69, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 69
11:13:01.610480 recvfrom(6, "NOT_STORED\r\n", 8196, MSG_DONTWAIT, NULL, NULL) = 12
11:13:01.610521 nanosleep({0, 150000000}, NULL) = 0
11:13:01.760785 sendto(6, "add memc.sess.key.lock.oqr9vso3a"..., 69, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 69
11:13:01.760944 recvfrom(6, "NOT_STORED\r\n", 8196, MSG_DONTWAIT, NULL, NULL) = 12
11:13:01.761064 nanosleep({0, 150000000}, NULL) = 0
11:13:01.911438 sendto(6, "add memc.sess.key.lock.oqr9vso3a"..., 69, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 69
11:13:01.911575 recvfrom(6, "NOT_STORED\r\n", 8196, MSG_DONTWAIT, NULL, NULL) = 12
11:13:01.911643 nanosleep({0, 150000000}, NULL) = 0
11:13:02.061920 sendto(6, "add memc.sess.key.lock.oqr9vso3a"..., 69, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 69
11:13:02.062088 recvfrom(6, "NOT_STORED\r\n", 8196, MSG_DONTWAIT, NULL, NULL) = 12
11:13:02.062211 nanosleep({0, 150000000}, NULL) = 0
11:13:02.212470 sendto(6, "add memc.sess.key.lock.oqr9vso3a"..., 69, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 69
11:13:02.212611 recvfrom(6, "NOT_STORED\r\n", 8196, MSG_DONTWAIT, NULL, NULL) = 12
11:13:02.212693 nanosleep({0, 150000000}, NULL) = 0
11:13:02.362917 sendto(6, "add memc.sess.key.lock.oqr9vso3a"..., 69, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 69
11:13:02.362999 recvfrom(6, 0x2967068, 8196, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
11:13:02.363065 poll([{fd=6, events=POLLIN}], 1, 5000) = 1 ([{fd=6, revents=POLLIN}])
11:13:02.363196 recvfrom(6, "NOT_STORED\r\n", 8196, MSG_DONTWAIT, NULL, NULL) = 12
11:13:02.363241 nanosleep({0, 150000000}, NULL) = 0
11:13:02.513457 sendto(6, "add memc.sess.key.lock.oqr9vso3a"..., 69, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 69
11:13:02.513531 recvfrom(6, 0x2967068, 8196, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
11:13:02.513581 poll([{fd=6, events=POLLIN}], 1, 5000) = 1 ([{fd=6, revents=POLLIN}])
11:13:02.513619 recvfrom(6, "NOT_STORED\r\n", 8196, MSG_DONTWAIT, NULL, NULL) = 12
11:13:02.513651 nanosleep({0, 150000000}, ^CProcess 12927 detached

This loops endlessly until nginx runs out of patience and returns the 502 error.

Stracing the memcached process shows the same exchange.

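As far as I understand, a lock key for this session identifier already exists, so every time PHP tries to add the same key memcached returns NOT_STORED. PHP then sleeps for 150 ms (memcached.sess_lock_wait => 150000) and retries, which eventually leads to the timeout.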
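To make the loop in the trace concrete, here is a minimal sketch of an add-based lock with a fixed sleep between attempts. It is a simulation only: FakeStore, acquireSessionLock() and releaseSessionLock() are hypothetical names, the array stands in for memcached, and this is not the extension's actual implementation.

```php
<?php
// Hypothetical in-memory stand-in for memcached; only "add" semantics matter here.
final class FakeStore
{
    private $items = [];

    // Like memcached's "add": store the value only if the key is absent.
    public function add($key, $value)
    {
        if (array_key_exists($key, $this->items)) {
            return false; // memcached would answer NOT_STORED
        }
        $this->items[$key] = $value;
        return true;      // memcached would answer STORED
    }

    public function delete($key)
    {
        unset($this->items[$key]);
    }
}

// Try to take the lock, sleeping between attempts.
// Returns the attempt number that succeeded, or 0 if the lock was never obtained.
function acquireSessionLock(FakeStore $store, $sessionId, $waitMicros, $maxAttempts)
{
    $lockKey = 'memc.sess.key.lock.' . $sessionId;
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        if ($store->add($lockKey, '1')) {
            return $attempt;
        }
        usleep($waitMicros); // 150000 with the configuration shown above
    }
    return 0;
}

function releaseSessionLock(FakeStore $store, $sessionId)
{
    $store->delete('memc.sess.key.lock.' . $sessionId);
}

// Usage: short waits so the demo finishes quickly.
$store  = new FakeStore();
$first  = acquireSessionLock($store, 'example-session-id', 1000, 3); // 1: lock was free
$second = acquireSessionLock($store, 'example-session-id', 1000, 3); // 0: lock still held
releaseSessionLock($store, 'example-session-id');
$third  = acquireSessionLock($store, 'example-session-id', 1000, 3); // 1: free again
```

The second call is the situation in the strace output: as long as nobody deletes the lock key (and it has not expired), every add fails and the caller just keeps sleeping and retrying.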

Any hints on where I should dig further to find a solution?

Many thanks!

It turns out the PHP code contained a recursive function that kept calling itself, resulting in a PHP error. The error was hidden in the production environment and only showed up in xdebug once the database was replicated to development:

Fatal error: Maximum function nesting level of '100' reached, aborting! in...

Fixing the PHP error gets me further in development but does not answer the original question: why the endless NOT_STORED responses when calling session_start()?

In the session section of the PHP manual I found this:

void session_write_close ( void )

End the current session and store session data. Session data is usually stored after your script terminated without the need to call session_write_close(), but as session data is locked to prevent concurrent writes only one script may operate on a session at any time. When using framesets together with sessions you will experience the frames loading one by one due to this locking. You can reduce the time needed to load all the frames by ending the session as soon as all changes to session variables are done.
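Following that advice, a minimal sketch of releasing the session early might look like this. The function name readSessionAndRelease() and the 'user_id' key are hypothetical, and I am assuming the memcached handler drops its lock when the session is closed, as the manual text suggests for session handlers in general.

```php
<?php
// Copy the needed values out of the session, then release the lock immediately.
// The function name and the 'user_id' key are hypothetical.
function readSessionAndRelease(array $keys)
{
    session_start();                 // takes the session lock
    $values = [];
    foreach ($keys as $key) {
        $values[$key] = isset($_SESSION[$key]) ? $_SESSION[$key] : null;
    }
    session_write_close();           // stores the data and ends the session
    return $values;
}

$session = readSessionAndRelease(['user_id']);
// Long-running work goes here; the session is already closed at this point,
// so a second request with the same session ID does not have to wait for it.
```

Any changes to $_SESSION made after session_write_close() are not saved, so this only fits code paths that have finished writing to the session.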

Did you finally solve the issue?

