
System call time out?

I'm using unix system() calls to gunzip and gzip files. With very large files these sometimes get aborted (i.e. on the cluster compute nodes), while other times they go through (i.e. on the login nodes). Is there some soft limit on the time a system call may take? What else could it be?

The calling thread should block indefinitely until the task you initiated with system() completes. If what you are observing is that the call returns and the file operation has not completed, that is an indication that the spawned operation failed for some reason.

What does the return value indicate?

Almost certainly not a problem with your use of system(), but with the operation you're performing. Always check the return value, and even more importantly, capture the output of the command you're calling. For non-interactive use, it's often best to write stdout and stderr to log files. One way to do this is to write a wrapper script that checks for the underlying command, logs the command line, redirects stdout and stderr (and closes stdin if you want to be careful), then execs the command line. Run this via system() rather than the OS command directly.

My bet is that the failing machines have limited disk space, or are missing either the target file or the actual gzip/gunzip commands.

I'm using unix system() calls to gunzip and gzip files.

Probably silly question: why not use zlib directly from your application?

And system() isn't a system call. It is a wrapper around fork()/exec()/wait(); check the system() man page. If it doesn't unblock, it might be that your application somehow interferes with wait(), e.g. do you have a SIGCHLD handler?

If it's a Linux system I would recommend using strace to see what's going on and which syscall blocks.

You can even attach strace to already running processes: # strace -p $PID

Sounds like I'm running into the same intermittent issue indicating a timeout of some kind. My script runs every day. I'm starting to believe GZIP has a timeout.

  1. gzip -vd filename.txt.gz 2>> tmp/errorcatch.txt 1>> logfile.log
  2. stderr: Error for filename.txt.gz
  3. The script moves on to the next command, 'cp filename* new/directory/', which leaves a still-zipped copy of the file in the new directory
  4. stdout from the earlier gzip shows a successful unzip of the SAME file: filename.txt.gz: 95.7% -- replaced with filename.txt
  5. The unzipped output file from gzip is present in neither the source nor the new directory.
  6. Following alerts, manual run of 'gzip -vd filename.txt.gz' never fails.

Details:

  • Only one call in script to unzip that file
  • Call for unzip is inside a function (for more robust logging and alerting)
  • Unable to strace in production
  • Unable to replicate locally
  • In occurrences over the last month, found no consistency among file size, only

I'll simply work around it with retry logic and general scripting improvements, but I want the next google-er to know they're not crazy. This is happening to other people!
