
Empty core dump file after Segmentation fault

I am running a program, and it is interrupted by a segmentation fault. The problem is that a core dump file is created, but it has size zero.

Have you encountered such a case, and do you know how to resolve it?

I have enough space on the disk. I have already run ulimit -c unlimited to remove the limit on the core file size - both running it interactively and putting it at the top of the submitted batch file - but I still get 0-byte core dump files. The permissions of the folder containing these files are ugo+rw, and the permissions on the core files created are u+rw only.
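For reference, a quick way to check from the same shell that launches the program whether core dumps are actually enabled, and where the kernel will write them (a sketch; the exact core_pattern value varies by distribution):

```shell
# Show the current core-file size limit; 0 means no dump is written
ulimit -c

# Remove the limit for this shell and everything it spawns
ulimit -c unlimited

# Verify the new limit took effect
ulimit -c

# Where does the kernel write core files? A plain name like "core"
# means the crashing process's current working directory.
cat /proc/sys/kernel/core_pattern
```

Note that ulimit only affects the current shell and its children, which is why it must run in the same environment as the crashing program.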

The program is written in C++ and submitted to a Linux cluster with the qsub command of the Grid Engine; I don't know whether this information is relevant to the question.

Setting ulimit -c unlimited turned on the generation of dumps. By default, core dumps were generated in the current directory, which was on NFS. Setting /proc/sys/kernel/core_pattern to /tmp/core solved the problem of empty dumps for me.
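A minimal sketch of how that core_pattern change can be applied (requires root; the /tmp destination and the %e/%p name template are illustrative choices, not the only valid ones):

```shell
# Write core files to /tmp (a local filesystem) instead of the
# NFS-mounted working directory. %e = executable name, %p = PID.
echo '/tmp/core.%e.%p' | sudo tee /proc/sys/kernel/core_pattern

# Equivalent via sysctl; add kernel.core_pattern to /etc/sysctl.conf
# to make the setting survive reboots.
sudo sysctl -w kernel.core_pattern=/tmp/core.%e.%p
```

Pointing the pattern at a local disk sidesteps NFS write failures or permission quirks that can leave a zero-byte core file behind.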

The comment from Ranjith Ruban helped me develop this workaround.

What filesystem are you using for dumping the core?

It sounds like you're using a batch scheduler to launch your executable. Maybe the shell that Torque/PBS uses to spawn your job inherits a different ulimit value? Maybe the scheduler's default configuration is not to preserve core dumps?

Can you run your program directly from the command line instead?

Or if you add ulimit -c unlimited and/or ulimit -s unlimited to the top of your PBS batch script before invoking your executable, you might be able to override PBS's default ulimit behavior. Adding 'ulimit -c' would also report what the limit actually is.
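A sketch of what such a batch script might look like (the job name, walltime, and executable name are placeholders, not taken from the question):

```shell
#!/bin/bash
#PBS -N myjob
#PBS -l walltime=01:00:00

# Raise the limits in the shell the scheduler spawns for this job;
# the values inherited from the scheduler daemon may be 0.
ulimit -c unlimited   # core file size
ulimit -s unlimited   # stack size
ulimit -c             # log the effective limit for debugging

cd "$PBS_O_WORKDIR"   # PBS sets this to the submission directory
./my_program          # placeholder executable name
```

Logging `ulimit -c` from inside the job is the quickest way to confirm whether the limit the scheduler imposes differs from the one in your login shell.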

You can set resource limits such as required physical memory by using a qsub option such as -l h_vmem=6G to reserve 6 GB of physical memory.

For file blocks you can set h_fsize to an appropriate value as well.
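An illustrative qsub invocation combining both limits (the 6G and 10G values and the script name are placeholders):

```shell
# Reserve 6 GB of memory and allow the job to create up to 10 GB
# of file output, which must cover the size of the core dump too.
qsub -l h_vmem=6G -l h_fsize=10G myjob.sh
```

If h_fsize is smaller than the process's memory image, the core file can be truncated or empty, so it is worth setting it generously when debugging.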

See the RESOURCE LIMITS section of the qconf manpage:

http://gridscheduler.sourceforge.net/htmlman/htmlman5/queue_conf.html

s_cpu     The per-process CPU time limit in seconds.

s_core    The per-process maximum core file size in bytes.

s_data    The per-process maximum memory limit in bytes.

s_vmem    The same as s_data (if both are set the minimum is
           used).

h_cpu     The per-job CPU time limit in seconds.

h_data    The per-job maximum memory limit in bytes.

h_vmem    The same as h_data (if both are set the minimum is
           used).

h_fsize   The total number of disk blocks that this job  can
           create.

Also, if the cluster uses a local TMPDIR on each node and that is filling up, you can set TMPDIR to an alternate location with more capacity, e.g. an NFS share:

export TMPDIR=<some NFS mounted directory>

Then launch qsub with the -V option to export the current environment to the job.
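Putting those two steps together, a sketch (the NFS path and script name are placeholders):

```shell
# Point TMPDIR at a roomier NFS share, then export the whole
# current environment - including TMPDIR - to the job with -V.
export TMPDIR=/nfs/scratch/$USER/tmp
mkdir -p "$TMPDIR"
qsub -V myjob.sh
```

Without -V, the job would receive the scheduler's default environment and the TMPDIR override would be lost.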

One or a combination of the above may help you solve your problem.

If you run the program on a mounted drive, the core file can't be written to the mounted drive; it must be written to the local drive.

You can then copy the file from the local drive.

