Is it right to limit cleaning /tmp each day in a Hadoop cluster?
We have an HDP cluster, version 2.6.4, installed on Red Hat 7.2 machines.
We noticed the following issue on the JournalNode (master) machines.
We have 3 JournalNode machines, and under the /tmp folder there are thousands of empty folders such as:
drwx------. 2 hive hadoop 6 Dec 20 09:00 a962c02e-4ed8-48a0-b4bb-79c76133c3ca_resources
and also a lot of folders such as:
drwxr-xr-x. 4 hive hadoop 4096 Dec 12 09:02 hadoop-unjar6426565859280369566
with content such as:
beeline-log4j.properties BeeLine.properties META-INF org sql-keywords.properties
/tmp should be purged every 10 days according to the configuration file:
more /usr/lib/tmpfiles.d/tmp.conf
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.
# See tmpfiles.d(5) for details
# Clear tmp directories separately, to make them easier to override
v /tmp 1777 root root 10d
v /var/tmp 1777 root root 30d
# Exclude namespace mountpoints created with PrivateTmp=yes
x /tmp/systemd-private-%b-*
X /tmp/systemd-private-%b-*/tmp
x /var/tmp/systemd-private-%b-*
X /var/tmp/systemd-private-%b-*/tmp
So we decreased the retention to 1d instead of 10d in order to avoid this issue.
Indeed, /tmp then only contains one day's worth of folders.
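As a side note, editing /usr/lib/tmpfiles.d/tmp.conf directly can be undone by a package update; systemd gives precedence to a file of the same name under /etc/tmpfiles.d. A sketch of such an override (the 1d value mirrors the change described above):

```
# /etc/tmpfiles.d/tmp.conf (same file name takes precedence over /usr/lib/tmpfiles.d/tmp.conf)
# Age /tmp entries out after 1 day instead of the packaged 10d
v /tmp 1777 root root 1d
v /var/tmp 1777 root root 30d
```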
But I want to ask the following questions.

First, is it OK to configure the /tmp retention in a Hadoop cluster to 1 day? (I am almost sure it is OK, but I want to hear more opinions.)

Second, why does Hive generate thousands of empty folders named XXXX_resources, and is it possible to solve this from the Hive service, instead of limiting the retention on /tmp?
It is quite normal to have thousands of folders in /tmp, as long as there is still enough free space for normal operation. Many processes use /tmp, including Hive, Pig, etc.

A one-day retention period for /tmp may be too short, because Hive or other map-reduce jobs can normally run for more than one day, though that depends on your workload.

HiveServer should remove its temp files, but when tasks fail or are aborted the files may remain; this also depends on the Hive version.

It is still better to configure some retention, because when there is no space left in /tmp, everything stops working.
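If you would rather not shorten retention for everything in /tmp, one possible middle ground is to clean up only the two patterns shown in the question. The following is a hedged sketch, not a vetted production script; it demonstrates the find commands in a throwaway sandbox directory so it is safe to run as-is, and you would point them at /tmp from cron only after reviewing the patterns against your cluster:

```shell
#!/bin/sh
# Sandbox demo (assumption: GNU coreutils/findutils, as on RHEL 7).
sandbox=$(mktemp -d)
mkdir "$sandbox/a962c02e_resources" "$sandbox/hadoop-unjar6426565859280369566" "$sandbox/keepme"
# Backdate the two Hive-style leftovers so they look stale
touch -d '2 days ago' "$sandbox/a962c02e_resources" "$sandbox/hadoop-unjar6426565859280369566"

# Remove empty per-session *_resources dirs older than one day
find "$sandbox" -mindepth 1 -maxdepth 1 -type d -name '*_resources' -empty -mtime +1 -exec rmdir {} +

# Remove unpacked-jar staging dirs (hadoop-unjar*) older than one day
find "$sandbox" -mindepth 1 -maxdepth 1 -type d -name 'hadoop-unjar*' -mtime +1 -exec rm -rf {} +

ls "$sandbox"   # only "keepme" should remain
```

Note that `-mtime +1` only matches entries at least two full 24-hour periods old; use `-mmin` if you need finer control.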
Also read this Jira about HDFS scratch-dir retention.
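On the second question: the XXXX_resources directories are Hive's per-session download area for ADD JAR / ADD FILE resources, controlled (in the Hive versions I am aware of) by hive.downloaded.resources.dir, whose default lands in java.io.tmpdir. A hedged hive-site.xml sketch that moves them off /tmp and enables HiveServer2's own cleanup of dangling scratch dirs; property availability depends on your Hive version, and the /var/hive path is only an example:

```xml
<!-- hive-site.xml sketch; verify these properties exist in your Hive version -->
<property>
  <!-- Move the per-session *_resources dirs off /tmp (example path, an assumption) -->
  <name>hive.downloaded.resources.dir</name>
  <value>/var/hive/resources/${hive.session.id}_resources</value>
</property>
<property>
  <!-- Let HiveServer2 periodically remove scratch dirs left by failed or aborted sessions -->
  <name>hive.server2.clear.dangling.scratchdir</name>
  <value>true</value>
</property>
```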