
Is it right to limit cleaning /tmp each day in a Hadoop cluster?

We have an HDP cluster, version 2.6.4.

The cluster is installed on Red Hat machines, version 7.2.

We noticed the following issue on the JournalNode machines (master machines).

We have 3 JournalNode machines, and under the /tmp folder we have thousands of empty folders such as:

drwx------.  2 hive      hadoop     6 Dec 20 09:00 a962c02e-4ed8-48a0-b4bb-79c76133c3ca_resources

and also a lot of folders such as:

drwxr-xr-x.  4 hive      hadoop  4096 Dec 12 09:02 hadoop-unjar6426565859280369566

with content such as:

beeline-log4j.properties  BeeLine.properties  META-INF  org  sql-keywords.properties

/tmp should be purged every 10 days according to the configuration file:

more  /usr/lib/tmpfiles.d/tmp.conf
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

# See tmpfiles.d(5) for details

# Clear tmp directories separately, to make them easier to override
v /tmp 1777 root root 10d
v /var/tmp 1777 root root 30d

# Exclude namespace mountpoints created with PrivateTmp=yes
x /tmp/systemd-private-%b-*
X /tmp/systemd-private-%b-*/tmp
x /var/tmp/systemd-private-%b-*
X /var/tmp/systemd-private-%b-*/tmp

So we decreased the retention from 10d to 1d in order to avoid this issue.

And indeed, /tmp then contains only one day's worth of folders.
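For reference, the usual way to make such a change on RHEL 7 is not to edit the packaged file under /usr/lib/tmpfiles.d (a package update can overwrite it), but to place a same-named override in /etc/tmpfiles.d, which takes precedence. A minimal sketch of the override file:

```
# /etc/tmpfiles.d/tmp.conf — shadows /usr/lib/tmpfiles.d/tmp.conf
# Age entries in /tmp out after 1 day instead of 10
v /tmp 1777 root root 1d
v /var/tmp 1777 root root 30d
```

Running `systemd-tmpfiles --clean` applies the rule immediately; otherwise the cleanup runs periodically via systemd-tmpfiles-clean.timer.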

But I want to ask the following questions:

Is it OK to configure the retention of /tmp in a Hadoop cluster to 1 day?

(I am almost sure it is OK, but I want to hear more opinions.)

Second:

Why does Hive generate thousands of empty folders such as XXXX_resources,

and is it possible to solve this from the Hive service side, instead of limiting the retention on /tmp?

It is quite normal to have thousands of folders in /tmp, as long as there is still enough free space for normal operation. Many processes use /tmp, including Hive, Pig, etc.

A one-day retention period for /tmp may be too short, because Hive or other MapReduce tasks can run for more than one day, though it depends on your tasks. HiveServer should remove its temp files, but when tasks fail or are aborted the files may remain; this also depends on the Hive version.

It is better to configure some retention, because when there is no space left in /tmp, everything stops working.
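If a one-day global retention on /tmp feels too aggressive for long-running jobs, a narrower alternative is a cron-able cleanup that removes only the empty `*_resources` directories and leaves everything else alone. A minimal sketch (the name pattern is taken from the listing above; test on one node before scheduling it):

```shell
# Remove only empty Hive "*_resources" directories older than 1 day under the
# given directory, leaving everything else (e.g. hadoop-unjar* dirs) untouched.
clean_hive_resources() {
  dir="$1"
  find "$dir" -maxdepth 1 -type d -name '*_resources' \
       -empty -mtime +1 -exec rmdir {} +
}

# Example: run daily from cron on each JournalNode:
#   clean_hive_resources /tmp
```

Because it matches only empty directories (`-empty`) and uses `rmdir`, it cannot delete a directory that a running job has written files into.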

Read also this Jira about HDFS scratch dir retention.
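On the Hive side, the `*_resources` folders come from `hive.downloaded.resources.dir`, whose default is `${system:java.io.tmpdir}/${hive.session.id}_resources` — i.e. one folder per session under /tmp. Pointing it at a dedicated directory keeps them out of /tmp and makes them easy to purge. A hedged hive-site.xml sketch (the target path is an example; check that these properties exist in your Hive version, as the dangling-scratchdir cleanup was added in newer releases):

```xml
<!-- hive-site.xml (sketch; verify against your Hive version) -->
<property>
  <name>hive.downloaded.resources.dir</name>
  <!-- default: ${system:java.io.tmpdir}/${hive.session.id}_resources -->
  <value>/var/hive/resources/${hive.session.id}_resources</value>
</property>
<property>
  <!-- have HiveServer2 remove scratch dirs left behind by dead sessions -->
  <name>hive.server2.clear.dangling.scratchdir</name>
  <value>true</value>
</property>
```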


 