
How to create a directory dynamically if it doesn't exist in HDFS using PySpark, and set file and directory permissions as well

I am new to Hadoop. Can we create a directory in Hadoop dynamically?

Currently I am using the command below:

hadoop fs -mkdir -p /data/test1/test2/test3/

and setting the file permission with the command below:

hdfs dfs -chmod -R 777 /data/test1/test2/test3/t_bill_sheet.csv

By dynamically I mean a {year} directory, with folders created iteratively inside it by date, such as 5, 6, 7, etc.

Thanks in advance.

You can define a bash variable (or compute it from the current date if you want) and then reuse it wherever the path is needed:

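YEAR=2000
MONTH=03
DAY=01
# Do not name this variable PATH: that overwrites the shell's command search
# path, so the hadoop and hdfs commands below would no longer be found.
HDFS_DIR="/data/$YEAR/$MONTH/$DAY"

hadoop fs -mkdir -p "$HDFS_DIR"
hdfs dfs -chmod -R 777 "$HDFS_DIR/t_bill_sheet.csv"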
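
If you would rather compute the path from a date in Python, here is a minimal sketch of the same idea. The helper name `hdfs_commands` and the `/data` base directory are only illustrative, and it assumes the `hadoop` and `hdfs` commands are on the search path of whatever runs it. It only builds the argument lists; running them is left to `subprocess.run`.

```python
import subprocess
from datetime import date


def hdfs_commands(base, day, mode="777"):
    """Build the mkdir and chmod argument lists for one dated directory.

    `base` is a hypothetical HDFS base directory; `day` is a datetime.date.
    """
    # Formats the date as zero-padded year/month/day, e.g. 2000/03/01
    target = f"{base.rstrip('/')}/{day:%Y/%m/%d}"
    return [
        ["hadoop", "fs", "-mkdir", "-p", target],
        ["hdfs", "dfs", "-chmod", "-R", mode, target],
    ]


def run_commands(commands):
    """Run each command in order, raising if one exits with an error."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For today's date you would call `run_commands(hdfs_commands("/data", date.today()))`. Note that `-chmod -R` here changes the dated directory and everything under it, not the parent directories that `-mkdir -p` may have created.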

You can do it in PySpark using a combination of the exists() and mkdirs() methods of the Hadoop FileSystem API, as below:

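hadoop_fs = spark._jvm.org.apache.hadoop.fs  # assumes an active SparkSession named spark
fs = hadoop_fs.FileSystem.get(spark._jsc.hadoopConfiguration())
path = hadoop_fs.Path("path")
perm = hadoop_fs.permission.FsPermission("777")  # mode given as an octal string

if not fs.exists(path):  # returns True or False
    fs.mkdirs(path, perm)  # also creates missing parent directories
    fs.setPermission(path, perm)  # mkdirs is subject to the umask, so set the mode explicitly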
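
To cover the "{year}, then one folder per date" part of the question, the same calls can be wrapped in a loop. This is only a sketch under some assumptions: the helper names `dated_dirs` and `ensure_dirs` are made up for illustration, `spark` is an existing SparkSession, and the layout is taken to be base/year/day.

```python
def dated_dirs(base, year, days):
    """Return one directory path per day under base/year (hypothetical layout)."""
    return [f"{base.rstrip('/')}/{year}/{day}" for day in days]


def ensure_dirs(spark, paths, mode="777"):
    """Create each missing HDFS directory and set its permission.

    Returns the list of paths that were actually created.
    """
    hadoop_fs = spark._jvm.org.apache.hadoop.fs
    fs = hadoop_fs.FileSystem.get(spark._jsc.hadoopConfiguration())
    perm = hadoop_fs.permission.FsPermission(mode)  # octal string, e.g. "777"
    created = []
    for p in paths:
        path = hadoop_fs.Path(p)
        if not fs.exists(path):
            fs.mkdirs(path, perm)  # creates missing parents as well
            fs.setPermission(path, perm)  # apply the exact mode regardless of umask
            created.append(p)
    return created
```

For example, `ensure_dirs(spark, dated_dirs("/data/test1", 2024, [5, 6, 7]))` would create `/data/test1/2024/5`, `/data/test1/2024/6` and `/data/test1/2024/7` if they are missing. Only the leaf directories get the explicit permission here; the parents keep whatever mode `mkdirs` gives them.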


