How to delete some data from an HDFS file in Hadoop
I uploaded 50 GB of data to a Hadoop cluster, but now I want to delete the first row of the data file. Removing that row manually and then uploading the file to HDFS again would be time-consuming. Please advise.
HDFS files are immutable (for all practical purposes).
You need to upload the modified file(s). You can make the change programmatically with an M/R job that does a near-identity transformation, e.g. running a streaming shell script that runs sed. The gist of it is that you need to create new files; HDFS files cannot be edited in place.
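As a rough illustration of the "create a new file" approach, here is a minimal sketch of a filter that copies a byte stream while dropping its first line. It is not the M/R job described above, just a single-stream alternative; the function name `drop_first_line` and the paths in the usage note are hypothetical.

```python
import shutil


def drop_first_line(src, dst, chunk_size=1024 * 1024):
    """Copy the binary stream src to dst, skipping the first line.

    The data is copied in chunks, so the whole file never has to fit
    in memory.
    """
    # Consume and discard everything up to and including the first newline.
    src.readline()
    # Copy the remainder of the stream unchanged.
    shutil.copyfileobj(src, dst, chunk_size)
```

Called from a small script as `drop_first_line(sys.stdin.buffer, sys.stdout.buffer)`, this can sit in a pipe such as `hdfs dfs -cat /data/input.csv | python3 drop_first_line.py | hdfs dfs -put - /data/input_trimmed.csv` (assumed paths), which streams the data through one machine and writes a new HDFS file without a local copy. It is meant for a single stream: used as a Hadoop Streaming mapper, a filter like this (or sed '1d') would drop the first line of every input split, not just the first line of the file.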