
How to delete some data from an HDFS file in Hadoop

I've uploaded 50 GB of data to a Hadoop cluster, and now I want to delete the first row of the data file. Removing that row manually and then uploading the file to HDFS again would be time-consuming. Is there a better way?

HDFS files are immutable (for all practical purposes).

You need to upload the modified file(s). You can make the change programmatically with an M/R job that does a near-identity transformation, e.g. by running a streaming shell script that calls sed, but the gist is that you need to create new files: HDFS files cannot be edited.
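As a rough illustration of the "write a new file" idea, here is a minimal sketch of a stream filter that copies its input to its output while skipping the first line. It is not the M/R approach described above, but a simpler single-stream variant: a command like sed '1d' run per mapper in a streaming job would likely drop the first line of every input split, not only the first row of the file. The function name `drop_first_line` and the chunk size are assumptions made for this example.

```python
import shutil


def drop_first_line(src, dst, chunk_size=1 << 20):
    """Copy binary stream `src` to `dst`, skipping the first line.

    Hypothetical helper for illustration: `src` and `dst` are any
    binary file-like objects (e.g. sys.stdin.buffer / sys.stdout.buffer).
    """
    # Read and discard everything up to and including the first newline.
    src.readline()
    # Stream the remainder in fixed-size chunks so a 50 GB file
    # never has to fit in memory.
    shutil.copyfileobj(src, dst, chunk_size)
```

One way to use it, assuming a small script that calls `drop_first_line(sys.stdin.buffer, sys.stdout.buffer)`, is to pipe the output of `hdfs dfs -cat` for the original file through that script and into `hdfs dfs -put -` with a new destination path, since `-put -` reads from stdin. The result is a new HDFS file; the original stays unchanged until you delete it.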
