How to append text files in HDFS using Hadoop client using Scala?
I want to write text files into HDFS. The path to which each file is written is generated dynamically. If a file path (including the file name) is new, the file should be created and the text written to it. If the file path (including the file name) already exists, the string must be appended to the existing file.
I used the following code. File creation works fine, but I cannot append text to existing files.
import java.io.{BufferedWriter, OutputStreamWriter}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
// JValue, compact and render are presumably from json4s; Time and
// generateFilePath are defined elsewhere and are not shown here.

def writeJson(uri: String, json: JValue, time: Time): Unit = {
  val path = new Path(generateFilePath(json, time))
  val conf = new Configuration()
  conf.set("fs.defaultFS", uri)
  conf.set("dfs.replication", "1")
  conf.set("dfs.support.append", "true")
  conf.set("dfs.client.block.write.replace-datanode-on-failure.enable", "false")
  val message = compact(render(json)) + "\n"
  try {
    val fileSystem = FileSystem.get(conf)
    if (fileSystem.exists(path)) {
      println("File exists.")
      // Append branch: reopen the existing file and add the new line.
      val outputStream = fileSystem.append(path)
      val bufferedWriter = new BufferedWriter(new OutputStreamWriter(outputStream))
      bufferedWriter.write(message)
      bufferedWriter.close()
      println("Appended to file in path : " + path)
    } else {
      println("File does not exist.")
      // Create branch: overwrite = true replaces any file at this path.
      val outputStream = fileSystem.create(path, true)
      val bufferedWriter = new BufferedWriter(new OutputStreamWriter(outputStream))
      bufferedWriter.write(message)
      bufferedWriter.close()
      println("Created file in path : " + path)
    }
  } catch {
    case e: Exception =>
      e.printStackTrace()
  }
}
Hadoop version: 2.7.0
Whenever an append is attempted, the following error is generated:
org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException)
I can see 3 possibilities:
1. Use the external commands provided by hdfs, which is sitting on your Hadoop cluster, see: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html . Or even the WebHDFS REST functionality: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
2. Use the hdfs API provided by the hadoop-hdfs library: http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs/2.7.1
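For the hdfs API route, a minimal append-or-create sketch might look like the following. It assumes the Hadoop client libraries are on the classpath and that the cluster accepts appends. The object name HdfsAppendSketch, the helper appendOrCreate, the NameNode URI hdfs://localhost:9000 and the path /tmp/events.json are all hypothetical names chosen for illustration; they do not come from the question.

```scala
import java.nio.charset.StandardCharsets

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsAppendSketch {

  // Hypothetical helper: appends `text` to `target`, creating the file
  // first if it does not exist yet.
  def appendOrCreate(fs: FileSystem, target: Path, text: String): Unit = {
    val out =
      if (fs.exists(target)) fs.append(target) // reopen the existing file for append
      else fs.create(target, false)            // overwrite = false: fail rather than clobber
    try out.write(text.getBytes(StandardCharsets.UTF_8))
    finally out.close()                        // always release the stream (and its lease)
  }

  def main(args: Array[String]): Unit = {
    // Assumed NameNode URI and file path, for illustration only.
    val conf = new Configuration()
    conf.set("fs.defaultFS", "hdfs://localhost:9000")
    val fs = FileSystem.get(conf)
    try appendOrCreate(fs, new Path("/tmp/events.json"), "{\"k\":1}\n")
    finally fs.close()
  }
}
```

Closing the stream in a finally block matters here: if a write fails and the stream is left open, later append attempts on the same file may be rejected until the previous writer's lease is released. This sketch only shows the general shape of the API calls and is not claimed to resolve the ArrayIndexOutOfBoundsException reported above.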