
File transfer to Hadoop HDFS from a remote Linux server

I need to transfer files from a remote Linux server directly to HDFS. I have a keytab on the remote server, and after running kinit the ticket is active, but I still cannot browse the HDFS folders. I know I can copy files to HDFS from the edge nodes, but I need to skip the edge node and transfer the files directly to HDFS.

How can we achieve this?

Let's assume a couple of things first. You have one machine on which the external hard drive (named DISK) is mounted, and a cluster of machines with ssh access to the master (in the commands below, master stands for the user@hostname of the master machine). You run the script on the machine with the drive. The data on the drive consists of multiple directories with multiple files in each (about 100); the exact numbers don't matter, they only justify the loops. The path to the data is stored in the ${DIR} variable (on Linux it would be /media/DISK, and on Mac OS X /Volumes/DISK). Here is what the script looks like:

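DIR=/Volumes/DISK

# Stream every file on the drive to HDFS through ssh, one file at a time.
for d in "${DIR}"/*/; do
  d=$(basename "${d}")
  for f in "${DIR}/${d}"/*; do
    f=$(basename "${f}")
    # "-put -" reads from stdin, so the destination must be a file path.
    ssh master "hadoop fs -put - '/path/on/hdfs/${d}/${f}'" < "${DIR}/${d}/${f}"
  done
done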

Note that we go over each file and we copy it into a specific file because the HDFS API for put requires that "when source is stdin, destination must be a file."
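The loop above writes into /path/on/hdfs/${d}/ and so assumes that directory already exists on HDFS; if it does not, the put may fail. A minimal sketch of creating the directories up front, assuming the same master ssh alias and /path/on/hdfs target as above. The make_hdfs_dirs function and the RUN variable are hypothetical names introduced here so the commands can be previewed before they are run:

```shell
# Hypothetical helper: create one HDFS directory per top-level directory
# found under the local source path.
# RUN defaults to "ssh master"; set RUN=echo to only print the commands.
make_hdfs_dirs() {
  src=$1      # local directory holding the data, e.g. /Volumes/DISK
  dest=$2     # HDFS parent directory, e.g. /path/on/hdfs
  for d in "${src}"/*/; do
    [ -d "${d}" ] || continue          # skip if the glob matched nothing
    name=$(basename "${d}")
    # -mkdir -p creates missing parents and does not fail if the directory exists
    ${RUN:-ssh master} "hadoop fs -mkdir -p '${dest}/${name}'"
  done
}
```

Calling make_hdfs_dirs "${DIR}" /path/on/hdfs once before the copy loop would prepare the targets; with RUN=echo set, it only prints the hadoop commands it would send.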

Unfortunately, it takes forever. When I came back the next morning, it had only done a fifth of the data (100GB) and was still running, at roughly 20 minutes per directory! I ended up copying the data temporarily onto one of the cluster machines and then loading it into HDFS from there. For space reasons, I did it one folder at a time and deleted the temporary folder immediately afterwards. Here is what the script looks like:

DIR=/Volumes/DISK
PTH=/path/on/one/machine/of/the/cluster

for d in "${DIR}"/*/; do
  d=$(basename "${d}")
  # Stage the directory on the master, load it into HDFS, then free the space.
  scp -r -q "${DIR}/${d}" "master:${PTH}/"
  ssh master "hadoop fs -copyFromLocal '${PTH}/${d}' /path/on/hdfs/"
  ssh master "rm -rf '${PTH}/${d}'"
done
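One thing the script above does not do is check whether each step succeeded before moving on. A minimal sketch of the same stage, load, and clean-up sequence, chained so that the staged copy is removed only after the load into HDFS succeeds. The run and stage_and_load functions and the DRY_RUN variable are hypothetical names added for illustration; the master alias and the paths are the placeholders used above:

```shell
# Hypothetical wrapper: run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: stage one directory on the master, load it into HDFS,
# and delete the staged copy only if the load succeeded.
stage_and_load() {
  src=$1    # local directory to transfer
  pth=$2    # staging directory on the master
  dest=$3   # target directory on HDFS
  name=$(basename "${src}")
  # Stop here if scp fails, so nothing half-copied is loaded into HDFS.
  run scp -r -q "${src}" "master:${pth}/" &&
  # On the master, rm runs only if copyFromLocal exits successfully.
  run ssh master "hadoop fs -copyFromLocal '${pth}/${name}' '${dest}/' && rm -rf '${pth}/${name}'"
}
```

Inside the loop this would be called as stage_and_load "${DIR}/${d}" "${PTH}" /path/on/hdfs. Setting DRY_RUN=1 first prints the scp and ssh commands without touching the cluster.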

Hope it helps!
