Java: Downloading .zip files from an FTP server and extracting the contents without saving the files on the local system

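I have a requirement to download certain .zip files from an FTP server and push the archive contents (a set of XML files) to HDFS (Hadoop Distributed File System). So far I am using the Apache FTPClient to connect to the FTP server and first download each file to the local machine. I then unzip it and pass the resulting folder path to a method that iterates over the local folder and pushes the files to HDFS. For clarity, I have attached a code snippet below.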

    // Imports needed by this fragment (the fragment itself lives inside a method)
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.util.Enumeration;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipFile;
    import org.apache.commons.io.FileUtils;
    import org.apache.commons.io.IOUtils;
    import org.apache.commons.net.ftp.FTPClient;
    import org.apache.commons.net.ftp.FTPFile;

    // Gives me an active FTPClient
    FTPClient ftpClient = getActiveFTPConnection();
    ftpClient.changeWorkingDirectory(remoteDirectory);

    FTPFile[] ftpFiles = ftpClient.listFiles();
    if (ftpFiles.length == 0) {
        logger.info("Unable to find any files in given location!!");
        return;
    }

    // Iterate over the remote files
    for (FTPFile eachFTPFile : ftpFiles) {
        String ftpFileName = eachFTPFile.getName();

        // Skip files that are not .zip files
        if (!ftpFileName.endsWith(".zip")) {
            continue;
        }

        System.out.println("Reading File -->" + ftpFileName);
        /*
         * location is the path on the local system given by the user,
         * usually loaded from a property file.
         *
         * archiveFileLocation is where the archive downloaded
         * from FTP is stored.
         */
        String archiveFileLocation = location + File.separator + ftpFileName;
        // Strip the trailing ".zip"; replaceAll(".zip", "") would treat "." as a regex wildcard
        String localDirName = ftpFileName.substring(0, ftpFileName.length() - ".zip".length());
        /*
         * localDirLocation is a folder named after the archive on the FTP
         * server; the archive's files are copied into it.
         */
        String localDirLocation = location + File.separator + localDirName;
        File localDir = new File(localDirLocation);
        localDir.mkdir();

        File archiveFile = new File(archiveFileLocation);

        // try-with-resources closes the stream even if the download fails
        try (FileOutputStream archiveFileOutputStream = new FileOutputStream(archiveFile)) {
            ftpClient.retrieveFile(ftpFileName, archiveFileOutputStream);
        }

        // Delete the archive file (on JVM exit) after copying its contents
        FileUtils.forceDeleteOnExit(archiveFile);

        // Read the archive file from archiveFileLocation
        try (ZipFile zip = new ZipFile(archiveFileLocation)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();

            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                File target = new File(localDir, entry.getName());

                if (entry.isDirectory()) {
                    logger.info("Extracting directory " + entry.getName());
                    // Create the directory inside localDir, not the working directory
                    target.mkdirs();
                    continue;
                }

                logger.info("Extracting File: " + entry.getName());
                try (InputStream in = zip.getInputStream(entry);
                     OutputStream out = new FileOutputStream(target)) {
                    IOUtils.copy(in, out);
                }
            }
        }

        /*
         * Iterate over the folder location provided and load the files to HDFS
         */
        loadFilesToHDFS(localDirLocation);
    }
    disconnectFTP();

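The problem with this approach is that the application spends a lot of time downloading each file to a local path, unzipping it, and then loading it into HDFS. Is there a better way to extract the contents of the zip from FTP on the fly and hand the content streams directly to loadFilesToHDFS(), rather than a path on the local system?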

Use a zip stream. See here: http://www.oracle.com/technetwork/articles/java/compress-1565076.html

Specifically, see Code Sample 1 there.
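
To make the suggestion concrete, here is a minimal sketch of the idea, not a drop-in replacement for the code in the question. It wraps an arbitrary InputStream in java.util.zip.ZipInputStream and hands each entry to a callback as a stream, so nothing is written to local disk. The class name ZipStreamSketch, the EntryHandler interface, and the in-memory sample archive are all made up for illustration. In the real application the input would presumably come from FTPClient.retrieveFileStream(ftpFileName), followed by completePendingCommand() once the stream has been read and closed, and the handler would copy each entry into an HDFS output stream instead of a map.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipStreamSketch {

    /** Hypothetical callback standing in for a stream-based loadFilesToHDFS(). */
    interface EntryHandler {
        void handle(String entryName, InputStream content) throws IOException;
    }

    /**
     * Reads a zip archive sequentially from any InputStream and passes each
     * file entry to the handler without touching the local file system.
     */
    static void streamEntries(InputStream rawZip, EntryHandler handler) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(rawZip)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    // Shield the archive stream so that a handler which closes
                    // its input does not close the whole archive.
                    handler.handle(entry.getName(), new FilterInputStream(zin) {
                        @Override
                        public void close() {
                            // intentionally left open; streamEntries closes zin
                        }
                    });
                }
                zin.closeEntry();
            }
        }
    }

    /** Reads a stream to its end; for a zip entry that is the end of the entry. */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Builds a small in-memory archive so the sketch runs without an FTP server. */
    static byte[] buildSampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            zout.putNextEntry(new ZipEntry("a.xml"));
            zout.write("<a/>".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
            zout.putNextEntry(new ZipEntry("dir/"));
            zout.closeEntry();
            zout.putNextEntry(new ZipEntry("dir/b.xml"));
            zout.write("<b/>".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> received = new LinkedHashMap<>();
        // The in-memory stream stands in for the stream returned by the FTP client.
        streamEntries(new ByteArrayInputStream(buildSampleZip()),
                (name, in) -> received.put(name, readAll(in)));
        System.out.println(received);
    }
}
```

ZipInputStream reads the archive front to back and reports end-of-stream at the end of each entry, which is why the handler can simply read until -1. The trade-off is that entries cannot be accessed at random the way ZipFile allows, which should not matter when every entry is pushed to HDFS once.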
