
How to extract and process files in a better way in C#

I have around 5000 files stored in an FTP location. First I have to download each file from FTP, then unzip the .gz file, and finally process the file and push the data to an Oracle database. I used the TamirSSh assembly to retrieve the files from FTP and Ionic.Zip to unzip them. But the Downloadfile(), Extractfile() and ProcessFile() methods take a long time to finish. What would be a better way to download, unzip and process the files in C#? This is a console application.

static void Main(string[] args)
{
    Downloadfile();
}

private static void Downloadfile()
{
    // Download 5000 files
    Sftp ftp = new Sftp(dtr["FTP_SERVER"].ToString(), dtr["FTP_USER_ID"].ToString(), dtr["FTP_PASSWORD"].ToString());
    ftp.Connect();
    System.IO.Directory.CreateDirectory(@localDestnDir);
    ArrayList list;
    list = ftp.GetFileList(remotepath);
    //GExport_EI_DN_G_6542_StarMetroDeiraHotel&Apartment_10.235.155.37_20161120003108.xml.gz
    foreach (string item in list)
    {
        if (item.StartsWith("GExport_") && (!item.ToUpper().Contains("DUM")))
        {
            path = item;
            //path = "GExport_EI_DN_G_6542_StarMetroDeiraHotel&Apartment_10.235.155.37_20161120003108.xml.gz";
            ftp.Get(dtr["REMOTE_FILE_PATH"].ToString() + path, @localDestnDir + "\\" + dtr["SOURCE_PATH"].ToString());
            download_location_hw = dtr["LOCAL_FILE_PATH"].ToString();
            //  ExtractZipfiles(download_location_hw + "//" + path, dtr["REMOTE_FILE_PATH"].ToString(), dtr["FTP_SERVER"].ToString(), dtr["FTP_USER_ID"].ToString(), dtr["TECH_CODE"].ToString(), dtr["VENDOR_CODE"].ToString());
        }
    }
    ftp.Close();
    // Extract 5000 files by using Ionic.zip
    Extractfile();
    // Then process 5000 files
    ProcessFile();
}


Basically, the pipeline (download the compressed file, extract it, process it) is fine. But while your system is processing one file, it could download the next ones in parallel, because network transfer is not CPU-intensive compared with decompressing and processing.

One very simple and fast approach is to use Parallel.ForEach in place of your loop, and to call ExtractFile and ProcessFile inside the loop as well. To sketch this idea:

private static void Downloadfile()
{
    // Download 5000 files
    Sftp ftp = new Sftp(dtr["FTP_SERVER"].ToString(), dtr["FTP_USER_ID"].ToString(), dtr["FTP_PASSWORD"].ToString());
    ftp.Connect();
    System.IO.Directory.CreateDirectory(@localDestnDir);
    // GetFileList returns an ArrayList (see the question), so cast before ToList(); needs using System.Linq;
    var list = ftp.GetFileList(remotepath).Cast<string>().ToList();

    // Note: this assumes the single Sftp object tolerates concurrent Get calls, which is not confirmed here
    Parallel.ForEach(list, item =>
    {
        if (item.StartsWith("GExport_") && (!item.ToUpper().Contains("DUM")))
        {
            // Use a local variable: a shared path field would be overwritten by other threads
            string path = item;
            //path = "GExport_EI_DN_G_6542_StarMetroDeiraHotel&Apartment_10.235.155.37_20161120003108.xml.gz";
            ftp.Get(dtr["REMOTE_FILE_PATH"].ToString() + path, @localDestnDir + "\\" + dtr["SOURCE_PATH"].ToString());
            download_location_hw = dtr["LOCAL_FILE_PATH"].ToString();
            //  ExtractZipfiles(download_location_hw + "//" + path, dtr["REMOTE_FILE_PATH"].ToString(), dtr["FTP_SERVER"].ToString(), dtr["FTP_USER_ID"].ToString(), dtr["TECH_CODE"].ToString(), dtr["VENDOR_CODE"].ToString());

            // Extract the file by using Ionic.zip
            Extractfile(item);   // Extractfile works on a single file now
            // Then process the file
            ProcessFile(item);   // ProcessFile works on a single file now
        }
    });
    ftp.Close();
}
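
For illustration, a single-file Extractfile could look roughly like the sketch below. It is not the asker's Ionic.Zip code: it swaps in the built-in System.IO.Compression.GZipStream, and the class name GzipSketch, the method name ExtractFile and the choice to write the output next to the input are all assumptions made for the example.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class GzipSketch
{
    // Hypothetical single-file extractor: decompresses "name.xml.gz"
    // to "name.xml" in the same folder and returns the new path.
    public static string ExtractFile(string gzPath)
    {
        // Passing null to ChangeExtension strips only the trailing ".gz".
        string outputPath = Path.ChangeExtension(gzPath, null);

        using (FileStream input = File.OpenRead(gzPath))
        using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
        using (FileStream output = File.Create(outputPath))
        {
            // Streams the data through without loading the whole file into memory.
            gzip.CopyTo(output);
        }
        return outputPath;
    }
}
```

Because the method touches only the path it is given and keeps no shared state, it can be called from inside the Parallel.ForEach body for each downloaded file.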

Without seeing all your code it is hard to say, but most likely you could benefit from parallelization. This is now wonderfully easy to do in C#. Instead of the foreach loop you are currently using, try something like this:

        Parallel.ForEach(list.ToArray(), item => {
            // Download the item with ftp.Get
            // Unzip the file you just downloaded
            // Process the file
        });

The speed benefit is that the local work on the first files (unzipping, processing) happens while the computer is also waiting for the next files to download.

Now, this will try to download several files at once. That might not be a good idea, because you might overwhelm the FTP server. So another way to do it is to download the files one at a time, and then immediately process each one in the background while the foreground moves on to downloading the next file:

        Task[] myTasks = new Task[list.Count];
        int i = 0;
        foreach (string item in list)
        {
            // Download the item with ftp.Get and store its location in download_location_hw
            ftp.Get(dtr["REMOTE_FILE_PATH"].ToString() + item, @localDestnDir + "\\" + dtr["SOURCE_PATH"].ToString());
            string download_location_hw = dtr["LOCAL_FILE_PATH"].ToString();
            // Spin off a background task to process the file we just downloaded
            myTasks[i++] = Task.Run(() => {
                // Extract the zip file referred to by  download_location_hw
                // Process the extracted zip file
            });
        }
        Task.WaitAll(myTasks);

For both examples, make sure you have using System.Threading.Tasks; at the top of the file.
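
If you prefer the Parallel.ForEach version but are worried about overwhelming the FTP server, one possible middle ground is to cap how many items run at once with ParallelOptions.MaxDegreeOfParallelism. The sketch below is only an illustration: ThrottledSketch, ProcessAll and the processOne delegate are hypothetical names, and the delegate stands in for the download, unzip and process steps of one file.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class ThrottledSketch
{
    // Runs processOne for every file name, with at most maxParallel
    // invocations in flight at the same time.
    public static void ProcessAll(IEnumerable<string> fileNames, int maxParallel, Action<string> processOne)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        // Blocks until every item has been processed.
        Parallel.ForEach(fileNames, options, processOne);
    }
}
```

A call such as ThrottledSketch.ProcessAll(fileNames, 4, name => { /* download, unzip, process */ }) would then keep at most four files in flight. The value 4 is arbitrary, and a suitable limit depends on the server and the network.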


 