
Clarification on how to access numerous files from a server in C#

Basically, I want to access thousands of text files, read their data, store it in SQLite databases, parse it, and then show the output to users. So far I've developed a program that does this for only ONE text file.

What I want to do: there is a directory on our server which has about 15 folders. In each folder there are about 30-50 text files. I want to loop through EACH FOLDER and, in each folder, loop through EACH file. A helpful user showed me how to do this for thousands of text files, but I need further clarification on his method. This was his approach:

// Requires: using System; using System.Collections.Generic; using System.IO;
private static void ReadAllFilesStartingFromDirectory(string topLevelDirectory)
{
    const string searchPattern = "*.txt";
    var subDirectories = Directory.EnumerateDirectories(topLevelDirectory);
    var filesInDirectory = Directory.EnumerateFiles(topLevelDirectory, searchPattern);

    foreach (var subDirectory in subDirectories)
    {
        ReadAllFilesStartingFromDirectory(subDirectory);//recursion
    }

    IterateFiles(filesInDirectory, topLevelDirectory);
}

private static void IterateFiles(IEnumerable<string> files, string directory)
{
    foreach (var file in files)
    {
        Console.WriteLine(file);//for verification; EnumerateFiles already returns each file prefixed with its directory
        try
        {
            string[] lines = File.ReadAllLines(file);
            foreach (var line in lines)
            {
                //Console.WriteLine(line);   
            }
        }
        catch (IOException ex)
        {
            //Handle File may be in use...                    
        }
    }
}

My questions:

1) topLevelDirectory: what exactly should I pass there? The 15 folders are on a server, at a path that looks something like \\servername\randomfile\random\locationoftopleveldirectory. How can I write the double backslash at the beginning of the path name? Is this possible in C#? I thought we could only access local files (for example "c:\", a path with single rather than double backslashes).

2) I don't understand the purpose of the first foreach loop, the one that calls ReadAllFilesStartingFromDirectory(subDirectory). Yes, we are looping over the folders, but we aren't doing anything inside that loop. It's just reading the folders.

I don't know your top-level directory, but essentially, if your files are in C:\tmp, you would pass it @"C:\tmp". Prefix the string with the @ character to make it a verbatim literal, in which backslashes are taken literally (or, in a regular string, escape each backslash individually).

string example0 = @"\\some\network\path";
string example1 = "\\\\some\\network\\path";

ReadAllFilesStartingFromDirectory calls itself for every subdirectory and then calls IterateFiles, so whatever IterateFiles does gets done in each directory. With the code you pasted above, that amounts to printing each file's path and nothing more, since Console.WriteLine(line) is commented out.

Let's clarify topLevelDirectory: it is a folder that has items in it. It does not matter whether these are files or other directories. The contained subfolders can contain folders themselves.

What topLevelDirectory means for you: pick the lowest-level folder that still contains all the files you need.

Your top-level folder is the directory that contains the 15 folders you want to crawl.

ReadAllFilesStartingFromDirectory(string topLevelDirectory): you need to understand what recursion means. Recursion describes a method that calls itself. Compare the name of the function (ReadAllFilesStartingFromDirectory) with the name of the function called in the foreach loop: they are the same.

In your case: the method gets all folders located in your top folder. It then loops through these subfolders and calls itself for each one. Each subfolder becomes the top-level folder of that call, and can in turn contain subfolders, which become the top-level folders of the next calls. This is a nice way to loop through the whole file structure. A folder with no subfolders triggers no further recursion, so the method processes that folder's files and returns.
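The recursion described above can be made visible with a small trace. This is only a sketch: the class name RecursionTrace, the Walk method, and the sample folder names are made up for illustration, and the tree is created in the temp folder so that the example runs without a server.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class RecursionTrace
{
    // Returns one entry per visited directory, indented by recursion depth.
    public static List<string> Walk(string directory, int depth = 0)
    {
        var visited = new List<string>
        {
            new string(' ', depth * 2) + Path.GetFileName(directory)
        };

        // Each subfolder becomes the "top level" of the next call.
        foreach (var sub in Directory.EnumerateDirectories(directory))
        {
            visited.AddRange(Walk(sub, depth + 1));
        }

        // With no subfolders the loop body never runs, so the recursion stops here.
        return visited;
    }

    static void Main()
    {
        // Hypothetical sample tree: root/folderA/nested and root/folderB.
        string root = Path.Combine(Path.GetTempPath(), "walk_demo_" + Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(Path.Combine(root, "folderA", "nested"));
        Directory.CreateDirectory(Path.Combine(root, "folderB"));

        foreach (var entry in Walk(root))
        {
            Console.WriteLine(entry);
        }

        Directory.Delete(root, recursive: true);
    }
}
```

Each extra level of indentation in the output corresponds to one more nested call of Walk.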

Your path problem: you need to escape the backslashes. You escape a backslash by putting another backslash in front of it.

\\path\randfolder\file.txt is written as "\\\\path\\randfolder\\file.txt" in source code.

Or you put an @ before the string: var path = @"\\path\randfolder\file.txt" also does the trick. Both ways work.
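A quick way to convince yourself that the two spellings produce the same string is to compare them. The path below is the placeholder from this answer, not a real share, and the class name PathLiterals is invented for the sketch.

```csharp
using System;

static class PathLiterals
{
    // Regular literal: every backslash is escaped with another backslash.
    public const string Escaped = "\\\\path\\randfolder\\file.txt";

    // Verbatim literal: backslashes are taken as written.
    public const string Verbatim = @"\\path\randfolder\file.txt";

    static void Main()
    {
        Console.WriteLine(Escaped);             // prints \\path\randfolder\file.txt
        Console.WriteLine(Escaped == Verbatim); // prints True
    }
}
```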

1) Yes, it is possible in C#. If your program has access permission to the network location, you can use "\\\\servername\\randomfile\\random\\locationoftopleveldirectory"; each doubled backslash in the literal is interpreted as a single backslash. Alternatively, put @ before the string, which tells the compiler not to treat the backslash as an escape character. The string then looks like this: @"\\servername\randomfile\random\locationoftopleveldirectory"
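Since access depends on permissions, it may be worth checking that the share is reachable before walking it. The following is a minimal sketch, assuming the placeholder server path from the question; the class name NetworkShareCheck and the CountTextFiles helper are made up, and returning -1 for "not readable" is just one possible convention.

```csharp
using System;
using System.IO;
using System.Linq;

static class NetworkShareCheck
{
    // Returns the number of .txt files directly inside the folder,
    // or -1 if the folder is missing or cannot be read.
    public static int CountTextFiles(string directory)
    {
        // Directory.Exists returns false (rather than throwing) when the
        // path does not exist or the caller lacks permission to see it.
        if (!Directory.Exists(directory))
        {
            return -1;
        }

        try
        {
            return Directory.EnumerateFiles(directory, "*.txt").Count();
        }
        catch (UnauthorizedAccessException)
        {
            return -1;
        }
        catch (IOException)
        {
            return -1;
        }
    }

    static void Main()
    {
        // Placeholder UNC path taken from the question; replace with a real share.
        string share = @"\\servername\randomfile\random\locationoftopleveldirectory";
        int count = CountTextFiles(share);
        Console.WriteLine(count < 0
            ? "Cannot reach or read " + share
            : count + " text files found");
    }
}
```

The same method works unchanged for a local folder such as @"C:\tmp", because UNC paths and local paths are both ordinary strings as far as System.IO is concerned.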

2) ReadAllFilesStartingFromDirectory is a recursive function. A directory structure is hierarchical, so it is easy to traverse recursively. The function looks for files in the root directory, in its subdirectories, and in all of their subdirectories. Try commenting out this loop and you will see that only the files of the root directory are passed to IterateFiles.
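The difference between "root only" and "root plus all subdirectories" can also be seen with the SearchOption overload of Directory.EnumerateFiles, which is an alternative to the hand-written recursion rather than what the original code uses. This is a sketch on a throwaway tree in the temp folder; the class name TraversalComparison and the file names are invented.

```csharp
using System;
using System.IO;
using System.Linq;

static class TraversalComparison
{
    // Equivalent to the original code with the foreach loop commented out.
    public static int CountTopOnly(string root) =>
        Directory.EnumerateFiles(root, "*.txt", SearchOption.TopDirectoryOnly).Count();

    // Equivalent to the full recursive walk.
    public static int CountAll(string root) =>
        Directory.EnumerateFiles(root, "*.txt", SearchOption.AllDirectories).Count();

    static void Main()
    {
        // Hypothetical tree: one file in the root, two in a subfolder.
        string root = Path.Combine(Path.GetTempPath(), "traversal_demo_" + Guid.NewGuid().ToString("N"));
        string sub = Path.Combine(root, "folder1");
        Directory.CreateDirectory(sub);
        File.WriteAllText(Path.Combine(root, "top.txt"), "top");
        File.WriteAllText(Path.Combine(sub, "a.txt"), "a");
        File.WriteAllText(Path.Combine(sub, "b.txt"), "b");

        Console.WriteLine(CountTopOnly(root)); // prints 1
        Console.WriteLine(CountAll(root));     // prints 3

        Directory.Delete(root, recursive: true);
    }
}
```

One trade-off to keep in mind: with SearchOption.AllDirectories, a subfolder the program is not allowed to read can abort the whole enumeration with an exception, whereas the explicit recursion lets you catch the error per folder and carry on.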
