
Multithreading within a loop in C#

I am making a tool in C# that iterates through a large file directory and extracts certain information. The directory is organised by language (LCID), so I want to use multithreading to go through the directory, with one thread per language folder.

My code currently scans a small number of the files and extracts the required data without multithreading, but at full scale it will take too long.

I set up a thread within the loop that gets the LCID folders, but got the following error: "No overload for 'HBscan' matches delegate 'System.Threading.ThreadStart'". From what I read online, I then put my method inside a class so that it could have parameters. Now there are no errors, but the code is not iterating through the files properly: it is leaving files out of its scan.

Can anyone see where my code is going wrong and why it does not perform properly? Thanks.

public static void Main(string[] args)
    {
        //change rootDirectory variable to point to directory which you wish to scan through
        string rootDirectory = @"C:\sample";
        DirectoryInfo dir = new DirectoryInfo(rootDirectory);

        //get the LCIDs from the folders
        string[] filePaths = Directory.GetDirectories(rootDirectory);
        for (int i = 0; i < filePaths.Length; i++)
        {
            string LCID = filePaths[i].Split('\\').Last();
            Console.WriteLine(LCID);

            HBScanner scanner = new HBScanner(new DirectoryInfo(filePaths[i]));
            Thread t1 = new Thread(new ThreadStart(scanner.HBscan));              
            t1.Start();             
        } 

        Console.WriteLine("Scanning through files...");

    }
    public class HBScanner
    {
        private DirectoryInfo DirectoryToScan { get; set; }

        public HBScanner(DirectoryInfo startDir)
        {
            DirectoryToScan = startDir;
        }

        public void HBscan()
        {
            HBscan(DirectoryToScan);
        } 

        public static void HBscan(DirectoryInfo directoryToScan)
        {
            //create an array of files using FileInfo object
            FileInfo[] files;
            //get all files for the current directory
            files = directoryToScan.GetFiles("*.*");
            string asset = "";
            string lcid = "";

            //iterate through the directory and get file details
            foreach (FileInfo file in files)
            {
                String name = file.Name;
                DateTime lastModified = file.LastWriteTime;
                String path = file.FullName;

                //first check the file name for asset id using regular expression
                Regex regEx = new Regex(@"([A-Z][A-Z][0-9]{8,10})\.");
                asset = regEx.Match(file.Name).Groups[1].Value.ToString();

                //get LCID from the file path using regular expression
                Regex LCIDregEx = new Regex(@"sample\\(\d{4,5})");
                lcid = LCIDregEx.Match(file.FullName).Groups[1].Value.ToString();

                //if it can't find it from filename, it looks into xml
                if (file.Extension == ".xml" && asset == "")
                {
                    System.Diagnostics.Debug.WriteLine("File is an .XML");
                    System.Diagnostics.Debug.WriteLine("file.FullName is: " + file.FullName);
                    XmlDocument xmlDoc = new XmlDocument();
                    xmlDoc.Load(path);
                    //load XML file in 

                    //check for <assetid> element
                    XmlNode assetIDNode = xmlDoc.GetElementsByTagName("assetid")[0];
                    //check for <Asset> element
                    XmlNode AssetIdNodeWithAttribute = xmlDoc.GetElementsByTagName("Asset")[0];

                    //if there is an <assetid> element
                    if (assetIDNode != null)
                    {
                        asset = assetIDNode.InnerText;
                    }
                    else if (AssetIdNodeWithAttribute != null) //if there is an <asset> element, see if it has an AssetID attribute
                    {
                        //get the attribute 
                        asset = AssetIdNodeWithAttribute.Attributes["AssetId"].Value;

                        if (AssetIdNodeWithAttribute.Attributes != null)
                        {
                            var attributeTest = AssetIdNodeWithAttribute.Attributes["AssetId"];
                            if (attributeTest != null)
                            {
                                asset = attributeTest.Value;
                            }
                        }
                    }
                }

                Item newFile = new Item
                {
                    AssetID = asset,
                    LCID = lcid,
                    LastModifiedDate = lastModified,
                    Path = path,
                    FileName = name
                };

                Console.WriteLine(newFile);

            }

            //get sub-folders for the current directory
            DirectoryInfo[] dirs = directoryToScan.GetDirectories("*.*");
            foreach (DirectoryInfo dir in dirs)
            {
                HBscan(dir);
            }
        }
    }

I haven't checked, but I think this could work.

The code creates one scanner per thread and runs its HBscan method.

public static void Main(string[] args)
        {
            //change rootDirectory variable to point to directory which you wish to scan through
            string rootDirectory = @"C:\sample";
            DirectoryInfo dir = new DirectoryInfo(rootDirectory);

            //get the LCIDs from the folders
            string[] filePaths = Directory.GetDirectories(rootDirectory);
            for (int i = 0; i < filePaths.Length; i++)
            {
                string LCID = filePaths[i].Split('\\').Last();
                Console.WriteLine(LCID);

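                //copy the path to a local first: a lambda that reads filePaths[i] directly captures the loop variable i, which may already have changed by the time the thread runs
                string path = filePaths[i];
                Thread t1 = new Thread(() => new HBScanner(new DirectoryInfo(path)).HBscan());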
                t1.Start();
            }

            Console.WriteLine("Scanning through files...");

        }
        public class HBScanner
        {
            private DirectoryInfo DirectoryToScan { get; set; }

            public HBScanner(DirectoryInfo startDir)
            {
                DirectoryToScan = startDir;
            }

            public void HBscan()
            {
                HBscan(DirectoryToScan);
            }

            public static void HBscan(DirectoryInfo directoryToScan)
            {
                //create an array of files using FileInfo object
                FileInfo[] files;
                //get all files for the current directory
                files = directoryToScan.GetFiles("*.*");
                string asset = "";
                string lcid = "";

                //iterate through the directory and get file details
                foreach (FileInfo file in files)
                {
                    String name = file.Name;
                    DateTime lastModified = file.LastWriteTime;
                    String path = file.FullName;

                    //first check the file name for asset id using regular expression
                    Regex regEx = new Regex(@"([A-Z][A-Z][0-9]{8,10})\.");
                    asset = regEx.Match(file.Name).Groups[1].Value.ToString();

                    //get LCID from the file path using regular expression
                    Regex LCIDregEx = new Regex(@"sample\\(\d{4,5})");
                    lcid = LCIDregEx.Match(file.FullName).Groups[1].Value.ToString();

                    //if it can't find it from filename, it looks into xml
                    if (file.Extension == ".xml" && asset == "")
                    {
                        System.Diagnostics.Debug.WriteLine("File is an .XML");
                        System.Diagnostics.Debug.WriteLine("file.FullName is: " + file.FullName);
                        XmlDocument xmlDoc = new XmlDocument();
                        xmlDoc.Load(path);
                        //load XML file in 

                        //check for <assetid> element
                        XmlNode assetIDNode = xmlDoc.GetElementsByTagName("assetid")[0];
                        //check for <Asset> element
                        XmlNode AssetIdNodeWithAttribute = xmlDoc.GetElementsByTagName("Asset")[0];

                        //if there is an <assetid> element
                        if (assetIDNode != null)
                        {
                            asset = assetIDNode.InnerText;
                        }
                        else if (AssetIdNodeWithAttribute != null) //if there is an <asset> element, see if it has an AssetID attribute
                        {
                            //get the attribute, but only if it exists (reading .Value directly throws when AssetId is missing)
                            if (AssetIdNodeWithAttribute.Attributes != null)
                            {
                                var attributeTest = AssetIdNodeWithAttribute.Attributes["AssetId"];
                                if (attributeTest != null)
                                {
                                    asset = attributeTest.Value;
                                }
                            }
                        }
                    }

                    Item newFile = new Item
                    {
                        AssetID = asset,
                        LCID = lcid,
                        LastModifiedDate = lastModified,
                        Path = path,
                        FileName = name
                    };

                    Console.WriteLine(newFile);

                }

                //get sub-folders for the current directory
                DirectoryInfo[] dirs = directoryToScan.GetDirectories("*.*");
                foreach (DirectoryInfo dir in dirs)
                {
                    HBscan(dir);
                }
            }
        }
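As a smaller illustration of the threading part only, here is a minimal sketch, not the original poster's tool. The `Scan` method is a hypothetical stand-in for `HBScanner.HBscan`, and taking the root folder from the command line is an assumption made so the sample runs anywhere. It shows two points: each lambda gets its own local copy of the folder path, and `Main` calls `Join` so it only reports completion after every thread has finished.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

public static class ThreadPerFolderSketch
{
    // Hypothetical stand-in for HBScanner.HBscan(DirectoryInfo).
    static void Scan(DirectoryInfo dir)
    {
        Console.WriteLine("Scanning " + dir.FullName);
    }

    public static void Main(string[] args)
    {
        // Assumption: root folder is the first argument, else the current directory.
        string rootDirectory = args.Length > 0 ? args[0] : Directory.GetCurrentDirectory();
        string[] folders = Directory.GetDirectories(rootDirectory);
        var workers = new List<Thread>();

        for (int i = 0; i < folders.Length; i++)
        {
            // Local copy: each lambda captures its own value instead of the shared loop variable i.
            string folder = folders[i];
            var worker = new Thread(() => Scan(new DirectoryInfo(folder)));
            workers.Add(worker);
            worker.Start();
        }

        // Block until every folder scan has finished.
        foreach (Thread thread in workers)
        {
            thread.Join();
        }

        Console.WriteLine("All folders scanned.");
    }
}
```

Without the local copy, a thread that starts late may read `folders[i]` after `i` has moved on, which can skip folders or throw an IndexOutOfRangeException.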

If you are using .NET 4.0, you could use the TPL: Parallel.For and Parallel.ForEach make it fairly easy to work on multiple items at the same time.

I only came across it a few days ago and it's very interesting. It can give you a good speed-up by using multiple threads on different cores. Of course, the gain might be limited in your case due to excessive I/O access.

But it may be worth a try! (And altering your current source just to try it out is fairly easy.)
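A minimal sketch of what that could look like with Parallel.ForEach. The `Scan` method is a hypothetical stand-in for the poster's `HBScanner.HBscan`, and both the command-line root folder and the `MaxDegreeOfParallelism` value are assumptions chosen for illustration.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class ParallelForEachSketch
{
    // Hypothetical stand-in for HBScanner.HBscan(DirectoryInfo).
    static void Scan(DirectoryInfo dir)
    {
        Console.WriteLine("Scanning " + dir.FullName);
    }

    public static void Main(string[] args)
    {
        // Assumption: root folder is the first argument, else the current directory.
        string rootDirectory = args.Length > 0 ? args[0] : Directory.GetCurrentDirectory();

        // Assumption: cap concurrency at the number of cores; tune this for I/O-bound work.
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        // Parallel.ForEach returns only after every folder has been processed.
        Parallel.ForEach(Directory.GetDirectories(rootDirectory), options, folder =>
        {
            Scan(new DirectoryInfo(folder));
        });

        Console.WriteLine("All folders scanned.");
    }
}
```

Because the lambda parameter `folder` is a fresh value for each iteration, there is no shared loop counter to capture by mistake.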

What about something a little more like this:

public static void Main(string[] args)
{
    const string rootDirectory = @"C:\sample";

    Directory.EnumerateDirectories(rootDirectory)
        .AsParallel()
        .ForAll(f => HBScanner.HBscan(new DirectoryInfo(f)));
}

After all, you only get the LCID within the loop body to write it to the console. If you want to keep writing it to the console, you could do:

public static void Main(string[] args)
{
    const string rootDirectory = @"C:\sample";

    Console.WriteLine("Scanning through files...");

    Directory.EnumerateDirectories(rootDirectory)
        .AsParallel()
        .ForAll(f => 
            {
                var lcid = f.Split('\\').Last();
                Console.WriteLine(lcid);

                HBScanner.HBscan(new DirectoryInfo(f));
            });
}

Note that EnumerateDirectories should be preferred over GetDirectories since it is lazily evaluated, so your processing can start as soon as the first directory is found. You don't have to wait for all directories to be loaded into an array.

Your task could be much improved by using BlockingCollection: http://msdn.microsoft.com/en-us/library/dd267312.aspx

The overall structure is this: you create one thread (or use the main thread) that enumerates the files and adds them to a BlockingCollection. Simply enumerating files should be fairly fast, so this thread should complete much sooner than the worker threads.

Then you create a number of Tasks (as many as Environment.ProcessorCount would be good). Those tasks should look like the first sample in the docs (collection.Take()). Each task should check one individual file at a time.

The result is that one thread looks for file names and puts them in the BlockingCollection, while the other threads check file contents in parallel. This way you'll get better parallelism, because creating one thread per folder may distribute the work unevenly (you don't know how many files are in each folder, right?).
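The structure described above could be sketched roughly as follows. This is an illustration under assumptions, not a tested drop-in: `CheckFile` is a hypothetical placeholder for the per-file work, the bounded capacity of 1000 is arbitrary, the root folder is taken from the command line, and the consumers use `GetConsumingEnumerable()` rather than the `Take()` loop from the docs sample because it ends cleanly once the producer calls `CompleteAdding()`.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public static class BlockingCollectionSketch
{
    // Hypothetical placeholder for the per-file check (regex match, XML lookup, ...).
    static void CheckFile(string path)
    {
        Console.WriteLine("Checking " + path);
    }

    public static void Main(string[] args)
    {
        // Assumption: root folder is the first argument, else the current directory.
        string rootDirectory = args.Length > 0 ? args[0] : Directory.GetCurrentDirectory();

        // Assumption: a bound of 1000 keeps the producer from running far ahead of the workers.
        using (var files = new BlockingCollection<string>(1000))
        {
            // Producer: enumerate file paths lazily and queue them.
            Task producer = Task.Factory.StartNew(() =>
            {
                try
                {
                    foreach (string path in Directory.EnumerateFiles(
                        rootDirectory, "*.*", SearchOption.AllDirectories))
                    {
                        files.Add(path);
                    }
                }
                finally
                {
                    // Signal the consumers that no more items will arrive.
                    files.CompleteAdding();
                }
            });

            // Consumers: one task per core, each handling one file at a time.
            var consumers = new Task[Environment.ProcessorCount];
            for (int i = 0; i < consumers.Length; i++)
            {
                consumers[i] = Task.Factory.StartNew(() =>
                {
                    foreach (string path in files.GetConsumingEnumerable())
                    {
                        CheckFile(path);
                    }
                });
            }

            // Wait for the queue to drain, then surface any producer exception.
            Task.WaitAll(consumers);
            producer.Wait();
        }

        Console.WriteLine("All files checked.");
    }
}
```

Since work is handed out per file rather than per folder, a language folder with many more files than the others no longer leaves the remaining threads idle.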
