
How to use threading effectively in a .NET console application

I have an 8-core system and I am processing a number of text files, each containing millions of lines. Say there are 23 files with a huge number of lines; processing them takes 2 to 3 hours to finish. I am thinking of using TPL tasks to process the text files. As of now, my code processes the files sequentially, one by one, so I am considering splitting the work, e.g. 5 text files in one thread, 5 in another thread, and so on. Is that a good approach, or is there a better way? I am using .NET 4.0, and the code I am using is shown below:

foreach (DataRow dtr in ds.Tables["test"].Rows)
{
    string filename = dtr["ID"].ToString() + "_cfg";
    try
    {
        foreach (var file in
            Directory.EnumerateFiles(Path.GetDirectoryName(dtr["FILE_PATH"].ToString()), "*.txt"))
        {
            id = file.Split('\\').Last();
            if (!id.Contains("GMML"))
            {
                strbsc = id.Split('_');
                id = strbsc[0];
            }
            else
            {
                strbsc = file.Split('-');
                id = ("RC" + strbsc[1]).Replace("SC", "");
            }
            ProcessFile(file, id, dtr["CODE"].ToString(), dtr["DOR_CODE"].ToString(), dtr["FILE_ID"].ToString());
        }
    }

How do I split the text files into batches so that each batch runs on its own thread rather than one by one? For example, with 23 files: 7 in one thread, 7 in another, 7 in a third, and 2 in the last. One more thing: I am moving all of this data from the text files into an Oracle database.

EDIT

If I use something like this, will it be worth it? And how do I separate the files into batches?

Task.Factory.StartNew(() => {ProcessFile(file, id, dtr["CODE"].ToString(), dtr["DOR_CODE"].ToString(), dtr["FILE_ID"].ToString()); });
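One way to avoid batching by hand (a sketch, assuming the file list has already been collected; `ProcessFile` here is a simplified stand-in for the real overload in the question) is to let `Parallel.ForEach`, available in .NET 4.0, partition the work across the thread pool for you:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class BatchProcessingSketch
{
    // Hypothetical stand-in for the real ProcessFile overload in the question.
    static void ProcessFile(string file)
    {
        Console.WriteLine("processing " + file);
    }

    static void Main()
    {
        // Collect all file paths first (here, 23 dummy names).
        var files = new List<string>();
        for (int i = 1; i <= 23; i++)
            files.Add("file" + i + ".txt");

        // Parallel.ForEach partitions the list across worker threads,
        // so there is no need to split 23 files into fixed 7/7/7/2 batches.
        Parallel.ForEach(
            files,
            new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
            file => ProcessFile(file));
    }
}
```

Note that since the data ends up in an Oracle database, each worker should use its own database connection; connections are not safe to share across threads.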

Splitting the file into multiple chunks does not seem to be a good idea, because any performance boost depends on how the file is laid out on your disk. But because of the asynchronous nature of disk I/O operations, I strongly recommend accessing the file asynchronously. There are several ways to do this, and you can always combine them.

At the lowest level, you can use async methods such as StreamWriter.WriteAsync() or StreamReader.ReadAsync() to access the file on disk, cooperatively letting the OS know it can switch the thread to other work until the disk I/O operation has finished. While async calls at this level are useful, on their own they will not have a significant impact on the overall performance of your application, since the app is still waiting for the disk operation to complete and does nothing in the meantime. (They can have a big impact on your software's responsiveness when called from the UI thread.)

So I recommend splitting your logic into at least two parts running on two separate threads: one that reads data from the file, and one that processes the data that has been read. The producer/consumer pattern can help these threads interact, and .NET provides a data structure, System.Collections.Concurrent.ConcurrentQueue, that is especially useful for implementing a multithreaded producer/consumer pattern.

So you can easily do something like this:

var queue = new System.Collections.Concurrent.ConcurrentQueue<string>();
bool readFinished = false;

Task tRead = Task.Run(async () =>
{
    // "path" is the file to read; useAsync: true enables true async I/O.
    using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                   FileShare.Read, 4096, useAsync: true))
    using (var re = new StreamReader(fs))
    {
        while (!re.EndOfStream)
            queue.Enqueue(await re.ReadLineAsync());
    }
});

Task tLogic = Task.Run(async () =>
{
    string data;
    // Keep draining until the reader is done AND the queue is empty.
    while (!readFinished || !queue.IsEmpty)
    {
        if (queue.TryDequeue(out data))
        {
            // Process data
        }
        else
        {
            await Task.Delay(100);
        }
    }
});

tRead.Wait();
readFinished = true;
tLogic.Wait();

This simple example uses StreamReader.ReadLineAsync() to read data from the file; a better practice may be to read a fixed number of characters into a char[] buffer and add that data to the queue. You can find the optimal buffer length after some testing.
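A minimal sketch of that buffered variant (the 4096-character buffer is an arbitrary starting point, not a measured optimum, and the method name is illustrative):

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

class BufferedReadSketch
{
    // Reads the file in fixed-size chunks and enqueues each chunk.
    static async Task ReadInChunksAsync(string path, ConcurrentQueue<char[]> queue)
    {
        const int bufferLength = 4096; // tune this value after testing

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, bufferLength, useAsync: true))
        using (var reader = new StreamReader(fs))
        {
            var buffer = new char[bufferLength];
            int read;
            while ((read = await reader.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                // Copy only the characters actually read before enqueueing,
                // since the last chunk is usually shorter than the buffer.
                var chunk = new char[read];
                Array.Copy(buffer, chunk, read);
                queue.Enqueue(chunk);
            }
        }
    }
}
```

The trade-off is that chunk boundaries no longer align with line boundaries, so the consumer must reassemble lines itself if line-oriented processing is needed.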

All, the real bottleneck was during my bulk inserts: I was checking whether the inserted data already existed in the database. I have a status column set to "Y" or "N" via an update statement, and that update statement running during inserts was the culprit causing the congestion. After creating an index in the database, the run dropped from 4 hours to 10 minutes, which made all the difference :)
