
Fastest or most efficient way to process a large number of image files in C#

I have to read and find out the compression type of around 450,000 image files stored on our network. This is what I have so far, and it works as desired, but I have observed that it processes only around 2,000 files per hour. Can it be optimized to make it more efficient? One reason for the slowness may be that the files are read from a shared network location, but there is no workaround for that.

Any suggestions will be appreciated.

using System;
using System.Configuration;
using System.Diagnostics;
using System.Drawing;
using System.Drawing.Imaging;
using System.Globalization;
using System.Data;
using System.IO;
using FileHelpers;
using System.Threading.Tasks;

namespace CompressionTypeOfEDMSTiffs
{
    public static class ImageProcessor
    {
        private static readonly object lockObject = new object();

        public static void Process()
        {
            string docNumber = "";
            try
            {
                Stopwatch _stopwatch = new Stopwatch();

                _stopwatch.Start();

                var filename = ConfigurationManager.AppSettings["filePath"].Trim();

                var dtCsv = CsvEngine.CsvToDataTable(filename, ',');

                dtCsv.Columns.Add("CompressionType");
                dtCsv.Columns["CompressionType"].SetOrdinal(dtCsv.Columns.Count - 1);

                //One-by-one (sequential version, kept for reference)
                //for (var rowNum = 0; rowNum < dtCsv.Rows.Count; rowNum++)
                //{
                //    var imgPath = dtCsv.Rows[rowNum]["Path"].ToString();

                //    if (!string.IsNullOrWhiteSpace(imgPath) && imgPath.LastIndexOf(".") != -1)
                //    {
                //        if (imgPath.Substring(imgPath.LastIndexOf(".")).ToUpper().Equals(".TIF"))
                //        {
                //            docNumber = dtCsv.Rows[rowNum]["DOCNUMBER"].ToString();

                //            var compression = GetCompressionTypeFromImage(imgPath);

                //            dtCsv.Rows[rowNum]["CompressionType"] = compression;

                //            //if (Enum.IsDefined(typeof(CompressionTypes), compression))
                //            //    dtCsv.Rows[rowNum]["CompressionType"] = compression.ToString();
                //            //else
                //            //{
                //            //    dtCsv.Rows[rowNum]["CompressionType"] = "UnRecognised";
                //            //}

                //            Console.WriteLine(string.Format("Counter = {0}, docnumber = {1} , path = {2},  CT = {3}", rowNum, docNumber, imgPath, compression));
                //        }
                //    }
                //}

                //Multi-tasking
                Parallel.ForEach(dtCsv.AsEnumerable(), drow =>
                {
                    var imgPath = drow["Path"].ToString();

                    if (!string.IsNullOrWhiteSpace(imgPath) && imgPath.LastIndexOf(".") != -1)
                    {
                        if (imgPath.Substring(imgPath.LastIndexOf(".")).ToUpper().Equals(".TIF"))
                        {
                            docNumber = drow["DOCNUMBER"].ToString();

                            var compression = GetCompressionTypeFromImage(imgPath);

                            drow["CompressionType"] = compression;

                            Console.WriteLine(string.Format("docnumber = {0} , path = {1},  CT = {2}", docNumber, imgPath, compression));
                        }
                    }
                });

                if (File.Exists(filename))
                    File.Delete(filename);

                //write CSV
                var tempTable = dtCsv.Copy();
                var headerRow = tempTable.NewRow();

                foreach (DataColumn col in dtCsv.Columns)
                    headerRow[col.ColumnName] = col.ColumnName;

                headerRow[headerRow.ItemArray.Length - 1] = "CompressionType";

                tempTable.Rows.InsertAt(headerRow, 0);

                CsvEngine.DataTableToCsv(tempTable, ConfigurationManager.AppSettings["filePath"].Trim());

                _stopwatch.Stop();

                Console.WriteLine(string.Format("Time elapsed in the process {0} minutes", _stopwatch.Elapsed.TotalMinutes.ToString("#.##")));

                Console.ReadLine();
            }
            catch (Exception exception)
            {
                Console.WriteLine(!string.IsNullOrWhiteSpace(docNumber)
                                      ? string.Format("Error in document No {0} and the error is {1} stack trace {2}",
                                                      docNumber, exception.Message, exception.StackTrace)
                                      : string.Format("Error is {0} stack trace {1}", exception.Message,
                                                      exception.StackTrace));
                Console.ReadLine();
            }
        }


        private static string GetCompressionTypeFromImage(string path)
        {
            string compression = "";
            try
            {
                lock (lockObject)
                {
                    using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
                    {
                        using (Image sourceImage = Image.FromStream(fs))
                        {
                            var compressionTagIndex = Array.IndexOf(sourceImage.PropertyIdList, 0x103);
                            PropertyItem compressionTag = sourceImage.PropertyItems[compressionTagIndex];

                            var compressionType = (CompressionTypes)Enum.Parse(typeof(CompressionTypes),
                                                            BitConverter.ToInt16(compressionTag.Value, 0).ToString(CultureInfo.InvariantCulture));

                            if (Enum.IsDefined(typeof(CompressionTypes), compressionType))
                                compression = compressionType.ToString();
                            else
                            {
                                compression = "UnRecognised";
                            }
                        }
                    } 
                }
            }
            catch (Exception exFileStream)
            {
                compression = exFileStream.Message;
            }

            return compression;
            //using (var sourceImage = Image.FromFile(path))
            //{
            //    var compressionTagIndex = Array.IndexOf(sourceImage.PropertyIdList, 0x103);
            //    PropertyItem compressionTag = sourceImage.PropertyItems[compressionTagIndex];
            //    return (CompressionTypes)Enum.Parse(typeof(CompressionTypes), BitConverter.ToInt16(compressionTag.Value, 0).ToString(CultureInfo.InvariantCulture));
            //}
        }
    }

    public enum CompressionTypes
    {
        NoCompression = 1,
        CcittGroup3 = 2,
        FacsimilecompatibleCcittGroup3 = 3,
        CcittGroup4 = 4,
        Lzw = 5,
        UnRecognised = 6,
        ExceptionInFilehandling = 7
    }
}

Instead of reading the files over the network, can you run your program on the server that is hosting the files?

If not, I would have one program copy files from the network to a local folder to act as a queue. Then have a second program read each image from the local queue folder, determine the compression, and then delete the file. This separates the network IO time from your file processing time.
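
A minimal sketch of that split, assuming a hypothetical local queue folder and assuming the question's GetCompressionTypeFromImage is made accessible (e.g. public); the copier and the worker would normally run as two separate processes:

using System;
using System.Collections.Generic;
using System.IO;

// Illustrative sketch only; folder names are placeholders and
// GetCompressionTypeFromImage is assumed to be the (made-public) method from the question.
public static class QueuedProcessor
{
    // Stage 1 (first program): copy files from the network share into a local queue folder.
    public static void CopyToLocalQueue(IEnumerable<string> networkPaths, string queueFolder)
    {
        Directory.CreateDirectory(queueFolder);
        foreach (var path in networkPaths)
        {
            var target = Path.Combine(queueFolder, Path.GetFileName(path));
            if (!File.Exists(target))
                File.Copy(path, target);
        }
    }

    // Stage 2 (second program): read each queued file, record its compression type, then delete it.
    public static void ProcessLocalQueue(string queueFolder, Action<string, string> record)
    {
        foreach (var localPath in Directory.EnumerateFiles(queueFolder, "*.tif"))
        {
            var compression = ImageProcessor.GetCompressionTypeFromImage(localPath);
            record(localPath, compression); // e.g. write the value back into the DataTable
            File.Delete(localPath);
        }
    }
}

Keeping the two stages in separate processes means the copier can keep pulling files over the network while the worker is busy decoding, so neither side waits on the other.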

These are a couple of things that come to mind:

  1. Use Parallel.For instead of for to go through the list.
  2. async/await in .NET 4.5, or .NET 4 with the Async CTP installed. The topic is too extensive to go over here. You can check out async/await here.
  3. TPL Dataflow can also help parallelize the process (a rough sketch follows below).
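
A rough sketch of item 3, assuming the Microsoft TPL Dataflow NuGet package and that GetCompressionTypeFromImage from the question is reachable (made public). Note that the lock inside that method would have to go, otherwise the work still runs one file at a time:

using System;
using System.Data;
using System.Threading.Tasks.Dataflow; // TPL Dataflow NuGet package

// Illustrative sketch only. Writing to separate DataRows concurrently
// mirrors what the question's Parallel.ForEach version already does.
public static class DataflowSketch
{
    public static void ProcessRows(DataTable dtCsv)
    {
        var block = new ActionBlock<DataRow>(row =>
        {
            var imgPath = row["Path"].ToString();
            if (imgPath.EndsWith(".TIF", StringComparison.OrdinalIgnoreCase))
                row["CompressionType"] = ImageProcessor.GetCompressionTypeFromImage(imgPath);
        },
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 8 }); // tune for the share

        foreach (DataRow row in dtCsv.Rows)
            block.Post(row);

        block.Complete();          // no more rows will be posted
        block.Completion.Wait();   // wait until every queued row has been processed
    }
}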

You aren't describing a problem or asking a question. This is not acceptable in this community. Try editing your question and be more precise about your problem and what you want to do.

If your bottleneck is your CPU, try doing work in many threads at a time.

If your bottleneck is the file access, you can move your images to an SSD drive or a memory drive and access them from there.
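
If it helps to find out which case you are in, a throwaway measurement along these lines (the path is a placeholder) separates the network-read time from the decode time:

using System;
using System.Diagnostics;
using System.Drawing;
using System.IO;

// Throwaway check, not production code: reading the bytes first isolates the
// network cost, decoding from memory isolates the CPU cost.
public static class BottleneckCheck
{
    public static void Measure(string path)
    {
        var sw = Stopwatch.StartNew();
        var bytes = File.ReadAllBytes(path);       // network / disk time
        var readMs = sw.ElapsedMilliseconds;

        sw.Restart();
        using (var ms = new MemoryStream(bytes))
        using (var img = Image.FromStream(ms))     // GDI+ decode time
        {
            Console.WriteLine("read {0} ms, decode {1} ms ({2})",
                              readMs, sw.ElapsedMilliseconds, img.PixelFormat);
        }
    }
}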
