
“Shear” sort list of jobs for multiple threads efficiently

I have a List that contains several Mesh objects. This is what the Mesh class looks like:

```
public class Mesh
{
    public int GridWidth { get; }
    public int GridHeight { get; }
    public List<File> Files { get; }
    /* ... */
}
```

The list of files inside a Mesh object contains File objects. Each File mainly consists of a string with the file-system path of the file and a jagged array (double[][]) that holds the content of the file once it has been parsed.

```
public class File
{
    public string Path { get; }
    public double[][] Matrix { get; set; }
    /* ... */
}
```

Multithreading and parsing work fine. I decided to launch as many threads as my CPU has cores; in my case, 4.
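If the thread count should follow the machine instead of being hard-coded, a minimal sketch can read it from Environment.ProcessorCount. Be aware that this property reports logical processors, so on a CPU with simultaneous multithreading (hyper-threading) it can be larger than the number of physical cores. The class ThreadCount and its method For are hypothetical names used only for illustration.

```csharp
using System;

public static class ThreadCount
{
    // Hypothetical helper: one worker per logical processor, but never
    // more workers than there are jobs, and always at least one.
    public static int For(int jobCount)
    {
        int logical = Environment.ProcessorCount; // logical processors, not physical cores
        return Math.Max(1, Math.Min(logical, jobCount));
    }
}
```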

First, I use LINQ to collect all File objects into a single list:

```
List<File> allFiles = meshes.SelectMany(mesh => mesh.Files).ToList();
```

After that, each thread gets a quarter of the objects in this list and starts parsing its files.

And this is my problem: files of the same size are located in the same mesh (GridWidth * GridHeight = number of parsed matrix cells). So it can happen by chance that one thread gets only large files while another thread gets only small ones. In that case one thread finishes much earlier than the other(s), and I want to avoid that because it is inefficient.
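To make the imbalance concrete, here is a small sketch of the "each thread gets a contiguous quarter" split, working on plain per-file workloads (for example GridWidth * GridHeight of the file's mesh). The class name ContiguousSplit and the workload numbers in the example are made up for illustration.

```csharp
using System.Collections.Generic;

public static class ContiguousSplit
{
    // Splits the workloads into `parts` contiguous blocks (block sizes
    // differ by at most one item) and returns the total work per block.
    public static long[] Loads(IReadOnlyList<int> workloads, int parts)
    {
        var loads = new long[parts];
        int n = workloads.Count;
        for (int p = 0; p < parts; p++)
        {
            // Block p covers the half-open index range [start, end).
            int start = (int)((long)n * p / parts);
            int end = (int)((long)n * (p + 1) / parts);
            for (int i = start; i < end; i++)
            {
                loads[p] += workloads[i];
            }
        }
        return loads;
    }
}
```

With the made-up workloads { 100, 100, 100, 100, 1, 1, 1, 1 } (four large files from one mesh followed by four small files from another) and 4 parts, the loads are 200, 200, 2 and 2, so two threads do almost all of the work.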

So my idea was to sort the meshes by size first and then distribute their files to one new list per thread in the manner of Shear Sort (or Snake Sort), i.e. the thread index runs forward, then backward, then forward again. The following algorithm works, but I think there is some room for improvement.

These are my questions: Is this algorithm efficient enough, or is there a better way to provide lists of files to each thread? If there isn't a better way, I would be interested in a "smarter" way of coding it (the for loop seems a little complex with all its if/else branches and modulo operations).
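On the "better way" question: instead of a fixed snake pattern, one widely used alternative is the greedy longest-processing-time-first (LPT) heuristic. It visits the jobs from most to least expensive and always gives the next job to the worker with the smallest total so far, so it reacts to the actual sizes rather than to positions in the list. It is still a heuristic and is not guaranteed to be optimal. The sketch below works on plain cost values; the class name GreedyPartition and the idea of using a file's GridWidth * GridHeight as its cost are assumptions for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class GreedyPartition
{
    // LPT sketch: returns, for each worker, the indices of the jobs it gets.
    public static List<int>[] Assign(IReadOnlyList<long> costs, int workers)
    {
        var buckets = new List<int>[workers];
        var loads = new long[workers];
        for (int w = 0; w < workers; w++)
        {
            buckets[w] = new List<int>();
        }

        // Visit jobs from most to least expensive (OrderByDescending is stable).
        foreach (int job in Enumerable.Range(0, costs.Count).OrderByDescending(i => costs[i]))
        {
            // Pick the worker with the smallest total so far (lowest index on ties).
            int target = 0;
            for (int w = 1; w < workers; w++)
            {
                if (loads[w] < loads[target])
                {
                    target = w;
                }
            }
            buckets[target].Add(job);
            loads[target] += costs[job];
        }
        return buckets;
    }
}
```

To use it with the question's types, one could build the cost list in the same order as allFiles, e.g. meshes.SelectMany(m => m.Files.Select(f => (long)m.GridWidth * m.GridHeight)).ToList(), and map each returned index back to allFiles[index]. With only 4 workers, the linear scan for the least-loaded worker is cheap; a priority queue would only pay off with many workers.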

```
using System.Collections.Generic;
using System.Linq;
using System.Threading;

int cores = 4;
// One list per thread; the element type must be File (not Slice).
List<File>[] filesOfThreads = new List<File>[cores];
// Create every list up front so that no entry stays null,
// even when there are fewer files than threads.
for (int t = 0; t < cores; t++)
{
    filesOfThreads[t] = new List<File>();
}

List<File> allFilesDesc = meshes
    .OrderByDescending(mesh => mesh.GridWidth * mesh.GridHeight)
    .SelectMany(mesh => mesh.Files)
    .ToList();

int threadIndex = 0;
/*
 * Inside this for loop the threadIndex changes
 * with each cycle in this way (in case of 4 cores):
 * 0->1->2->3->3->2->1->0->0->1->2->3->3->2 ...
 * In each cycle the file at allFilesDesc[i]
 * is added to filesOfThreads[threadIndex].
 * In this "shear" sort way every thread should
 * get approximately the same number of big
 * and small files.
 */
for (int i = 0; i < allFilesDesc.Count; i++)
{
    filesOfThreads[threadIndex].Add(allFilesDesc[i]);
    if (i < cores - 1)
    {
        threadIndex++;
    }
    else if ((i + 1) % cores != 0)
    {
        // Move toward the next item's slot: forward on even passes,
        // backward on odd passes. At a pass boundary the index stays put.
        threadIndex += ((i + 1) / cores) % 2 == 0 ? 1 : -1;
    }
}

foreach (var files in filesOfThreads)
{
    // Since C# 5 each iteration captures its own 'files' variable.
    Thread thread = new Thread(() => ComputeFiles(files));
    thread.Start();
}
```
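For the "smarter coding" part, a minimal sketch: the thread index of the i-th file can be computed directly from i instead of being updated step by step. Within each pass over the threads the offset is i % threads, and every second pass is mirrored. This produces the same 0->1->2->3->3->2->1->0->0->1 sequence described in the loop comment. The names SnakeDistribution, ThreadFor and Distribute are hypothetical.

```csharp
using System.Collections.Generic;

public static class SnakeDistribution
{
    // Thread that receives the i-th item (0-based) in the snake pattern
    // 0, 1, ..., k-1, k-1, ..., 1, 0, 0, 1, ...
    public static int ThreadFor(int i, int threads)
    {
        int pass = i / threads;    // which pass over the threads
        int offset = i % threads;  // position inside that pass
        return pass % 2 == 0 ? offset : threads - 1 - offset;
    }

    // Distributes items (already sorted by descending size) to `threads` lists.
    public static List<T>[] Distribute<T>(IReadOnlyList<T> items, int threads)
    {
        var lists = new List<T>[threads];
        for (int t = 0; t < threads; t++)
        {
            lists[t] = new List<T>();
        }
        for (int i = 0; i < items.Count; i++)
        {
            lists[ThreadFor(i, threads)].Add(items[i]);
        }
        return lists;
    }
}
```

With the question's variables this would become List<File>[] filesOfThreads = SnakeDistribution.Distribute(allFilesDesc, cores); which also guarantees that every per-thread list exists.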

My suggestion:

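```
using System.Collections.Generic;
using System.Linq;

/// <summary>
/// Helper methods for lists.
/// </summary>
public static class ListExtensions
{
    /// <summary>
    /// Splits source into consecutive sublists of chunkSize items;
    /// the last sublist may be shorter.
    /// </summary>
    public static List<List<T>> ChunkBy<T>(this List<T> source, int chunkSize)
    {
        return source
            .Select((x, i) => new { Index = i, Value = x })
            .GroupBy(x => x.Index / chunkSize)
            .Select(x => x.Select(v => v.Value).ToList())
            .ToList();
    }
}
```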

For example, if you chunk a list of 18 items into chunks of 5 items, you get a list of 4 sublists containing 5, 5, 5 and 3 items.
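On .NET 6 or later the framework ships a built-in equivalent, Enumerable.Chunk, which returns arrays instead of lists. The sketch below (the class name ChunkDemo is hypothetical) reproduces the 18-items / 5-per-chunk example and shows one way to choose the chunk size so that there is at most one chunk per thread. Note that contiguous chunks of the size-sorted list keep the large files together, so chunking on its own does not address the load-balancing problem from the question.

```csharp
using System;
using System.Linq;

public static class ChunkDemo
{
    // Lengths of the chunks produced by Enumerable.Chunk (.NET 6+).
    public static int[] ChunkLengths(int itemCount, int chunkSize) =>
        Enumerable.Range(1, itemCount)
            .Chunk(chunkSize)
            .Select(chunk => chunk.Length)
            .ToArray();

    // Ceiling division: yields at most `threads` chunks.
    // Clamped to 1 because Chunk rejects a size of 0.
    public static int ChunkSizeFor(int itemCount, int threads) =>
        Math.Max(1, (itemCount + threads - 1) / threads);
}
```

ChunkDemo.ChunkLengths(18, 5) gives 5, 5, 5, 3, and ChunkDemo.ChunkSizeFor(18, 4) gives 5, i.e. exactly four chunks for four threads.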
