Multithreading within a thread lock

I am working on speeding up some processes that publish bulk sets of records (mostly in the millions) to Elasticsearch. In my C# code I have already implemented a multi-threaded solution using Dataflow, scaffolded below:

var fetchRecords = new TransformBlock<?, ?>(() => { ... });
var sendRecordsToElastic = new ActionBlock<List<?>>(records => sendBulkRequest(records));

fetchRecords.LinkTo(sendRecordsToElastic, new DataflowLinkOptions { PropagateCompletion = true });

fetchRecords.Post("Start");

And this is the bulk send call I want to implement:

public IBulkResponse sendBulkRequest(List<?> records)
{
    lock(SomeStaticObject)
    {
       // Execute several new threads to send records in bulk
    }
}

My question is about the practicality of starting additional threads inside a lock that is part of a Dataflow pipeline.

Is this OK? Could I run into problems with performance, execution, cache/memory misses, etc.?

Any insight would be gladly accepted.

You may want to use BulkAll here, which implements the observable pattern to make concurrent bulk requests to Elasticsearch. Here's an example:

void Main()
{   
    var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
    var connectionSettings = new ConnectionSettings(pool);

    var client = new ElasticClient(connectionSettings);
    var indexName = "bulk-index";

    if (client.IndexExists(indexName).Exists)
        client.DeleteIndex(indexName);

    client.CreateIndex(indexName, c => c
        .Settings(s => s
            .NumberOfShards(3)
            .NumberOfReplicas(0)
        )
        .Mappings(m => m
            .Map<DeviceStatus>(p => p.AutoMap())
        )
    );

    var size = 500;

    // set up the observable
    var bulkAllObservable = client.BulkAll(GetDeviceStatus(), b => b
        .Index(indexName)
        .MaxDegreeOfParallelism(4)
        .RefreshOnCompleted()
        .Size(size)
    );

    var countdownEvent = new CountdownEvent(1);

    Exception exception = null;

    // set up an observer. Delegates passed are:
    // 1. onNext
    // 2. onError
    // 3. onCompleted
    var bulkAllObserver = new BulkAllObserver(
        response => Console.WriteLine($"Indexed {response.Page * size} with {response.Retries} retries"),
        ex => 
        {
            // capture exception for throwing outside Observer.
            // You may decide to do something different here
            exception = ex;
            countdownEvent.Signal();
        },
        () => 
        {
            Console.WriteLine("Finished");
            countdownEvent.Signal();
        });

    // subscribe to the observable          
    bulkAllObservable.Subscribe(bulkAllObserver);

    // wait indefinitely for it to finish. May want to put a
    // max timeout on this  
    countdownEvent.Wait();

    if (exception != null) 
    {
        throw exception;
    }
}

// lazily enumerated collection
private static IEnumerable<DeviceStatus> GetDeviceStatus()
{
    for (var i = 0; i < DocumentCount; i++)
        yield return new DeviceStatus(i); 
}

private const int DocumentCount = 20000;

public class DeviceStatus
{
    public DeviceStatus(int id) => Id = id;
    public int Id {get;set;}
}

If you don't need to do anything special in the observer, you can use the .Wait() method on the observable:

var bulkAllObservable = client.BulkAll(GetDeviceStatus(), b => b
    .Index(indexName)
    .MaxDegreeOfParallelism(4)
    .RefreshOnCompleted()
    .Size(size)
)
.Wait(
    TimeSpan.FromHours(1), 
    response => Console.WriteLine($"Indexed {response.Page * size} with {response.Retries} retries")
);
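To see the waiting pattern from the first example in isolation, here is a minimal sketch that runs without Elasticsearch or NEST. PagedObservable, DelegateObserver and ObserverDemo are hypothetical names invented for illustration; PagedObservable only imitates an observable that reports one notification per page, and is not how BulkAll is implemented. The part that mirrors the example is WaitForAll: subscribe, block on a CountdownEvent, and rethrow any captured exception on the calling thread.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for an observable that reports one notification per page.
public sealed class PagedObservable : IObservable<int>
{
    private readonly int _pages;
    private readonly int _failOnPage; // -1 means never fail

    public PagedObservable(int pages, int failOnPage = -1)
    {
        _pages = pages;
        _failOnPage = failOnPage;
    }

    public IDisposable Subscribe(IObserver<int> observer)
    {
        // Produce notifications on a background task so the caller has to wait.
        Task.Run(() =>
        {
            for (var page = 0; page < _pages; page++)
            {
                if (page == _failOnPage)
                {
                    observer.OnError(new InvalidOperationException($"page {page} failed"));
                    return;
                }
                observer.OnNext(page);
            }
            observer.OnCompleted();
        });
        return new NoopDisposable();
    }

    private sealed class NoopDisposable : IDisposable
    {
        public void Dispose() { }
    }
}

// Hypothetical observer that forwards each notification to a delegate.
public sealed class DelegateObserver : IObserver<int>
{
    private readonly Action<int> _onNext;
    private readonly Action<Exception> _onError;
    private readonly Action _onCompleted;

    public DelegateObserver(Action<int> onNext, Action<Exception> onError, Action onCompleted)
    {
        _onNext = onNext;
        _onError = onError;
        _onCompleted = onCompleted;
    }

    public void OnNext(int value) => _onNext(value);
    public void OnError(Exception error) => _onError(error);
    public void OnCompleted() => _onCompleted();
}

public static class ObserverDemo
{
    // Blocks until the source completes or fails, then returns the number of
    // pages seen. A captured error is rethrown on the calling thread.
    public static int WaitForAll(IObservable<int> source, TimeSpan timeout)
    {
        var pagesSeen = 0;
        Exception error = null;
        var done = new CountdownEvent(1);

        source.Subscribe(new DelegateObserver(
            _ => Interlocked.Increment(ref pagesSeen),
            ex => { error = ex; done.Signal(); },
            () => done.Signal()));

        if (!done.Wait(timeout))
            throw new TimeoutException("The observable did not finish in time.");

        if (error != null)
            throw error;

        return pagesSeen;
    }
}
```

Calling ObserverDemo.WaitForAll(new PagedObservable(5), TimeSpan.FromSeconds(5)) blocks until all five pages have been reported. Unlike the first example, this sketch passes a timeout to Wait so a stalled producer cannot block the caller forever.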

There are observable methods for BulkAll, ScrollAll and Reindex (although there is also ReindexOnServer, which reindexes within Elasticsearch and maps to the Reindex API; the Reindex method predates it).
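If you would rather keep the Dataflow pipeline from the question, one option is to drop the lock and let the block itself bound concurrency. An ActionBlock processes one message at a time by default, and MaxDegreeOfParallelism raises that limit, so the block can run several sends at once without extra threads being started inside a lock. The following is only a sketch under stated assumptions: DataflowSketch and SendBulkAsync are hypothetical names, and SendBulkAsync simulates a bulk request with a delay rather than calling Elasticsearch.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class DataflowSketch
{
    // Hypothetical stand-in for a bulk request: waits briefly and reports
    // how many documents were "sent".
    private static async Task<int> SendBulkAsync(int[] batch)
    {
        await Task.Delay(10); // simulated network latency
        return batch.Length;
    }

    public static async Task<int> RunAsync(int documentCount, int batchSize, int parallelism)
    {
        var sent = 0;

        var sender = new ActionBlock<int[]>(
            async batch =>
            {
                var count = await SendBulkAsync(batch);
                Interlocked.Add(ref sent, count); // handlers run concurrently
            },
            new ExecutionDataflowBlockOptions
            {
                // Up to this many batches are processed at the same time.
                MaxDegreeOfParallelism = parallelism,
                // Limits queued batches so the producer cannot run far ahead.
                BoundedCapacity = parallelism * 2
            });

        for (var start = 0; start < documentCount; start += batchSize)
        {
            var length = Math.Min(batchSize, documentCount - start);
            var batch = Enumerable.Range(start, length).ToArray();

            // SendAsync waits asynchronously while the block is at capacity.
            await sender.SendAsync(batch);
        }

        sender.Complete();
        await sender.Completion;
        return sent;
    }
}
```

Awaiting DataflowSketch.RunAsync(1050, 100, 4) sends eleven batches with at most four in flight at any moment. The bounded capacity is a design choice for large inputs: with millions of records, an unbounded queue would let the producer buffer every batch in memory before the sends catch up.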
