Best design pattern for large data processing methods

I am refactoring an application and trying to follow some of the "Clean Code" principles. The application reads data from multiple different data sources, manipulates/formats that data, and inserts it into another database. I have a data layer with the associated DTOs, repositories, interfaces, and helpers for each data source, as well as a business layer with the matching entities, repositories, and interfaces.

My question comes down to the Import method. I basically have one method that systematically calls each business logic method to read, process, and save the data. There are a lot of calls that need to be made, and even though the Import method itself does not manipulate the data at all, the method is still extremely large. Is there a better way to process this data?

ICustomer<Customer> sourceCustomerList = new CustomerRepository();
foreach (Customer customer in sourceCustomerList.GetAllCustomers())
{

   // Read Some Data
   DataObject object1 = iSourceDataType1.GetDataByCustomerID(customer.ID);
   // Format and save the Data
   iTargetDataType1.InsertDataType1(object1);

   // Read Some Data

   // Format the Data

   // Save the Data

   //...Rinse and repeat
}

You should look into the Task Parallel Library (TPL) and its Dataflow library:

using System.Threading.Tasks.Dataflow;

ICustomer<Customer> sourceCustomerList = new CustomerRepository();

var customersBuffer = new BufferBlock<Customer>();
var transformBlock = new TransformBlock<Customer, DataObject>(
    customer => iSourceDataType1.GetDataByCustomerID(customer.ID)
);

// Build your pipeline with TransformBlock, ActionBlock, and more...
// PropagateCompletion lets Complete() flow down the whole pipeline.
customersBuffer.LinkTo(transformBlock,
    new DataflowLinkOptions { PropagateCompletion = true });

// Add all the blocks you need here....

// Then feed the first block or use a custom source
foreach (var c in sourceCustomerList.GetAllCustomers())
    customersBuffer.Post(c);
customersBuffer.Complete();
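Each block can also take an ExecutionDataflowBlockOptions with MaxDegreeOfParallelism and BoundedCapacity, which lets you run the IO-bound reads concurrently while throttling memory use; awaiting the final block's Completion task tells you when the whole import has drained.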

Your performance will be IO-bound, especially with the many accesses to the database(s) in each iteration. Therefore, you need to revise your architecture to minimise IO.
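For example, rather than issuing one query per customer, each source could be read in bulk once and joined in memory. A rough sketch only (GetDataByCustomerIDs and the CustomerID property on DataObject are hypothetical names, assuming your repositories can expose a set-based query):

using System.Collections.Generic;
using System.Linq;

// Sketch: one round trip per data type instead of one per customer.
var customers = sourceCustomerList.GetAllCustomers().ToList();
var customerIds = customers.Select(c => c.ID).ToList();

// Hypothetical bulk read, e.g. SELECT ... WHERE CustomerID IN (...)
var dataByCustomer = iSourceDataType1
    .GetDataByCustomerIDs(customerIds)
    .ToDictionary(d => d.CustomerID);

foreach (var customer in customers)
{
    if (dataByCustomer.TryGetValue(customer.ID, out var object1))
        iTargetDataType1.InsertDataType1(object1); // inserts could be batched too
}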

Is it possible to move all the records closer together (maybe in a temporary database) as a first pass, then do the record matching and formatting within the database as a second pass, before reading them out and saving them where they need to be?
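If so, the first pass could be a bulk load into a staging table. A minimal sketch, assuming the target is SQL Server and a staging table dbo.StagingCustomerData already exists (the table, its columns, and the connection string are made-up names):

using System.Data;
using System.Data.SqlClient;

string targetConnectionString = "...";  // placeholder

// Pass 1: shape rows minimally and bulk-copy them into staging.
var staging = new DataTable();
staging.Columns.Add("CustomerID", typeof(int));
staging.Columns.Add("Payload", typeof(string));

foreach (var customer in sourceCustomerList.GetAllCustomers())
    staging.Rows.Add(customer.ID, "raw source data here");

using (var bulk = new SqlBulkCopy(targetConnectionString))
{
    bulk.DestinationTableName = "dbo.StagingCustomerData";
    bulk.WriteToServer(staging); // one bulk operation, not row-by-row inserts
}

// Pass 2: a set-based INSERT ... SELECT (or MERGE) inside the database
// then matches, formats, and moves the rows to their final tables.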

(As a side note, sometimes we get carried away with DDD and OO, where everything "needs" to be an object. But that is not always the best approach.)
