
Best Practices for working with files via C#

The application I work on generates several hundred files (CSV) in a 15-minute period, and the back end of the application takes these files and processes them (updating the database with those values). One problem is database locks.

What are the best practices for working with several thousand files to avoid locking and to process them efficiently?

Would it be more efficient to create a single file and process it, or to process one file at a time?

What are some common best practices?

Edit: the database is not a relational DBMS. It's a NoSQL, object-oriented DBMS that works in memory.

So, assuming that you have N machines creating files, and each file is similar in the sense that it generally gets consumed into the same tables in the database...

I'd set up a queue, have all of the machines write their files to the queue, and then have something on the other side picking items off the queue and processing them into the database. So, one file at a time. You could probably even optimize away the file operations by writing to the queue directly.
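
A minimal sketch of that producer/consumer arrangement. The in-process BlockingCollection and the ProcessIntoDatabase method are assumptions for illustration; with N separate machines you would substitute a durable queue (MSMQ, RabbitMQ, etc.), but the one-at-a-time consumption pattern is the same:

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class FileQueueProcessor
{
  // Bounded queue so producers back off if the consumer falls behind.
  static readonly BlockingCollection<string> _queue =
      new BlockingCollection<string>(boundedCapacity: 1000);

  static void Main()
  {
    // Single consumer: files are processed one at a time, so the
    // database never sees two competing writers.
    var consumer = Task.Run(() =>
    {
      foreach (string path in _queue.GetConsumingEnumerable())
        ProcessIntoDatabase(path); // hypothetical: parse CSV, update DB
    });

    // Producers (the file-generating machines) just enqueue paths.
    _queue.Add(@"C:\incoming\batch1.csv");
    _queue.Add(@"C:\incoming\batch2.csv");

    _queue.CompleteAdding(); // no more files coming
    consumer.Wait();
  }

  static void ProcessIntoDatabase(string path)
  {
    Console.WriteLine($"Processing {path}");
  }
}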

If you are experiencing problems with locks, it's likely that the database tables being updated do not have proper indexes on them. Get the SQL code that does the updating and find out what its execution plan is; if you are using MSSQL, you can do this in SSMS. If the UPDATE is causing a table scan, you need to add an index that will help isolate the records being updated (unless you are updating every single record in the table; that could be a problem).
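
This answer assumes MSSQL. For illustration, if a scan-causing UPDATE filtered on a hypothetical SensorId column of a hypothetical Readings table, the fix might look like this, run once from C# (or pasted straight into SSMS):

using System.Data.SqlClient;

class IndexSetup
{
  static void EnsureIndex(string connectionString)
  {
    const string sql =
        @"IF NOT EXISTS (SELECT 1 FROM sys.indexes
                         WHERE name = 'IX_Readings_SensorId')
              CREATE INDEX IX_Readings_SensorId
              ON dbo.Readings (SensorId);";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
      conn.Open();
      cmd.ExecuteNonQuery(); // UPDATEs filtering on SensorId can now seek instead of scan
    }
  }
}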

With limited knowledge of your exact scenario...

Performance-wise, closing a file is possibly the most time-expensive operation you would be performing, so my advice would be: if you can go the single-file route, that would be the most performant approach.
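
A minimal sketch of that single-file route: one StreamWriter held open for the whole batch, so there is one open/close pair instead of hundreds (the file path and record source are assumptions for illustration):

using System.IO;

class SingleFileWriter
{
  static void WriteBatch(string[] records)
  {
    // One file handle for the entire batch, rather than one per small file.
    using (var writer = new StreamWriter(@"C:\outgoing\batch.csv"))
    {
      foreach (string record in records)
        writer.WriteLine(record); // each record is one CSV line
    } // the single, expensive close happens here
  }
}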

A lock will protect the files from being processed until the first one is finished:

using System;

class ThreadSafe
{
  static readonly object _locker = new object();
  static int _val1, _val2;

  static void Go()
  {
    // Only one thread at a time can hold _locker; any other caller
    // blocks here until the first one is finished.
    lock (_locker)
    {
      if (_val2 != 0) Console.WriteLine (_val1 / _val2);
      _val2 = 0;
    }
  }
}

Sounds like you'll either want a single-file mechanism, or to have all of the files dropped into a single shared directory, with a process that continuously checks for the oldest CSV file and runs it through your code. That might be the "cheapest" solution, anyway. If you are actually generating more files than you can process, then I'd probably rethink the overall system architecture instead of taking the 'band-aid' approach.
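
A minimal sketch of that oldest-first polling loop, assuming a hypothetical drop directory and ProcessCsv method:

using System;
using System.IO;
using System.Linq;
using System.Threading;

class OldestFirstPoller
{
  static void Main()
  {
    while (true)
    {
      // Pick the oldest CSV in the shared drop directory, if any.
      FileInfo oldest = new DirectoryInfo(@"C:\incoming")
          .GetFiles("*.csv")
          .OrderBy(f => f.CreationTimeUtc)
          .FirstOrDefault();

      if (oldest != null)
      {
        ProcessCsv(oldest.FullName); // hypothetical: parse and update DB
        oldest.Delete();             // done; remove so it is not reprocessed
      }
      else
      {
        Thread.Sleep(1000); // nothing to do; poll again shortly
      }
    }
  }

  static void ProcessCsv(string path) => Console.WriteLine($"Processing {path}");
}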

You may try to take care of concurrency issues at the level of your app code and force the DBMS not to lock objects during updates.

(In an RDBMS you would set the lowest transaction isolation level possible: READ UNCOMMITTED.)
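
For the RDBMS comparison, a minimal sketch of reading at that isolation level from C# (the connection string and Readings table are assumptions for illustration):

using System.Data;
using System.Data.SqlClient;

class DirtyReader
{
  static int CountRows(string connectionString)
  {
    using (var conn = new SqlConnection(connectionString))
    {
      conn.Open();
      // READ UNCOMMITTED: readers take no shared locks, so they neither
      // block nor are blocked by concurrent updates (dirty reads possible).
      using (SqlTransaction tx =
                 conn.BeginTransaction(IsolationLevel.ReadUncommitted))
      using (var cmd = new SqlCommand(
                 "SELECT COUNT(*) FROM dbo.Readings", conn, tx))
      {
        int count = (int)cmd.ExecuteScalar();
        tx.Commit();
        return count;
      }
    }
  }
}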

Provided you can do that, another option is to truncate all the old objects and bulk-insert the new values.
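
In RDBMS terms again, a truncate-and-bulk-insert pass might look like the following sketch (the table name and the DataTable source are assumptions for illustration):

using System.Data;
using System.Data.SqlClient;

class TruncateAndReload
{
  static void Reload(string connectionString, DataTable newValues)
  {
    using (var conn = new SqlConnection(connectionString))
    {
      conn.Open();

      // Drop all old rows in one cheap metadata operation.
      using (var cmd = new SqlCommand("TRUNCATE TABLE dbo.Readings", conn))
        cmd.ExecuteNonQuery();

      // Stream the replacement rows in bulk instead of row-by-row UPDATEs.
      using (var bulk = new SqlBulkCopy(conn))
      {
        bulk.DestinationTableName = "dbo.Readings";
        bulk.WriteToServer(newValues);
      }
    }
  }
}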
