
Comparing a record from a CSV file with a large list

On my website, a user uploads a CSV file.

I am reading the CSV file using this library: http://www.codeproject.com/Articles/11698/A-Portable-and-Efficient-Generic-Parser-for-Flat-F. The CSV file will have around 4,000 records (each record with 5 columns).

I read each record into a List and search a large list of objects (before reading the CSV file, I load that large list from a service into the cache) to check whether the record already exists.

This means 4,000 iterations, and in each iteration I have to search the large list of objects (around 100,000 records held in the cache).

Is this a good way to implement it? Is there any way to improve the speed? Is it a good idea to store such a large list in the cache?

My environment is VS2010 and .NET 4.0.

You can speed up your search by using an appropriate data structure for your list. If the items have a unique/primary key, you could use a hash map (in .NET, a Dictionary<TKey, TValue> or a HashSet<T>), which is much more efficient than iterating the whole list for each item: you can simply call ContainsKey().
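For illustration, a minimal sketch of that idea in C#. The Record type, its Key property, and the use of string keys are assumptions for the example, not taken from the question:

    // Sketch only: build a HashSet of keys once, then test each CSV row
    // against it. "Record" and "Key" are assumed names, not the poster's code.
    using System.Collections.Generic;
    using System.Linq;

    public class Record
    {
        public string Key { get; set; }   // the unique/primary key column
        // ... the other four columns
    }

    public class DuplicateChecker
    {
        private readonly HashSet<string> existingKeys;

        // One O(n) pass over the ~100,000 cached objects.
        public DuplicateChecker(IEnumerable<Record> cachedObjects)
        {
            existingKeys = new HashSet<string>(cachedObjects.Select(r => r.Key));
        }

        // Each CSV row is then one O(1) average-case lookup.
        public bool Exists(string key)
        {
            return existingKeys.Contains(key);
        }
    }

Building the set is a one-time cost; after that, the total work drops from roughly 4,000 x 100,000 list comparisons to about 4,000 constant-time lookups.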

If you run the service yourself, you could also push the responsibility up to the service, perhaps by sending the list of unique keys there for comparison.
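A hypothetical contract for that approach (the interface and method names below are invented for illustration):

    using System.Collections.Generic;

    // Hypothetical service contract: the client sends only the keys parsed
    // from the CSV (~4,000 strings) and the service answers which ones
    // already exist, so the 100,000-object list never has to be cached on
    // the web server at all.
    public interface IRecordService
    {
        ICollection<string> FindExistingKeys(ICollection<string> candidateKeys);
    }

This trades a larger request payload for not having to hold (and refresh) a very large list in the web application's cache.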

Maybe you could post some code for a more specific answer.
