
Processing a large (12K+ rows) array in JavaScript

Original post: 2012-05-07 16:06:30 | Tags: javascript, json, indexeddb, web-sql

The project requirements are odd for this one, but I'm looking to get some insight...

I have a CSV file with about 12,000 rows of data and approximately 12-15 columns. I'm converting that to a JSON array and loading it via JSONP (it has to run client-side). Any kind of query on the data set that returns a smaller, filtered data set takes many seconds. I'm currently using JLINQ to do the filtering, but I'm essentially just looping through the array and returning a smaller set based on conditions.

Would WebDB or IndexedDB allow me to do this filtering significantly faster? Do you know of any tutorials or articles that tackle this particular type of issue?

3 Solutions

http://square.github.com/crossfilter/ (no longer maintained; see https://github.com/crossfilter/crossfilter for a newer fork)

Crossfilter is a JavaScript library for exploring large multivariate datasets in the browser. Crossfilter supports extremely fast (<30ms) interaction with coordinated views, even with datasets containing a million or more records...
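To illustrate why an indexed approach can beat rescanning the whole array on every query, here is a minimal dependency-free sketch: sort the row positions by one column once, then answer range queries with binary search. This is not Crossfilter's implementation or API; the helper names (`buildSortedIndex`, `lowerBound`, `filterRange`) and the `price` field are hypothetical, and the sketch assumes the column holds values comparable with `<`.

```javascript
// Build a sorted index over one column: an array of row positions
// ordered by that column's value. Done once, reused for every query.
function buildSortedIndex(rows, key) {
  const order = rows.map((_, i) => i);
  order.sort((a, b) => {
    if (rows[a][key] < rows[b][key]) return -1;
    if (rows[a][key] > rows[b][key]) return 1;
    return 0;
  });
  return { rows, key, order };
}

// Binary search: first position in index.order whose value is >= target.
function lowerBound(index, target) {
  const { rows, key, order } = index;
  let lo = 0;
  let hi = order.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (rows[order[mid]][key] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Return rows with min <= row[key] < max without scanning every row.
function filterRange(index, min, max) {
  const start = lowerBound(index, min);
  const end = lowerBound(index, max);
  return index.order.slice(start, end).map((i) => index.rows[i]);
}

// Usage with made-up sample data.
const sampleRows = [
  { id: 1, price: 30 },
  { id: 2, price: 10 },
  { id: 3, price: 20 },
  { id: 4, price: 40 },
];
const byPrice = buildSortedIndex(sampleRows, 'price');
console.log(filterRange(byPrice, 15, 35).map((r) => r.id));
```

Building the index costs one sort up front; each later range query is two binary searches plus copying the matches. Whether that pays off for 12,000 rows depends on how often the same column is queried.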

This reminds me of an article John Resig wrote about dictionary lookups (a real dictionary, not a programming construct).

http://ejohn.org/blog/dictionary-lookups-in-javascript/

He starts with server-side implementations, and then works on a client-side solution. It should give you some ideas for ways to improve what you are doing right now:

Caching
Local Storage
Memory Considerations
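As one possible reading of the caching and local storage ideas, the parsed rows could be kept in `localStorage` so later page loads skip the JSONP round trip. This is a sketch under assumptions, not something taken from the article: `loadRows`, `CACHE_KEY`, and the `fetchRows` callback are hypothetical names, and `storage` is passed in so the function works with anything exposing `getItem`/`setItem` (in a browser, `window.localStorage`). Storage quotas are limited, so a failed write is ignored rather than treated as fatal.

```javascript
// Hypothetical cache key; change it when the data file changes.
const CACHE_KEY = 'dataset-v1';

// Return cached rows if present, otherwise fetch them and try to cache them.
// `fetchRows` is a caller-supplied function returning a Promise of an array.
function loadRows(fetchRows, storage) {
  const cached = storage.getItem(CACHE_KEY);
  if (cached !== null) {
    return Promise.resolve(JSON.parse(cached));
  }
  return fetchRows().then((rows) => {
    try {
      storage.setItem(CACHE_KEY, JSON.stringify(rows));
    } catch (err) {
      // Quota exceeded or storage unavailable: carry on without caching.
    }
    return rows;
  });
}

// Usage (browser), assuming a hypothetical fetchViaJsonp() helper:
// loadRows(() => fetchViaJsonp(), window.localStorage)
//   .then((rows) => console.log(rows.length));
```

This only shortens load time on repeat visits; the filtering itself still runs against the in-memory array.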

If you need to load an entire data object into memory before you apply some transform to it, I would leave IndexedDB and WebSQL out of the mix, as they typically both add complexity and reduce the performance of apps.

For this type of filtering, a library like Crossfilter will go a long way.

Where IndexedDB and WebSQL can come into play for filtering is when you don't need, or don't want, to load an entire dataset into memory. These databases are best utilized for their ability to index rows (WebSQL) and attributes (IndexedDB).

With in-browser databases, you can stream data into a database one record at a time and then cursor through it, one record at a time. The benefit here for filtering is that you can leave your data on "disk" (a .leveldb in Chrome and a .sqlite database in Firefox) and filter out unnecessary records, either as a pre-filter step or as the filter itself.
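A rough sketch of that stream-in-then-cursor pattern, using the standard IndexedDB calls (`transaction`, `objectStore`, `put`, `openCursor`, `cursor.continue`). The helper names `putRecords` and `filterWithCursor`, the store name `'rows'`, and the `state` field in the usage comment are hypothetical. The sketch assumes `db` is an already opened `IDBDatabase` whose object store was created during `onupgradeneeded`.

```javascript
// Write records into an object store one at a time inside one transaction.
function putRecords(db, storeName, records) {
  return new Promise((resolve, reject) => {
    const tx = db.transaction(storeName, 'readwrite');
    const store = tx.objectStore(storeName);
    for (const record of records) {
      store.put(record);
    }
    tx.oncomplete = () => resolve(records.length);
    tx.onerror = () => reject(tx.error);
  });
}

// Walk the store with a cursor, keeping only records that pass `predicate`.
// Only the matching records are collected in memory.
function filterWithCursor(db, storeName, predicate) {
  return new Promise((resolve, reject) => {
    const matches = [];
    const tx = db.transaction(storeName, 'readonly');
    const request = tx.objectStore(storeName).openCursor();
    request.onsuccess = (event) => {
      const cursor = event.target.result;
      if (!cursor) {
        resolve(matches); // no more records
        return;
      }
      if (predicate(cursor.value)) {
        matches.push(cursor.value);
      }
      cursor.continue(); // triggers onsuccess again for the next record
    };
    request.onerror = () => reject(request.error);
  });
}

// Usage (browser), assuming `db` came from indexedDB.open(...):
// putRecords(db, 'rows', rows)
//   .then(() => filterWithCursor(db, 'rows', (r) => r.state === 'CA'))
//   .then((matches) => console.log(matches.length));
```

A cursor visits every record, so this trades speed for lower memory use; for repeated lookups on one attribute, an IndexedDB index on that attribute would likely be the better fit.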