
Ruby on Rails - Storing and accessing large data sets

I am having a hard time managing the storage and access of a large dataset within a Ruby on Rails application. Here is my application in a nutshell: I run Dijkstra's algorithm on a road network and then display the nodes it visits using the Google Maps API. I am using an open dataset of the US road network to construct the graph by iterating over two txt files given in the link, but I am having trouble storing this data in my app.

I am under the impression that a large dataset like this is not an ActiveRecord object: I don't need to modify the contents of this data, I only need to access it and cache it locally in a hash so that I can run Ruby methods on it. I have tried a few things but I am running into trouble.

  1. I figured that it would make the most sense to parse the txt files and store the graph in yml format. I would then be able to load the graph into a DB as seed data and grab it using Node.all, or something along those lines. Unfortunately, the yml file becomes too large for Rails to handle. Running Rake leaves the system running at 100% indefinitely...

  2. Next I figured that, since I don't need to modify the data, I can just create the graph every time the application loads, as part of its "initialization." But I don't know exactly where to put this code; I need to run some methods, or at least load a block of data, and then store the result in some sort of global/session variable that I can access in all controllers/methods. I don't want to pass this large dataset around, just have access to it from anywhere.

  3. This is the way I am currently doing it, but it is just not acceptable. I parse the text files and build the graph in a controller action, and hope that it finishes computing before the server times out.
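For concreteness, the parsing step mentioned in the list above could look roughly like this. It is only a minimal sketch: the edge-file layout ("from_id to_id distance" per line), the `build_graph` name, and the choice to treat every road as two-way are all assumptions, since the actual dataset format is not shown here.

```ruby
require 'stringio'

# Builds an adjacency hash of the form { node_id => { neighbor_id => distance } }.
# ASSUMPTION: each line looks like "from_id to_id distance"; adjust the
# split/convert step to match the real files.
def build_graph(edge_io)
  graph = Hash.new { |hash, key| hash[key] = {} }
  edge_io.each_line do |line|
    from, to, distance = line.split
    next if distance.nil? # skip blank or malformed lines
    from_id = Integer(from)
    to_id = Integer(to)
    graph[from_id][to_id] = Float(distance)
    graph[to_id][from_id] = Float(distance) # ASSUMPTION: roads are two-way
  end
  graph
end

# A StringIO stands in for the real file so the sketch runs on its own.
sample = StringIO.new("1 2 0.5\n2 3 1.25\n")
graph = build_graph(sample)
graph[2][3] # => 1.25
```

With a real file, something like `File.open(path) { |f| build_graph(f) }` would read it line by line rather than loading the whole text into memory first.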

Ideally, I would store the graph in a database from which I could grab the entire contents to use locally. Or at least I would parse the data only once, as the application loads, and then be able to access it from different page views, etc. I feel like this would be the most efficient approach, but I am running into hurdles at the moment.

Any ideas?

You're on the right path. There are a couple of ways to do this. One is, in your model class, outside of any method, to set up constants like these examples:

# Each constant is evaluated once, when the class body is loaded.
MY_MAP = ActiveRecord::Base.connection.select_all('SELECT thingone, thingtwo FROM table').map { |row| [row['thingone'], row['thingtwo']] }.to_h
RAW_DATA = File.read('the_file')  # However you read and parse your file
CA = State.find_by(name: 'California')
NY = State.find_by(name: 'New York')

These will get executed once in a production app: when the model's class is loaded. Another option: do this initialization in an initializer or other config file. See the config/initializers directory.
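As a rough illustration of the initializer route, the graph could be parsed once at boot and kept in a module that any controller or model can read. This is a sketch under stated assumptions, not a definitive implementation: the `RoadGraph` module, its `load!` and `adjacency` methods, the initializer file name, and the "from to distance" line format are all hypothetical.

```ruby
require 'tempfile'

# Hypothetical holder for the graph; none of these names come from Rails.
module RoadGraph
  class << self
    # Parses the edge file once and keeps a frozen copy in memory.
    # ASSUMPTION: each line looks like "from to distance".
    def load!(path)
      graph = Hash.new { |hash, key| hash[key] = {} }
      File.foreach(path) do |line|
        from, to, distance = line.split
        next if distance.nil? # skip blank or malformed lines
        graph[from][to] = distance.to_f
      end
      graph.default_proc = nil # unknown nodes now return nil instead of being created
      @adjacency = graph.each_value(&:freeze).freeze
    end

    # Read-only access from anywhere in the app.
    def adjacency
      @adjacency or raise 'RoadGraph.load! has not been called yet'
    end
  end
end

# In a Rails app this call would live in something like
# config/initializers/road_graph.rb (hypothetical file name), e.g.
#   RoadGraph.load!(Rails.root.join('db', 'edges.txt'))
# Here a temp file stands in for the dataset so the sketch runs on its own.
Tempfile.create('edges') do |file|
  file.write("a b 2.0\nb c 3.5\n")
  file.flush
  RoadGraph.load!(file.path)
end

RoadGraph.adjacency['a']['b'] # => 2.0
```

Freezing the structure guards against accidental mutation in a request. Note that the data lives in process memory, so each server process holds its own copy and boot time grows with the size of the file.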


 