
How to deal with memory leak in Ruby/Rails

I'm developing a Rails application that deals with huge amounts of data, and it halts because it uses all the memory on my computer due to a memory leak (allocated objects that are not released).

In my application, data is organized hierarchically, as a tree, where each node of level "X" contains the sum of the data of its children at level "X+1". For example, if level "X+1" contains the number of people in cities, level "X" contains the number of people in states. In this way, level "X"'s data is obtained by summing the data in level "X+1" (in this case, people).

For the sake of this question, consider a tree with four levels: Country, State, City and Neighbourhood, where each level is mapped to an ActiveRecord table (countries, states, cities, neighbourhoods).

Data is read from a CSV file that fills the leaves of the tree, that is, the neighbourhoods table.
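If only the per-city totals are needed, the CSV can be streamed row by row so that no more than one row is held in memory at a time. The following is a minimal sketch in plain Ruby, assuming a hypothetical file layout with `neighbourhood`, `city` and `data` columns (the real format is not shown here) and made-up sample values:

```ruby
require "csv"
require "stringio"

# Hypothetical sample data; the real file layout is an assumption.
sample_csv = <<~CSV
  neighbourhood,city,data
  Centro,Springfield,120
  Norte,Springfield,80
  Porto,Shelbyville,50
CSV

# Streams rows from any IO object and keeps only one running total per city,
# so memory use grows with the number of cities, not the number of rows.
def city_totals(io)
  totals = Hash.new(0)
  CSV.new(io, headers: true).each do |row|
    totals[row["city"]] += Integer(row["data"])
  end
  totals
end

totals = city_totals(StringIO.new(sample_csv))
```

For a file on disk, passing `File.open(path)` instead of the `StringIO` should behave the same way, since `CSV.new` reads from the IO incrementally.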

After that, data flows from bottom (neighbourhoods) to top (countries) in the following sequence:

1) Neighbourhoods data is summed to Cities;
2) after step 1 is completed, Cities data is summed to States;
3) after step 2 is completed, States data is summed to Country;
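The three steps above can be sketched in plain Ruby as one roll-up function applied level by level. This is only an illustration: the lookup hashes stand in for the foreign keys between tables, and all names and numbers are hypothetical.

```ruby
# Sums each child's value into its parent, producing the totals for the
# next level up. `parent_of` maps a child key to its parent key.
def roll_up(child_totals, parent_of)
  child_totals.each_with_object(Hash.new(0)) do |(child, value), parent_totals|
    parent_totals[parent_of.fetch(child)] += value
  end
end

# Hypothetical parent lookups (foreign keys in the real schema).
city_of    = { "Centro" => "Springfield", "Norte" => "Springfield", "Porto" => "Shelbyville" }
state_of   = { "Springfield" => "Oregon", "Shelbyville" => "Oregon" }
country_of = { "Oregon" => "USA" }

neighbourhoods = { "Centro" => 120, "Norte" => 80, "Porto" => 50 }

cities    = roll_up(neighbourhoods, city_of)  # step 1
states    = roll_up(cities, state_of)         # step 2
countries = roll_up(states, country_of)       # step 3
```

Each level only needs the totals of the level directly below it, so the lower level's detail can be discarded as soon as its step has finished.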

The schematic code I'm using is as follows:

1 cities = City.all
2 cities.each do |city|
3   city.data = 0
4   city.neighbourhoods.each do |neighbourhood|
5       city.data = city.data + neighbourhood.data
6   end
7   city.save
8 end

The lowest level of the tree contains 3.8 million records. Each time lines 2-8 are executed, a city is summed up, and after line 8 is executed that subtree is no longer needed, but it is never released (memory leak). After summing 50% of the cities, all of my 8 GB of RAM is gone.

My question is: what can I do? Buying better hardware will not do, since I'm working with a "small" prototype.

I know a way to make it work: restart the application for each City, but I hope someone has a better idea. The "simplest" would be to force the garbage collector to free specific objects, but there seems to be no way to do that ( https://www.ruby-forum.com/t/how-do-i-force-ruby-to-release-memory/195515 ).

From the following articles I understood that the developer should organize the data in a way that "suggests" to the garbage collector what should be freed. Maybe another approach will do the trick, but the only alternative I see is a depth-first search approach instead of the reversed breadth-first search I'm using, and I don't see why it should work.

What I have read so far:

https://stackify.com/how-does-ruby-garbage-collection-work-a-simple-tutorial/

https://www.toptal.com/ruby/hunting-ruby-memory-issues

https://scoutapm.com/blog/ruby-garbage-collection

https://scoutapm.com/blog/manage-ruby-memory-usage

Thanks

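This isn't really a case of a memory leak. You're just indiscriminately loading data from the table, which will exhaust the available memory. City.all keeps every city object alive for the whole loop, and each city caches the neighbourhoods it has loaded, so nothing can be garbage collected until the loop finishes.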

The solution is to load the data from the database in batches:

City.find_each do |city|
  city.update(data: city.neighbourhoods.sum(&:data))
end
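find_each fetches records in batches (1,000 at a time by default), ordered by primary key, so only one batch of city objects has to be alive at any moment. The idea can be illustrated with a simplified plain-Ruby sketch. This is not how Rails implements it: the in-memory `table`, `fetch_batch` and `each_in_batches` are hypothetical stand-ins for the database and the query.

```ruby
# Returns up to `limit` rows whose id is greater than `after_id`,
# mimicking "WHERE id > ? ORDER BY id LIMIT ?" on a table sorted by id.
def fetch_batch(table, after_id, limit)
  table.select { |row| row[:id] > after_id }.first(limit)
end

# Yields every row while holding at most `batch_size` rows at a time.
# Assumes ids are positive integers, so starting from 0 skips nothing.
def each_in_batches(table, batch_size:)
  last_id = 0
  loop do
    batch = fetch_batch(table, last_id, batch_size)
    break if batch.empty?

    batch.each { |row| yield row }
    last_id = batch.last[:id] # the next batch starts after this id
  end
end

# Hypothetical stand-in for a database table with ten rows.
table = (1..10).map { |id| { id: id, data: id * 10 } }

total = 0
batch_rows_seen = 0
each_in_batches(table, batch_size: 3) do |row|
  total += row[:data]
  batch_rows_seen += 1
end
```

Once a batch has been processed and nothing references its rows any more, the garbage collector is free to reclaim them, which is what keeps memory use flat.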

If neighbourhoods.data is a simple integer, you don't need to fetch the records in the first place:

City.update_all(
  'data = (SELECT SUM(neighbourhoods.data) FROM neighbourhoods WHERE neighbourhoods.city_id = cities.id)'
)

This will be an order of magnitude faster and have trivial memory consumption, as all the work is done in the database.

If you REALLY want to load a bunch of records into Rails, then make sure to select aggregates instead of instantiating all those nested records:

City.left_joins(:neighbourhoods)
    .group(:id)
    .select(:id, 'SUM(neighbourhoods.data) AS n_data')
    .find_each { |city| city.update(data: city.n_data) }

You don't need Rails at all; pure SQL should be good enough to do what you're trying to do:

City.connection.execute(<<~SQL.squish)
  UPDATE cities SET data = (
    SELECT SUM(neighbourhoods.data)
    FROM neighbourhoods
    WHERE neighbourhoods.city_id = cities.id
  )
SQL

Depending on how your model associations are set up, you should be able to take advantage of preloading.

For example:

class City < ApplicationRecord
  has_many :neighborhoods
end

class Neighborhood < ApplicationRecord
  belongs_to :city
  belongs_to :state
end

class State < ApplicationRecord
  belongs_to :country
  has_many :neighborhoods
end

class Country < ApplicationRecord
  has_many :states
end


cities = City.all.includes(neighborhoods: { state: :country })
cities.each do |city|
  ...
end
