
Memory leak in node regex parser?

The following code causes Node.js to consume a lot of RAM and crash when it runs out of memory. However, if I change the length of the matched string from 13 to 12, everything is fine. It looks as if strings returned by a regex search contain a hidden reference to the original string that was searched, but only if the match is at least 13 characters long. Is this a bug, or is there some good reason for this behavior?

function randString(length) {
  var a = "a".charCodeAt(0),
      result = [];
  for(var i = 0; i < length; i++) {
    result.push(a + Math.floor(Math.random() * 26));
  }
  return String.fromCharCode.apply(null, result);
}


var arr = [];

for(var i = 0; i < 1000000; i++) {
  if(i % 1000 === 0) console.log(i);
  var str = randString(13);
  str = randString(5000) + "<" +str + ">" + randString(5000);
  var re = /<([a-z]+)>/gm;
  var next = re.exec(str);
  arr.push(next[1]);
}

I observe the same behavior in Chrome. I think the two (Node.js and Chrome) behave the same because they are based on the same JavaScript engine (V8).

There is no memory leak, but there is a problem with garbage collection in JavaScript. I deduce this from the observation that gigabytes of memory are freed when I force garbage collection in Chrome DevTools.

You could force the garbage collector to run, as explained here. That way, your Node.js process will not crash.
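For Node.js, one way to do this is to start the process with the --expose-gc flag, which makes a gc() function available on the global object. The sketch below is only an illustration (tryCollect is a made-up helper name, not part of any API), and a forced collection can only reclaim memory that is no longer reachable:

```javascript
// Run with: node --expose-gc script.js
// Without the flag, global.gc is undefined and the helper does nothing.
function tryCollect() {
  if (typeof global.gc === "function") {
    global.gc(); // ask V8 for a full garbage collection
    return true;
  }
  return false;
}

const before = process.memoryUsage().heapUsed;
const collected = tryCollect();
const after = process.memoryUsage().heapUsed;

console.log(collected ? "forced a collection" : "gc is not exposed");
console.log("heapUsed before:", before, "after:", after);
```

The heap figures vary from run to run, so they are best read as a rough indication only.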

Edit

Testing further, I can tell the following:

About your comment "But as long as there is still a reference to the array, no memory gets freed.":

It looks more complicated than that, but you are right: arr seems to occupy all that space, 1.1 GB for 100,000 items, which is roughly 10 kB per item. When you look at the array next, it indeed has a size of roughly 10 kB (10015 bytes for next.input). If everything worked as expected, next[1] would be a simple string and use only slightly more than the 13 data bytes, but this is not the case. Referencing next[1] in the array arr does not allow next to be garbage collected.
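The 10015 figure is simply the length of the searched string: 5000 + 1 + 13 + 1 + 5000 characters. A small sketch to inspect the match object, using a fixed 13-letter string instead of randString so the result is reproducible (the filler and its contents are my own choice):

```javascript
// Build a string shaped like the one in the question, with fixed content.
const filler = "x".repeat(5000);
const str = filler + "<" + "abcdefghijklm" + ">" + filler;

const re = /<([a-z]+)>/gm;
const next = re.exec(str);

// The match array carries the whole searched string in its `input` property.
console.log(next.input.length); // 10015
console.log(next[1].length);    // 13
console.log(next.index);        // 5000
```

This only shows what the match array exposes; it does not by itself show how much memory each stored next[1] keeps alive.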

As a solution, I came up with this modified code (fiddle):

function randString(length) {
  var a = "a".charCodeAt(0),
      result = [];
  for(var i = 0; i < length; i++) {
    result.push(a + Math.floor(Math.random() * 26));
  }
  return String.fromCharCode.apply(null, result);
}


var arr = [];

for(var i = 0; i < 100000; i++) {
  if(i % 1000 === 0) console.log(i);
  var str = randString(13);
  str = randString(5000) + "<" +str + ">" + randString(5000);
  var re = /<([a-z]+)>/gm;
  var next = re.exec(str);
  arr.push(next[1].split('').join(''));
}
console.log(arr)

The trick is to cut the reference between next and the string stored in arr by splitting the string and joining it again.
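If the trick is needed in several places, it could be wrapped in a small helper. This is a sketch only: detach is a hypothetical name, and whether the copy is really independent of the original string depends on the engine's internals.

```javascript
// Hypothetical helper: rebuild a string from its individual characters,
// using the same split/join trick as the code above.
function detach(s) {
  return s.split("").join("");
}

const str = "a".repeat(5000) + "<abcdefghijklm>" + "b".repeat(5000);
const match = /<([a-z]+)>/.exec(str);

const kept = detach(match[1]);
console.log(kept, kept.length); // abcdefghijklm 13
```

The copy compares equal to the original capture, so code that consumes the array does not need to change.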

I don't know anything about the internals, but it looks like a bug in V8. Testing the same on Firefox, everything works as expected, and there is no excessive memory usage.

I found the source of the problem. It's not the regexp parser that's responsible for this but the substring method on strings. It's intended as a feature for making the creation of substrings more efficient. There is an open issue about this on the V8 bug report page: https://code.google.com/p/v8/issues/detail?id=2869
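To check this without any regex involved, something like the following rough comparison could be run in Node.js. It is a sketch under assumptions: keepPieces and heapUsedMb are made-up helper names, the counts are arbitrary, and the heap readings are noisy and depend on the V8 version, so no particular numbers should be expected.

```javascript
// Hypothetical helper: keep `count` 13-character pieces, each cut with
// substring() from its own ~10 kB string.
function keepPieces(count, detach) {
  const kept = [];
  for (let i = 0; i < count; i++) {
    // A fresh large string per iteration, so every piece has its own parent.
    const big = String(i).padStart(8, "0") + "x".repeat(10000);
    const piece = big.substring(0, 13);
    // Optionally rebuild the piece with the split/join trick.
    kept.push(detach ? piece.split("").join("") : piece);
  }
  return kept;
}

function heapUsedMb() {
  return process.memoryUsage().heapUsed / (1024 * 1024);
}

const start = heapUsedMb();
const sliced = keepPieces(2000, false);
const afterSliced = heapUsedMb();
const copied = keepPieces(2000, true);
const afterCopied = heapUsedMb();

console.log("growth keeping substrings (MB):", (afterSliced - start).toFixed(1));
console.log("growth keeping copies (MB):", (afterCopied - afterSliced).toFixed(1));
```

If the substrings hold on to their parent strings, the first figure should come out noticeably larger than the second.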
