Why is Ruby's scanf so slow?

Question

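I am working on some text-conversion routines that parse time values in different formats in Ruby. The routine is growing in complexity, and I am testing a better way to approach the problem.

I am currently testing scanf. Why? I always thought it was faster than a regex, but what is happening in Ruby? It is much slower!

What am I doing wrong?

Note: I am using ruby-1.9.2-p290 [x86_64] (MRI).

First test, in Ruby: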

require "scanf"
require 'benchmark'

def duration_in_seconds_regex(duration)
  if duration =~ /^\d{2,}\:\d{2}:\d{2}$/
    h, m, s = duration.split(":").map{ |n| n.to_i }
    h * 3600 + m * 60 + s
  end
end

def duration_in_seconds_scanf(duration)
  a = duration.scanf("%d:%d:%d")
  a[0] * 3600 + a[1] * 60 + a[2]
end

n = 500000
Benchmark.bm do |x|
  x.report { for i in 1..n; duration_in_seconds_scanf("00:10:30"); end }
end

Benchmark.bm do |x|
  x.report { for i in 1..n; duration_in_seconds_regex("00:10:30"); end }
end

Here are the results, first with scanf and then with the regex:

      user     system      total        real
  95.020000   0.280000  95.300000 ( 96.364077)
       user     system      total        real
   2.820000   0.000000   2.820000 (  2.835170)

Second test, in C:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sys/types.h>
#include <string.h>
#include <regex.h>

char *regexp(char *string, char *patrn, int *begin, int *end) {
    int i, w = 0, len;
    char *word = NULL;
    regex_t rgT;
    regmatch_t match;
    regcomp(&rgT, patrn, REG_EXTENDED);
    if ((regexec(&rgT, string, 1, &match, 0)) == 0) {
        *begin = (int) match.rm_so;
        *end = (int) match.rm_eo;
        len = *end - *begin;
        word = malloc(len + 1);
        for (i = *begin; i<*end; i++) {
            word[w] = string[i];
            w++;
        }
        word[w] = 0;
    }
    regfree(&rgT);
    return word;
}

int main(int argc, char** argv) {
    char *str = "00:01:30";
    int h, m, s;
    int i, b, e;
    float start_time, end_time, time_elapsed;
    char *str2;
    char delims[] = ":";

    /* Time 500k sscanf parses. */
    start_time = (float) clock() / CLOCKS_PER_SEC;
    for (i = 0; i < 500000; i++) {
        if (sscanf(str, "%d:%d:%d", &h, &m, &s) == 3) {
            s = h * 3600 + m * 60 + s;
        }
    }
    end_time = (float) clock() / CLOCKS_PER_SEC;
    time_elapsed = end_time - start_time;
    printf("sscanf_time (500k iterations): %.4f", time_elapsed);

    /* Time 500k regex matches, each followed by a strtok split.
       Note that regexp() recompiles the pattern on every call. */
    start_time = (float) clock() / CLOCKS_PER_SEC;
    for (i = 0; i < 500000; i++) {
        char *match = regexp(str, "[0-9]{2,}:[0-9]{2}:[0-9]{2}", &b, &e);
        if (match != NULL && strcmp(match, str) == 0) {
            /* sizeof(str) is only the pointer size; allocate the string length. */
            str2 = malloc(strlen(str) + 1);
            strcpy(str2, str);
            /* strtok returns char *, so convert each field to int. */
            h = atoi(strtok(str2, delims));
            m = atoi(strtok(NULL, delims));
            s = atoi(strtok(NULL, delims));
            s = h * 3600 + m * 60 + s;
            free(str2);
        }
        free(match); /* regexp() mallocs the match; free(NULL) is a no-op */
    }
    end_time = (float) clock() / CLOCKS_PER_SEC;
    time_elapsed = end_time - start_time;
    printf("\n\nregex_time (500k iterations): %.4f", time_elapsed);

    return (EXIT_SUCCESS);
}

The C results are obviously faster, and the regex result is slower than the scanf result, as expected:

sscanf_time (500k iterations): 0.1774

regex_time (500k iterations): 3.9692

It is clear that C is faster at run time, so please do not comment that Ruby is interpreted and the like.

Here is the related gist.

Answer 1

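The problem is not that it is interpreted, but that everything in Ruby is an object. You can browse "scanf.rb" in your Ruby distribution and compare it with the scanf implementation in C.

The Ruby implementation of scanf is based on RegExp matching. Every atom like "%d" is an object in Ruby, whereas in C it is just a case item. So, to my mind, the reason for such an execution time is the large amount of object allocation and deallocation.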
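Since the Ruby scanf ends up matching with regexes anyway, one option is to run a single regex directly and skip the intermediate format objects. The following is only a minimal sketch of that idea; the names `DURATION_RE` and `duration_in_seconds_match` are made up for illustration and do not appear in the question.

```ruby
# Hypothetical helper: parse "HH:MM:SS" with one precompiled regex.
# The pattern is built once, so each call only performs the match.
DURATION_RE = /\A(\d{2,}):(\d{2}):(\d{2})\z/

def duration_in_seconds_match(duration)
  m = DURATION_RE.match(duration)
  return nil unless m                      # input did not look like HH:MM:SS
  m[1].to_i * 3600 + m[2].to_i * 60 + m[3].to_i
end

puts duration_in_seconds_match("00:10:30")  # 10 minutes 30 seconds
```

Unlike the scanf version in the question, this returns nil for input that does not match instead of raising on a missing field.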

Answer 2

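Assuming MRI: scanf is written in Ruby (scanf.rb), apparently 10 years ago, and has never been touched since (and it does look complicated!). split, map and regexes are implemented in heavily optimized C.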
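To see the C-backed path on its own, a sketch along these lines times only split and map. The method name `duration_in_seconds_split` and the reduced iteration count are assumptions chosen for illustration; absolute timings will vary by machine and Ruby version.

```ruby
require 'benchmark'

# Hypothetical variant of the question's regex method: split and map only,
# with no validation of the input format.
def duration_in_seconds_split(duration)
  h, m, s = duration.split(":").map(&:to_i)
  h * 3600 + m * 60 + s
end

n = 50_000  # smaller than the question's 500k, to keep the run short
elapsed = Benchmark.realtime do
  n.times { duration_in_seconds_split("00:10:30") }
end
puts "split/map: #{elapsed} s for #{n} iterations"
```

Because it skips the format check, this is only suitable where the input is already known to be well formed.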

Answer 1
4 (accepted) 2012-03-01 20:29:00

Answer 2
2 2012-03-01 20:52:48