Is it possible to improve the performance of this kind of type selection?

Question

Assume I receive data such as { :type => 'X', :some_other_key => 'foo' } at runtime and, depending on some conditions, want to instantiate the corresponding class for it. We currently do it like this:

TYPE_CLASSES = [
  TypeA,
  TypeB,
  TypeC,
  # ...
  TypeUnknown
]

TYPE_CLASSES.detect {|type| type.responsible_for?(data)}.new

We iterate over a list of classes, ask each one whether it is responsible for the given data, and instantiate the first one that says yes.

The order of TYPE_CLASSES is important, because some responsible_for? methods check not only the type but also other keys inside data. A specialized class checking for type == 'B' && some_other_key == 'foo' therefore has to come before a generalized class checking only for type == 'B'.
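To make the ordering constraint concrete, here is a minimal sketch. The classes TypeBFoo, TypeB and TypeUnknown, the constant ORDERED_TYPE_CLASSES and the helper build_for are hypothetical names invented for illustration; they are not the real classes from the question.

```ruby
# Hypothetical classes illustrating the ordering constraint.
# Names and method bodies are assumptions, not the asker's real code.
class TypeBFoo
  # Specialized: checks the type and one additional key.
  def self.responsible_for?(data)
    data[:type] == 'B' && data[:some_other_key] == 'foo'
  end
end

class TypeB
  # Generalized: checks only the type.
  def self.responsible_for?(data)
    data[:type] == 'B'
  end
end

class TypeUnknown
  # Catch-all: accepts everything, so it has to stay last.
  def self.responsible_for?(_data)
    true
  end
end

# The specialized class is listed before the generalized one.
ORDERED_TYPE_CLASSES = [TypeBFoo, TypeB, TypeUnknown].freeze

def build_for(data)
  ORDERED_TYPE_CLASSES.detect { |type| type.responsible_for?(data) }.new
end
```

If TypeB were listed before TypeBFoo, the specialized class could never be selected, because TypeB also accepts every hash that TypeBFoo accepts.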

This works fine and is easily extensible, but the TYPE_CLASSES list is already quite long, so in the worst case finding the right type means iterating to the last element and calling responsible_for? on every class along the way.

Is there any way to improve the performance and avoid iterating over every element while still preserving the order of the checks?

Answer 1

If matching the data to classes is as complex as you describe, it might make sense to use a decision tree building algorithm (example).

You can use the AI4R library to do that in Ruby.

You probably don't need to build that tree dynamically, so you can use the library just to generate an optimized detection strategy for you. An example from the documentation:

DATA_LABELS = ['city', 'age_range', 'gender', 'marketing_target']
DATA_SET = [
  ['New York',  '<30',      'M',  'Y'],
  ['Chicago',   '<30',      'M',  'Y'],
  ['Chicago',   '<30',      'F',  'Y'],
  ['New York',  '<30',      'M',  'Y'],
  ['New York',  '<30',      'M',  'Y'],
  ['Chicago',   '[30-50)',  'M',  'Y'],
  ['New York',  '[30-50)',  'F',  'N'],
  ['Chicago',   '[30-50)',  'F',  'Y'],
  ['New York',  '[30-50)',  'F',  'N'],
  ['Chicago',   '[50-80]',  'M',  'N'],
  ['New York',  '[50-80]',  'F',  'N'],
  ['New York',  '[50-80]',  'M',  'N'],
  ['Chicago',   '[50-80]',  'M',  'N'],
  ['New York',  '[50-80]',  'F',  'N'],
  ['Chicago',   '>80',      'F',  'Y']
]
id3 = ID3.new(DATA_SET, DATA_LABELS)
id3.get_rules
# => if age_range=='<30' then marketing_target='Y'
#    elsif age_range=='[30-50)' and city=='Chicago' then marketing_target='Y'
#    elsif age_range=='[30-50)' and city=='New York' then marketing_target='N'
#    elsif age_range=='[50-80]' then marketing_target='N'
#    elsif age_range=='>80' then marketing_target='Y'
#    else raise 'There was not enough information during training to do a proper induction for this data element' end

(So you can basically take that generated output and paste it into your code.)
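As a rough sketch of what that pasting could look like, the generated rules can be placed inside a method body. The method name marketing_target_for and its keyword arguments are assumptions made for illustration; only the rule text itself comes from the documentation example.

```ruby
# Hypothetical wrapper around the generated rules. The rules assign to a
# local variable named after the last label, so we declare it first and
# return it at the end.
def marketing_target_for(city:, age_range:)
  marketing_target = nil
  if age_range == '<30' then marketing_target = 'Y'
  elsif age_range == '[30-50)' and city == 'Chicago' then marketing_target = 'Y'
  elsif age_range == '[30-50)' and city == 'New York' then marketing_target = 'N'
  elsif age_range == '[50-80]' then marketing_target = 'N'
  elsif age_range == '>80' then marketing_target = 'Y'
  else raise 'There was not enough information during training to do a proper induction for this data element' end
  marketing_target
end
```

Applied to the question, the returned value would be a type class (or its name) rather than 'Y' or 'N', and the final else branch could fall back to TypeUnknown instead of raising.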

You need to choose enough already classified records to build DATA_SET and DATA_LABELS, and you need to convert your hashes into arrays. That isn't difficult: your hashes' keys become DATA_LABELS, and your hashes' values become the rows of the DATA_SET array.
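The conversion could look roughly like the following sketch. The label order, the sample records, the class-name strings and the helper to_row are all hypothetical; the only idea taken from the answer is that hash keys map to labels and hash values map to row entries, with the class to predict in the last column.

```ruby
# Assumed label order; the last label is the class to predict.
LABELS = %w[type some_other_key type_class].freeze

# Hypothetical, already classified records: [data hash, expected class name].
CLASSIFIED = [
  [{ type: 'A', some_other_key: 'foo' }, 'TypeA'],
  [{ type: 'B', some_other_key: 'foo' }, 'TypeBFoo'],
  [{ type: 'B', some_other_key: 'bar' }, 'TypeB']
].freeze

# Turn one hash into a row whose columns follow the label order.
# Missing keys become empty strings so every row has the same width.
def to_row(data, class_name, labels = LABELS)
  attributes = labels[0...-1].map { |label| data[label.to_sym].to_s }
  attributes + [class_name]
end

ROWS = CLASSIFIED.map { |data, class_name| to_row(data, class_name) }
```

ROWS and LABELS would then play the roles of DATA_SET and DATA_LABELS when training the classifier.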

When you add a new type class, just rerun the 'teaching' step and update your detection code.
