
Why doesn't TF Boosted Trees accept numerical data as input?

For tf.estimator.BoostedTreesClassifier, why are all feature columns required to be of type bucketized_column or indicator_column?

What is the best way to handle both the numerical and the categorical data used by the classifier?

It seems impossible to work with numerical data at all, even though decision trees look like a perfect fit: I don't even need to scale my data.

My code is as follows:

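import tensorflow as tf

def _parse_record(record):
    # do something: decode `record` into a feature dict and its label
    return {'feature_1': [0.0], 'feature_2': [190.98]}, label

def input_fn():
    # parse records into a tf.data.Dataset of (features, label) pairs
    return dataset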

feature_cols = []
# Raw numeric columns for the numerical features
for key in numerical_features:
    feature_cols.append(tf.feature_column.numeric_column(key=key))
# Hashed categorical features, one-hot encoded via indicator_column
for key in cat:
    c = tf.feature_column.categorical_column_with_hash_bucket(key=key, hash_bucket_size=100)
    feature_cols.append(tf.feature_column.indicator_column(c))

classifier = tf.estimator.BoostedTreesClassifier(
    feature_columns=feature_cols,
    n_batches_per_layer=100,
    n_trees=100,
)

classifier.train(input_fn=input_fn)

However, this gives me:

ValueError: For now, only bucketized_column and indicator column are supported but got: _NumericColumn(key='active_time', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)
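The error message itself names the workaround for TensorFlow releases before 1.13: wrap each numeric_column in a bucketized_column so the trees split on bucket IDs instead of raw values. Below is a minimal sketch, assuming the raw training values of each numeric feature fit in memory so that bucket boundaries can be taken from quantiles. quantile_boundaries and make_feature_columns are hypothetical helpers, not TensorFlow APIs; only the tf.feature_column calls are real.

```python
def quantile_boundaries(values, num_buckets):
    """Return strictly increasing cut points that split `values` into
    roughly equal-sized buckets (hypothetical helper, not a TF API)."""
    data = sorted(float(v) for v in values)
    if not data or num_buckets < 2:
        return []
    n = len(data)
    # Take the value at each i/num_buckets position. Duplicates are dropped
    # because bucketized_column needs strictly increasing boundaries.
    cuts = {data[i * n // num_buckets] for i in range(1, num_buckets)}
    return sorted(cuts)


def make_feature_columns(numeric_values, categorical_keys,
                         num_buckets=10, hash_bucket_size=100):
    """Build BoostedTreesClassifier-compatible columns for TF < 1.13.

    numeric_values: dict mapping feature name -> iterable of training values
    (assumption: these values are available in memory).
    """
    import tensorflow as tf  # imported lazily; only needed when building columns

    columns = []
    for key, values in numeric_values.items():
        boundaries = quantile_boundaries(values, num_buckets)
        if not boundaries:
            raise ValueError("no bucket boundaries for feature %r" % key)
        numeric = tf.feature_column.numeric_column(key=key)
        columns.append(
            tf.feature_column.bucketized_column(numeric, boundaries=boundaries))
    for key in categorical_keys:
        hashed = tf.feature_column.categorical_column_with_hash_bucket(
            key=key, hash_bucket_size=hash_bucket_size)
        columns.append(tf.feature_column.indicator_column(hashed))
    return columns
```

For example, quantile_boundaries(range(10), 4) returns [2.0, 5.0, 7.0]. The bucket count is a trade-off: more buckets give the trees finer split points, at the cost of more candidate splits to evaluate per layer.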

Support for numeric features in tf.estimator.BoostedTreesClassifier was added in TensorFlow v1.13 (source, commit). The first stable release that includes it is v1.13.1.
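On TensorFlow 1.13.1 or later, the feature_cols loop from the question can keep plain numeric_column entries. A minimal sketch of a version gate, assuming version strings of the form major.minor.patch with an optional suffix such as rc1; parse_version, supports_numeric_columns and build_feature_columns are hypothetical names. Release candidates of 1.13.0 are treated as unsupported here, since only v1.13.1 is known to be the first stable release with the feature.

```python
def parse_version(version):
    """Turn '1.13.1' into (1, 13, 1) and '1.13.0rc1' into (1, 13, 0)."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1' or 'dev'
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_numeric_columns(version):
    """True if `version` is at least 1.13.1, the first stable release whose
    BoostedTreesClassifier accepts numeric_column directly."""
    return parse_version(version) >= (1, 13, 1)


def build_feature_columns(numerical_features, categorical_features,
                          hash_bucket_size=100):
    import tensorflow as tf  # imported lazily so the helpers above need no TF

    if not supports_numeric_columns(tf.__version__):
        raise RuntimeError(
            "TensorFlow %s: wrap numeric features in bucketized_column instead"
            % tf.__version__)
    columns = [tf.feature_column.numeric_column(key=key)
               for key in numerical_features]
    for key in categorical_features:
        hashed = tf.feature_column.categorical_column_with_hash_bucket(
            key=key, hash_bucket_size=hash_bucket_size)
        columns.append(tf.feature_column.indicator_column(hashed))
    return columns
```

The returned list can be passed as feature_columns to tf.estimator.BoostedTreesClassifier with the same n_batches_per_layer and n_trees settings as in the question.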
