Applying feature columns without tf.Estimator (TensorFlow 2.0.0-rc0)

Question

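The TensorFlow documentation for tf.Estimator and tf.feature_column describes in detail how to use feature columns together with an Estimator, for example to one-hot encode the categorical features of the dataset being used.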

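However, I would like to "apply" my feature columns directly to a tf.data.Dataset that I create from a .csv file (with two columns: UserID, MovieID), without even defining a model or an Estimator. (Reason: I want to check what exactly is happening in my data pipeline, i.e. I would like to run a batch of samples through the pipeline and then see in the output how the features were encoded.)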
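For readers unfamiliar with the encoding being inspected here, the following is a minimal pure-Python sketch of the idea behind a hashed categorical column wrapped in an indicator column: each ID is hashed into one of a fixed number of buckets, and the bucket index is one-hot encoded. The helper names (hash_bucket, one_hot, encode_batch) are made up for illustration, and MD5 is only a stand-in hash, so the bucket indices will not match what TensorFlow produces.

```python
import hashlib


def hash_bucket(value, num_buckets):
    """Map a value to a stable bucket index in [0, num_buckets)."""
    # Stand-in hash (assumption): MD5 of the string form of the value.
    # TensorFlow uses a different hash function, so indices will differ.
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def one_hot(index, depth):
    """Return a list of length `depth` with a single 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(depth)]


def encode_batch(ids, num_buckets=8):
    """Hash every ID into a bucket and one-hot encode the bucket index."""
    return [one_hot(hash_bucket(i, num_buckets), num_buckets) for i in ids]


if __name__ == "__main__":
    # The same ID always lands in the same bucket; different IDs may collide.
    for uid, row in zip([1, 2, 1], encode_batch([1, 2, 1])):
        print(uid, row)
```

With a small bucket count, distinct IDs can share a bucket, which is the usual trade-off of hashing instead of keeping an explicit vocabulary.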

This is what I have tried so far:

import tensorflow as tf

column_names = ['UserID', 'MovieID']

# Hash each ID column into 1000 buckets, then one-hot encode the bucket index.
user_col = tf.feature_column.categorical_column_with_hash_bucket(key='UserID', hash_bucket_size=1000)
movie_col = tf.feature_column.categorical_column_with_hash_bucket(key='MovieID', hash_bucket_size=1000)
feature_columns = [tf.feature_column.indicator_column(user_col), tf.feature_column.indicator_column(movie_col)]

feature_layer = tf.keras.layers.DenseFeatures(feature_columns=feature_columns)

def process_csv(line):
  # Parse one "UserID;MovieID" line into two int32 scalars keyed by column name.
  fields = tf.io.decode_csv(line, record_defaults=[tf.constant([], dtype=tf.int32)]*2, field_delim=";")
  features = dict(zip(column_names, fields))

  return features

# csv_filepath is the path to the .csv file described above.
ds = tf.data.TextLineDataset(csv_filepath)
ds = ds.map(process_csv, num_parallel_calls=4)
ds = ds.batch(10)
ds.map(lambda x: feature_layer(x))

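However, the last line, with the map call, raises the following error:

ValueError: Column dtype and SparseTensors dtype must be compatible. key: MovieID, column dtype: <dtype: 'string'>, tensor dtype: <dtype: 'int32'>

I'm not sure what this error means... I also tried defining a tf.keras model that contains only the feature_layer and then running .predict() on the dataset, instead of using ds.map(lambda x: feature_layer(x)):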

model = tf.keras.Sequential([feature_layer])
model.compile()
model.predict(ds)

However, this results in exactly the same error as above. Does anyone know what is going wrong? Is there perhaps a simpler way to achieve this?
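On the "simpler way" part: if the goal is only to look at the encoded batches, one option is to skip feature columns and apply the hashing and one-hot steps with plain TensorFlow ops inside the tf.data pipeline. The sketch below is a rough illustration under assumptions, not a drop-in replacement: the in-memory `lines` list stands in for the real .csv file, and `encode` and `NUM_BUCKETS` are names introduced here. The resulting bucket indices are not guaranteed to match those of the feature-column version.

```python
import tensorflow as tf

# Hypothetical stand-in for the lines of the real .csv file ("UserID;MovieID").
lines = ["1;10", "2;20", "1;30"]
NUM_BUCKETS = 1000


def process_csv(line):
    # Parse one line into two required int32 fields.
    user_id, movie_id = tf.io.decode_csv(
        line,
        record_defaults=[tf.constant([], dtype=tf.int32)] * 2,
        field_delim=";",
    )
    return {"UserID": user_id, "MovieID": movie_id}


def encode(features):
    encoded = {}
    for key, values in features.items():
        # Hash the string form of each integer ID into a bucket index ...
        buckets = tf.strings.to_hash_bucket_fast(
            tf.strings.as_string(values), NUM_BUCKETS
        )
        # ... and one-hot encode that index.
        encoded[key] = tf.one_hot(buckets, depth=NUM_BUCKETS)
    return encoded


ds = (
    tf.data.Dataset.from_tensor_slices(lines)
    .map(process_csv)
    .batch(10)
    .map(encode)
)

# Pull one batch out of the pipeline and inspect it eagerly.
batch = next(iter(ds))
print({key: value.shape for key, value in batch.items()})
```

Keeping each feature as its own dictionary entry makes it easy to check a single column, for example with tf.argmax(batch["UserID"], axis=1) to read back the bucket index of every sample.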

Answer 1

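Just found the problem: tf.feature_column.categorical_column_with_hash_bucket() takes an optional argument dtype, which defaults to tf.dtypes.string. However, the data type of my columns is numeric (tf.dtypes.int32). This solved the problem: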

tf.feature_column.categorical_column_with_hash_bucket(key='UserID', hash_bucket_size=1000, dtype=tf.dtypes.int32)
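
To show the fix in context, here is a small end-to-end sketch that passes dtype=tf.dtypes.int32 for both columns and then calls the DenseFeatures layer eagerly on one batch. It assumes a TensorFlow 2.x release that still ships tf.feature_column and tf.keras.layers.DenseFeatures (both were later deprecated), and it uses a hypothetical in-memory `lines` list in place of the real .csv file; `make_indicator` is a helper name introduced only for this example.

```python
import tensorflow as tf

column_names = ["UserID", "MovieID"]

# Hypothetical stand-in for the lines of the real .csv file ("UserID;MovieID").
lines = ["1;10", "2;20", "1;30"]


def make_indicator(key):
    # dtype must match the int32 tensors produced by process_csv below;
    # the default (tf.dtypes.string) is what triggered the ValueError.
    hashed = tf.feature_column.categorical_column_with_hash_bucket(
        key=key, hash_bucket_size=1000, dtype=tf.dtypes.int32
    )
    return tf.feature_column.indicator_column(hashed)


feature_layer = tf.keras.layers.DenseFeatures(
    feature_columns=[make_indicator(key) for key in column_names]
)


def process_csv(line):
    # Parse one line into two required int32 fields keyed by column name.
    fields = tf.io.decode_csv(
        line,
        record_defaults=[tf.constant([], dtype=tf.int32)] * 2,
        field_delim=";",
    )
    return dict(zip(column_names, fields))


ds = tf.data.Dataset.from_tensor_slices(lines).map(process_csv).batch(10)

# Run a single batch through the feature layer and look at the result.
for batch in ds.take(1):
    encoded = feature_layer(batch)
    print(encoded.shape)
    print(tf.reduce_sum(encoded, axis=1))  # number of active slots per sample
```

Calling the layer directly on a batch, rather than inside ds.map, keeps the inspection in eager mode, so the encoded tensor can be printed or converted with .numpy() right away.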

Answer 1 (accepted, 2019-09-06 01:49:37)