Applying feature columns without tf.Estimator (Tensorflow 2.0.0-rc0)

Question

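The Tensorflow tf.Estimator and tf.feature_column docs describe in detail how to use feature columns together with an Estimator, e.g. to one-hot encode the categorical features in the dataset being used.

However, I would like to "apply" my feature columns directly to a tf.data dataset that I create from a .csv file (with two columns: UserID, MovieID), without even defining a model or an Estimator. (Reason: I want to check exactly what is happening in my data pipeline, i.e. I'd like to run a batch of samples through the pipeline and then see in the output how the features were encoded.)

This is what I have tried so far: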

column_names = ['UserID', 'MovieID']

user_col = tf.feature_column.categorical_column_with_hash_bucket(key='UserID', hash_bucket_size=1000)
movie_col = tf.feature_column.categorical_column_with_hash_bucket(key='MovieID', hash_bucket_size=1000)
feature_columns = [tf.feature_column.indicator_column(user_col), tf.feature_column.indicator_column(movie_col)]

feature_layer = tf.keras.layers.DenseFeatures(feature_columns=feature_columns)

def process_csv(line):
  fields = tf.io.decode_csv(line, record_defaults=[tf.constant([], dtype=tf.int32)]*2, field_delim=";")
  features = dict(zip(column_names, fields))

  return features 

ds = tf.data.TextLineDataset(csv_filepath)
ds = ds.map(process_csv, num_parallel_calls=4)
ds = ds.batch(10)
ds.map(lambda x: feature_layer(x))

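However, the last line, the map call, raises the following error:

ValueError: Column dtype and SparseTensors dtype must be compatible. key: MovieID, column dtype: , tensor dtype:

I'm not sure what this error means... I also tried defining a tf.keras model containing only the feature_layer I defined and then running .predict() on my dataset, instead of using ds.map(lambda x: feature_layer(x)):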

model = tf.keras.Sequential([feature_layer])
model.compile()
model.predict(ds)

However, this results in exactly the same error as above. Does anyone know what is going wrong? Is there perhaps an easier way to achieve this?

Answer 1

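Just found the issue: tf.feature_column.categorical_column_with_hash_bucket() takes an optional argument dtype, which defaults to tf.dtypes.string. However, the data type of my columns is numerical (tf.dtypes.int32), so the column dtype and the tensor dtype did not match. Passing the dtype explicitly solved the problem (the same change applies to the MovieID column):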

tf.feature_column.categorical_column_with_hash_bucket(key='UserID', hash_bucket_size=1000, dtype=tf.dtypes.int32)
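
For anyone who wants to inspect the encoded output of a batch, here is a minimal sketch of the whole pipeline with the dtype fix applied to both columns. It assumes a TensorFlow 2.x release in which tf.feature_column and tf.keras.layers.DenseFeatures are still available, and it uses a small made-up in-memory sample in place of the .csv file, so the ID values are purely illustrative.

```python
import tensorflow as tf

# Hypothetical sample data standing in for the rows of the .csv file.
features = {
    'UserID': tf.constant([1, 2, 3, 4], dtype=tf.int32),
    'MovieID': tf.constant([10, 20, 30, 40], dtype=tf.int32),
}

# dtype must match the int32 tensors; the default would be tf.dtypes.string.
user_col = tf.feature_column.categorical_column_with_hash_bucket(
    key='UserID', hash_bucket_size=1000, dtype=tf.dtypes.int32)
movie_col = tf.feature_column.categorical_column_with_hash_bucket(
    key='MovieID', hash_bucket_size=1000, dtype=tf.dtypes.int32)

feature_layer = tf.keras.layers.DenseFeatures([
    tf.feature_column.indicator_column(user_col),
    tf.feature_column.indicator_column(movie_col),
])

ds = tf.data.Dataset.from_tensor_slices(features).batch(2)

# Take one batch and call the layer on it directly (eager execution).
batch = next(iter(ds))
encoded = feature_layer(batch)

print(encoded.shape)           # (2, 2000): two 1000-wide one-hot blocks
print(tf.where(encoded > 0))   # [row, column] positions of the hot entries
```

Calling the layer on a single batch like this avoids both ds.map and a Keras model, and each row should contain exactly two nonzero entries, one per feature column.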

Answer 1
0 accepted 2019-09-06 01:49:37
