Apply feature columns without tf.Estimator (Tensorflow 2.0.0-rc0)

The Tensorflow tf.Estimator and tf.feature_column docs explain in detail how to use feature columns together with an Estimator, e.g. to one-hot encode the categorical features in the dataset being used.

However, I want to "apply" my feature columns directly to a tf.data.Dataset that I create from a .csv file (with two columns: UserID, MovieID), without defining a model or an Estimator at all. (Reason: I want to check exactly what happens in my data pipeline, i.e. I'd like to run a batch of samples through the pipeline and then see in the output how the features were encoded.)

This is what I have tried so far:

import tensorflow as tf

# csv_filepath: path to the ';'-separated .csv file (defined elsewhere)
column_names = ['UserID', 'MovieID']

# Hash each ID into 1000 buckets (dtype is left at its default here)
user_col = tf.feature_column.categorical_column_with_hash_bucket(key='UserID', hash_bucket_size=1000)
movie_col = tf.feature_column.categorical_column_with_hash_bucket(key='MovieID', hash_bucket_size=1000)
# Wrap the hashed columns in indicator columns to get one-hot vectors
feature_columns = [tf.feature_column.indicator_column(user_col), tf.feature_column.indicator_column(movie_col)]

feature_layer = tf.keras.layers.DenseFeatures(feature_columns=feature_columns)

def process_csv(line):
  # Parse one line into two required int32 fields
  fields = tf.io.decode_csv(line, record_defaults=[tf.constant([], dtype=tf.int32)]*2, field_delim=";")
  features = dict(zip(column_names, fields))

  return features

ds = tf.data.TextLineDataset(csv_filepath)
ds = ds.map(process_csv, num_parallel_calls=4)
ds = ds.batch(10)
# Apply the feature layer to every batch (this is the line that raises the error)
ds_encoded = ds.map(lambda x: feature_layer(x))
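Before any feature column is involved, it can help to look at what the parsing step alone produces, since that is the "tensor dtype" side of the error below. A minimal sketch, assuming TF 2.x with eager execution; the in-memory lines "1;10", "2;20", "3;30" are hypothetical sample data standing in for tf.data.TextLineDataset(csv_filepath):

```python
import tensorflow as tf

column_names = ['UserID', 'MovieID']

def process_csv(line):
    # Parse one ';'-separated line into two required int32 fields, as in the question
    fields = tf.io.decode_csv(line, record_defaults=[tf.constant([], dtype=tf.int32)] * 2,
                              field_delim=";")
    return dict(zip(column_names, fields))

# Hypothetical sample lines used instead of reading the .csv file
lines = tf.data.Dataset.from_tensor_slices(["1;10", "2;20", "3;30"])
ds = lines.map(process_csv).batch(10)

# Pull a single batch and print each raw feature's dtype and values
for batch in ds.take(1):
    for name, tensor in batch.items():
        print(name, tensor.dtype, tensor.numpy())
```

Both features come out of decode_csv as int32, which is what the feature columns will receive.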

However, the last line with the map call raises the following error:

ValueError: Column dtype and SparseTensors dtype must be compatible. key: MovieID, column dtype: <dtype: 'string'>, tensor dtype: <dtype: 'int32'>

I'm not sure what this error means. Instead of using ds.map(lambda x: feature_layer(x)), I also tried defining a tf.keras model that contains only the feature_layer and then running .predict() on my dataset:

model = tf.keras.Sequential([feature_layer])
model.compile()
model.predict(ds)

However, this results in exactly the same error as above. Does anybody have an idea what is going wrong? Is there perhaps an easier way to achieve this?

Just found the issue: tf.feature_column.categorical_column_with_hash_bucket() takes an optional argument dtype, which defaults to tf.dtypes.string. However, my columns are numerical (tf.dtypes.int32). Setting the dtype explicitly solved the issue:

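user_col = tf.feature_column.categorical_column_with_hash_bucket(key='UserID', hash_bucket_size=1000, dtype=tf.dtypes.int32)
movie_col = tf.feature_column.categorical_column_with_hash_bucket(key='MovieID', hash_bucket_size=1000, dtype=tf.dtypes.int32)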
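Putting the fix together, the sketch below declares both hashed columns as int32 and runs one batch through DenseFeatures, so the encoded output can be inspected without an Estimator or a model. It assumes a TF 2.x release in which tf.keras.layers.DenseFeatures is still available (it is not part of Keras 3), and it again uses hypothetical in-memory lines instead of the .csv file:

```python
import tensorflow as tf

column_names = ['UserID', 'MovieID']

# Both IDs are integers, so the hashed columns are declared with an integer dtype
user_col = tf.feature_column.categorical_column_with_hash_bucket(
    key='UserID', hash_bucket_size=1000, dtype=tf.dtypes.int32)
movie_col = tf.feature_column.categorical_column_with_hash_bucket(
    key='MovieID', hash_bucket_size=1000, dtype=tf.dtypes.int32)
feature_columns = [tf.feature_column.indicator_column(user_col),
                   tf.feature_column.indicator_column(movie_col)]
feature_layer = tf.keras.layers.DenseFeatures(feature_columns=feature_columns)

def process_csv(line):
    # Parse one ';'-separated line into two required int32 fields
    fields = tf.io.decode_csv(line, record_defaults=[tf.constant([], dtype=tf.int32)] * 2,
                              field_delim=";")
    return dict(zip(column_names, fields))

# Hypothetical sample lines standing in for tf.data.TextLineDataset(csv_filepath)
ds = tf.data.Dataset.from_tensor_slices(["1;10", "2;20", "3;30"])
ds = ds.map(process_csv).batch(10)

# Run one batch through the feature layer eagerly and look at the result
for batch in ds.take(1):
    encoded = feature_layer(batch)
    # One 1000-wide one-hot block per column, concatenated: (batch_size, 2000)
    print(encoded.shape)
```

Calling the layer eagerly on a single batch keeps the inspection step simple; once the dtypes match, wrapping the same call in ds.map(...) as in the question should also work.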

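Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.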

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM