Converting a Python sequence with mixed data types to a tensor

Question

I'm using TensorFlow r1.7 and Python 3.6.5. I am also very new to TensorFlow, so I'd appreciate easy-to-read explanations if possible.

I'm trying to convert my input data into a dataset of tensors with the function tf.data.Dataset.from_tensor_slices(). I pass my tuple with mixed datatypes into this function. However, when I run my code I get this error: ValueError: Can't convert Python sequence with mixed types to Tensor.

I want to know why I am receiving this error, and how I can convert my data to a dataset of tensors even with mixed datatypes.

Here's a printout of the first 5 entries in my tuple.

(13501, 2, None, 51, '2232', 'S35', '734.72', 'CLA', '240', 1035, 2060, 1252, 1182, 10, '967.28', '338.50', None, 14, 102, 3830)
(15124, 2, None, 57, '2641', 'S35', '234.80', 'DDA', '240', 743, 1597, 4706, 156, 0, None, None, None, 3, 27, 981)
(40035, 2, None, None, '21', 'K00', '60.06', 'CHK', '520', 76, 1863, 12, None, 1, '85.06', '25.00', None, 1, 5, 245)
(42331, 3, None, 62, '121', 'S50', '1859.01', 'ACT', '420', 952, 1583, 410, 255, 0, None, None, None, 6, 117, 1795)
(201721, 3, None, 42, '2472', 'S35', '1413.84', 'CLA', '350', 868, 1746, 963, 264, 0, None, None, None, 18, 65, 4510)

As you can see, I have a mix of integers, floats, and strings in my input data.

Here is a traceback of the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/miikey101/Documents/Khalen_Case_Loader/tensorflow/k_means/k_means.py", line 10, in prepare_dataset
    dataset = tf.data.Dataset.from_tensor_slices(dm_data)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 222, in from_tensor_slices
    return TensorSliceDataset(tensors)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1017, in __init__
    for i, t in enumerate(nest.flatten(tensors))
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1017, in <listcomp>
    for i, t in enumerate(nest.flatten(tensors))
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 950, in convert_to_tensor
    as_ref=False)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1040, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 235, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 185, in constant
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 131, in convert_to_eager_tensor
    return ops.EagerTensor(value, context=handle, device=device, dtype=dtype)
ValueError: Can't convert Python sequence with mixed types to Tensor.

Answer 1

In TensorFlow you can't have a tensor with more than one data type.

Quoting the documentation:

It is not possible to have a tf.Tensor with more than one data type. It is possible, however, to serialize arbitrary data structures as strings and store those in tf.Tensors.

Hence a workaround could be to create a tensor with data type tf.string and, when needed, cast each field to the desired data type.
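A minimal sketch of that workaround, assuming every field is turned into a string first and None becomes an empty string (the `stringify` and `to_string_tensor` helpers and the shortened rows are illustrative, not part of the original code). Numeric fields could later be parsed back with tf.strings.to_number in recent releases (tf.string_to_number in older 1.x releases); an empty string will not parse as a number, so missing values still need their own handling.

```python
# Shortened sample rows: int, int, None, str, numeric-as-str.
rows = [
    (13501, 2, None, "S35", "734.72"),
    (40035, 2, None, "K00", "60.06"),
]


def stringify(row, missing=""):
    """Turn every field into a string; None becomes the `missing` marker."""
    return [missing if value is None else str(value) for value in row]


# Every element is now a str, so the nested list has a single data type.
string_rows = [stringify(row) for row in rows]


def to_string_tensor(string_rows):
    """Build a tf.string tensor. TensorFlow is imported lazily so the
    pure-Python preparation above can be used without it."""
    import tensorflow as tf
    return tf.constant(string_rows, dtype=tf.string)
```

Calling `to_string_tensor(string_rows)` should give a 2x5 string tensor. The cost of this approach is that every consumer has to know which column to parse as which type.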

Answer 2

You want a tensor for each of your features (columns). Only if it's a multi-dimensional feature (like an image, a video, a list of strings, or a vector) would you have more dimensions in the tensor, and even then all of its elements would have the same datatype.

tf.data.Dataset.from_tensor_slices() will accept your input as a dictionary of lists (the key is the name of the feature, the value is a list of the values in that feature), or as a list of lists. I can't remember whether it accepts Pandas dataframes directly, but if it doesn't you can easily convert one to a dictionary of lists with df.to_dict('list').
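As a rough sketch of the dictionary-of-lists layout, the row tuples can be transposed so that each feature becomes one list with a single type. The feature names below are hypothetical (the original data does not name its columns), the rows are shortened, and the sample deliberately contains no None values.

```python
# Hypothetical feature names; the question's data has no column names.
FEATURE_NAMES = ["record_id", "group", "code", "amount"]

# Shortened rows taken from the question, without any None fields.
rows = [
    (13501, 2, "S35", "734.72"),
    (15124, 2, "S35", "234.80"),
    (40035, 2, "K00", "60.06"),
]


def rows_to_columns(rows, names):
    """Transpose row tuples into {feature name: list of values}."""
    return {name: list(values) for name, values in zip(names, zip(*rows))}


columns = rows_to_columns(rows, FEATURE_NAMES)
# The amounts arrive as strings; parse them so the column is purely float.
columns["amount"] = [float(a) for a in columns["amount"]]


def make_dataset(columns):
    """Build the dataset. TensorFlow is imported lazily so the
    transposition above stays plain Python."""
    import tensorflow as tf
    return tf.data.Dataset.from_tensor_slices(columns)
```

Each element yielded by `make_dataset(columns)` should then be a dictionary with one scalar tensor per feature, each with its own dtype.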

However, you can't input None values. You will have to substitute some value for them before converting into a tensor. Classic approaches are the median value, zero, the most common value, a "missing"/"unknown" value for strings or categories, or imputation.
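One possible way to apply two of these substitutions, median for numeric columns and an "unknown" label for categorical ones, is sketched below. The `fill_missing` helper and the column names are hypothetical, and the values are illustrative samples in the style of the question's data.

```python
from statistics import median


def fill_missing(column, kind):
    """Return a copy of `column` with None replaced by a placeholder.

    kind == "numeric": fill with the median of the present values
    (0 if none are present) and return everything as float.
    Any other kind: fill with the string "unknown".
    """
    present = [v for v in column if v is not None]
    if kind == "numeric":
        fill = median(present) if present else 0
        return [float(fill if v is None else v) for v in column]
    return ["unknown" if v is None else v for v in column]


# Illustrative columns containing missing entries.
ages = [51, 57, None, 62, 42]
codes = ["CLA", None, "CHK"]

filled_ages = fill_missing(ages, "numeric")
filled_codes = fill_missing(codes, "categorical")
```

Converting the numeric column to float throughout avoids ending up with a list that mixes int and float once the median is inserted.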
