[英]TypeError: argument of type 'float' is not iterable
I am new to python and TensorFlow.我是 python 和 TensorFlow 的新手。 I recently started understanding and executing TensorFlow examples, and came across this one: https://www.tensorflow.org/versions/r0.10/tutorials/wide_and_deep/index.html
我最近开始理解和执行 TensorFlow 示例,并遇到了这个: https : //www.tensorflow.org/versions/r0.10/tutorials/wide_and_deep/index.html
I got the error, TypeError: argument of type 'float' is not iterable , and I believe that the problem is with the following line of code:我收到错误TypeError: argument of type 'float' is not iterable ,我相信问题出在以下代码行:
df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
(income_bracket is the label column of the census dataset, with '>50K' being one of the possible label values, and the other label is '=<50K'. The dataset is read into df_train. The explanation provided in the documentation for the reason to do the above is, "Since the task is a binary classification problem, we'll construct a label column named "label" whose value is 1 if the income is over 50K, and 0 otherwise.") (income_bracket 是人口普查数据集的标签列,其中 '>50K' 是可能的标签值之一,另一个标签是 '=<50K'。数据集被读入 df_train。文档中提供的解释这样做的原因是,“由于任务是一个二元分类问题,我们将构造一个名为“label”的标签列,如果收入超过 50K,其值为 1,否则为 0。”)
If anyone could explain me what is exactly happening and how should I fix it, that'll be great.如果有人能向我解释到底发生了什么以及我应该如何解决它,那就太好了。 I tried using Python2.7 and Python3.4, and I don't think that the problem is with the version of the language.
我尝试使用Python2.7和Python3.4,我认为问题不是语言版本的问题。 Also, if anyone is aware of great tutorials for someone who is new to TensorFlow and pandas, please share the links.
另外,如果有人知道适合 TensorFlow 和 Pandas 新手的很棒的教程,请分享链接。
Complete program:完整程序:
import pandas as pd
import urllib
import tempfile
import tensorflow as tf
gender = tf.contrib.layers.sparse_column_with_keys(column_name="gender", keys=["female", "male"])
race = tf.contrib.layers.sparse_column_with_keys(column_name="race", keys=["Amer-Indian-Eskimo", "Asian-Pac-Islander", "Black", "Other", "White"])
education = tf.contrib.layers.sparse_column_with_hash_bucket("education", hash_bucket_size=1000)
marital_status = tf.contrib.layers.sparse_column_with_hash_bucket("marital_status", hash_bucket_size=100)
relationship = tf.contrib.layers.sparse_column_with_hash_bucket("relationship", hash_bucket_size=100)
workclass = tf.contrib.layers.sparse_column_with_hash_bucket("workclass", hash_bucket_size=100)
occupation = tf.contrib.layers.sparse_column_with_hash_bucket("occupation", hash_bucket_size=1000)
native_country = tf.contrib.layers.sparse_column_with_hash_bucket("native_country", hash_bucket_size=1000)
age = tf.contrib.layers.real_valued_column("age")
age_buckets = tf.contrib.layers.bucketized_column(age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
education_num = tf.contrib.layers.real_valued_column("education_num")
capital_gain = tf.contrib.layers.real_valued_column("capital_gain")
capital_loss = tf.contrib.layers.real_valued_column("capital_loss")
hours_per_week = tf.contrib.layers.real_valued_column("hours_per_week")
wide_columns = [gender, native_country, education, occupation, workclass, marital_status, relationship, age_buckets, tf.contrib.layers.crossed_column([education, occupation], hash_bucket_size=int(1e4)), tf.contrib.layers.crossed_column([native_country, occupation], hash_bucket_size=int(1e4)), tf.contrib.layers.crossed_column([age_buckets, race, occupation], hash_bucket_size=int(1e6))]
deep_columns = [
tf.contrib.layers.embedding_column(workclass, dimension=8),
tf.contrib.layers.embedding_column(education, dimension=8),
tf.contrib.layers.embedding_column(marital_status, dimension=8),
tf.contrib.layers.embedding_column(gender, dimension=8),
tf.contrib.layers.embedding_column(relationship, dimension=8),
tf.contrib.layers.embedding_column(race, dimension=8),
tf.contrib.layers.embedding_column(native_country, dimension=8),
tf.contrib.layers.embedding_column(occupation, dimension=8),
age, education_num, capital_gain, capital_loss, hours_per_week]
model_dir = tempfile.mkdtemp()
m = tf.contrib.learn.DNNLinearCombinedClassifier(
model_dir=model_dir,
linear_feature_columns=wide_columns,
dnn_feature_columns=deep_columns,
dnn_hidden_units=[100, 50])
COLUMNS = ["age", "workclass", "fnlwgt", "education", "education_num",
"marital_status", "occupation", "relationship", "race", "gender",
"capital_gain", "capital_loss", "hours_per_week", "native_country", "income_bracket"]
LABEL_COLUMN = 'label'
CATEGORICAL_COLUMNS = ["workclass", "education", "marital_status", "occupation", "relationship", "race", "gender", "native_country"]
CONTINUOUS_COLUMNS = ["age", "education_num", "capital_gain", "capital_loss", "hours_per_week"]
train_file = tempfile.NamedTemporaryFile()
test_file = tempfile.NamedTemporaryFile()
urllib.urlretrieve("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data", train_file.name)
urllib.urlretrieve("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test", test_file.name)
df_train = pd.read_csv(train_file, names=COLUMNS, skipinitialspace=True)
df_test = pd.read_csv(test_file, names=COLUMNS, skipinitialspace=True, skiprows=1)
df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
df_test[LABEL_COLUMN] = (df_test['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
def input_fn(df):
continuous_cols = {k: tf.constant(df[k].values)
for k in CONTINUOUS_COLUMNS}
categorical_cols = {k: tf.SparseTensor(
indices=[[i, 0] for i in range(df[k].size)],
values=df[k].values,
shape=[df[k].size, 1])
for k in CATEGORICAL_COLUMNS}
feature_cols = dict(continuous_cols.items() + categorical_cols.items())
label = tf.constant(df[LABEL_COLUMN].values)
return feature_cols, label
def train_input_fn():
return input_fn(df_train)
def eval_input_fn():
return input_fn(df_test)
m.fit(input_fn=train_input_fn, steps=200)
results = m.evaluate(input_fn=eval_input_fn, steps=1)
for key in sorted(results):
print("%s: %s" % (key, results[key]))
Thank you谢谢
PS: Full stack trace for the error PS:错误的完整堆栈跟踪
Traceback (most recent call last):
File "/home/jaspreet/PycharmProjects/TicTacTensorFlow/census.py", line 73, in <module>
df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
File "/usr/lib/python2.7/dist-packages/pandas/core/series.py", line 2023, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "inference.pyx", line 920, in pandas.lib.map_infer (pandas/lib.c:44780)
File "/home/jaspreet/PycharmProjects/TicTacTensorFlow/census.py", line 73, in <lambda>
df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
TypeError: argument of type 'float' is not iterable
As you can see, when you inspect the test.data
, you will obviously see that the first line of data has "NAN" in income_bracket
field.如您所见,当您检查
test.data
,您会很明显地看到第一行数据在income_bracket
字段中有“NAN”。
I have further inspected that this is the only line contains "NAN" by doing:我通过执行以下操作进一步检查了这是唯一包含“NAN”的行:
ib = df_test ["income_bracket"]
t = type('12')
for idx,i in enumerate(ib):
if(type(i) != t):
print idx,type(i)
RESULT: 0 <type 'float'>
So you may just skip this row by:所以你可以跳过这一行:
df_test = pd.read_csv(file_test , names=COLUMNS, skipinitialspace=True, skiprows=1)
该程序可以与最新版本的熊猫(即 0.18.1)逐字运行
maybe got a number in the for loop after in keyword try to skip it with a test ("isinstance" )可能在 in 关键字之后的 for 循环中有一个数字尝试通过测试跳过它(“isinstance”)
if(isinstance(lines, str)):
for x in lines:
foo()
else:
skip
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.