Problem with neural network in TensorFlow 2.0
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.preprocessing import StandardScaler
import functools
LABEL_COLUMN = 'Endstage'
LABELS = [1, 2, 3, 4]
x = pd.read_csv('HCVnew.csv', index_col=False)
def get_dataset(file_path, **kwargs):
    dataset = tf.data.experimental.make_csv_dataset(
        file_path,
        batch_size=35,  # Artificially small to make examples easier to show.
        label_name=LABEL_COLUMN,
        na_value="?",
        num_epochs=1,
        ignore_errors=True,
        **kwargs)
    return dataset
SELECT_COLUMNS = ["Alter", "Gender", "BMI", "Fever", "Nausea", "Fatigue",
                  "WBC", "RBC", "HGB", "Plat", "AST1", "ALT1", "ALT4", "ALT12", "ALT24", "ALT36", "ALT48", "ALT24w",
                  "RNABase", "RNA4", "Baseline", "Endstage"]
DEFAULTS = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
            0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
temp_dataset = get_dataset("HCVnew.csv",
                           select_columns=SELECT_COLUMNS,
                           column_defaults=DEFAULTS)
def pack(features, label):
    return tf.stack(list(features.values()), axis=-1), label
packed_dataset = temp_dataset.map(pack)
"""
for features, labels in packed_dataset.take(1):
    print(features.numpy())
    print()
    print(labels.numpy())
"""
NUMERIC_FEATURES = ["Alter", "Gender", "BMI", "Fever", "Nausea", "Fatigue",
                    "WBC", "RBC", "HGB", "Plat", "AST1", "ALT1", "ALT4", "ALT12", "ALT24", "ALT36", "ALT48", "ALT24w",
                    "RNABase", "RNA4", "Baseline", "Endstage"]
desc = pd.read_csv("HCVnew.csv")[NUMERIC_FEATURES].describe()
MEAN = np.array(desc.T['mean'])
STD = np.array(desc.T['std'])
def normalize_numeric_data(data, mean, std):
    # Center the data
    return (data-mean)/std
# See what you just created.
raw_train_data = get_dataset("HCVnew.csv")
raw_test_data = get_dataset("HCVnew.csv")
class PackNumericFeatures(object):
    def __init__(self, names):
        self.names = names

    def __call__(self, features, labels):
        numeric_features = [features.pop(name) for name in self.names]
        numeric_features = [tf.cast(feat, tf.float32) for feat in numeric_features]
        numeric_features = tf.stack(numeric_features, axis=-1)
        features['numeric'] = numeric_features
        return features, labels
NUMERIC_FEATURES = ["Alter", "Gender", "BMI", "Fever", "Nausea", "Fatigue",
                    "WBC", "RBC", "HGB", "Plat", "AST1", "ALT1", "ALT4", "ALT12", "ALT24", "ALT36", "ALT48", "ALT24w",
                    "RNABase", "RNA4", "Baseline", "Endstage"]
packed_train_data = raw_train_data.map(
    PackNumericFeatures(NUMERIC_FEATURES))
packed_test_data = raw_test_data.map(
    PackNumericFeatures(NUMERIC_FEATURES))
normalizer = functools.partial(normalize_numeric_data, mean=MEAN, std=STD)
numeric_column = tf.feature_column.numeric_column('numeric', normalizer_fn=normalizer, shape=[len(NUMERIC_FEATURES)])
numeric_columns = [numeric_column]
numeric_layer = tf.keras.layers.DenseFeatures(numeric_columns)
preprocessing_layer = tf.keras.layers.DenseFeatures(numeric_columns)
# ----- MODEL -----
model = tf.keras.Sequential([
    preprocessing_layer,
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['accuracy'])
data_x = get_dataset("HCVnew.csv")
train_data = data_x.shuffle(500)
model.fit(train_data, epochs=20)
Hello, I'm trying to build a neural network that can predict Hepatitis C based on a CSV file containing patient information, and I can't fix the following error: KeyError: 'Endstage'. Endstage is the CSV column that contains the corresponding values (between 1 and 4) and serves as the label column. If someone has an idea that could fix my problem, please tell me. Thanks a lot for your help!
That's because Endstage is your label column, and the framework does you a favour by removing (popping) it from your dataset's features. Otherwise your training features would also contain the target class, rendering the model useless. Since the column is no longer in the features dictionary, features.pop(name) in PackNumericFeatures raises the KeyError when it reaches 'Endstage'.
Remove it from NUMERIC_FEATURES and from any other place that puts it into your training set features.
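To make the mechanism concrete, here is a minimal plain-Python sketch, not the asker's actual pipeline. It assumes a hand-written dictionary standing in for one batch from make_csv_dataset; the values, the shortened column list, and the helper name pack_columns are all made up for illustration.

```python
# Stand-in for one batch produced with label_name="Endstage": the label
# column arrives separately and is NOT a key of the features dict.
# (Values are illustrative, not taken from HCVnew.csv.)
features = {"Alter": [56.0, 46.0], "BMI": [35.0, 29.0], "WBC": [7425.0, 12101.0]}
labels = [2, 3]

LABEL_COLUMN = "Endstage"
bad_names = ["Alter", "BMI", "WBC", "Endstage"]  # label listed as a feature
good_names = [name for name in bad_names if name != LABEL_COLUMN]


def pack_columns(features, names):
    """Pop the named columns, as PackNumericFeatures.__call__ does."""
    features = dict(features)  # shallow copy so the example can be re-run
    return [features.pop(name) for name in names]


try:
    pack_columns(features, bad_names)
    error = None
except KeyError as exc:  # 'Endstage' is not among the feature keys
    error = exc

packed = pack_columns(features, good_names)  # succeeds without the label
```

With the label filtered out of the name list, the pop loop only touches keys that are actually present.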
[EDIT]
In a follow-up question (in the comments), the OP asked why, after fixing the initial problem, he's getting this error:
ValueError: Feature numeric is not in features dictionary
By the looks of it, the feature called numeric is produced by calling PackNumericFeatures. The latter is used to create packed_train_data and packed_test_data, but these are never used: model.fit is given the output of get_dataset("HCVnew.csv") directly. Yet this line:
numeric_column = tf.feature_column.numeric_column('numeric', normalizer_fn=normalizer, shape=[len(NUMERIC_FEATURES)])
assumes the data is there, hence the error.
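The difference between the raw and the packed dataset can be sketched in plain Python. This is only an analogue under stated assumptions: pack_numeric_features mirrors what PackNumericFeatures.__call__ does, lookup_numeric is a hypothetical stand-in for the feature column looking up its key, and the batch values are invented.

```python
NUMERIC_FEATURES = ["Alter", "BMI", "WBC"]  # shortened list, label excluded


def pack_numeric_features(features, label, names=NUMERIC_FEATURES):
    """Plain-Python analogue of PackNumericFeatures.__call__."""
    features = dict(features)
    # Transpose the popped columns into rows, like tf.stack(..., axis=-1).
    rows = zip(*(features.pop(name) for name in names))
    features["numeric"] = [[float(value) for value in row] for row in rows]
    return features, label


def lookup_numeric(features):
    """Hypothetical stand-in for the feature column reading 'numeric'."""
    if "numeric" not in features:
        raise ValueError("Feature numeric is not in features dictionary.")
    return features["numeric"]


# Illustrative batch: (features, labels) as the raw dataset would yield it.
raw_batch = ({"Alter": [56, 46], "BMI": [35, 29], "WBC": [7425, 12101]}, [2, 3])

try:
    lookup_numeric(raw_batch[0])  # raw batch: no 'numeric' key yet
    raw_error = None
except ValueError as exc:
    raw_error = str(exc)

packed_features, packed_labels = pack_numeric_features(*raw_batch)
numeric = lookup_numeric(packed_features)  # works once the batch is packed
```

In the script above, the corresponding change would presumably be to fit on the mapped dataset, for example packed_train_data.shuffle(500), instead of on the unmapped result of get_dataset.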