
ValueError: could not convert string to float: 'horse'

I am working on a Keras CIFAR10 learning experiment. The images come from Kaggle, together with a CSV file that has two columns, one 'id' and the other 'label'. From there I do the following. I know that I need to convert my labels to tensors, but I don't know how to do it. I looked everywhere on the internet, but couldn't find anything that deals with reading this CSV file from Kaggle. Maybe this is not the way to do this...

Here is the link, https://www.kaggle.com/c/cifar-10, but there are no kernels to use as an example.

Thanks in advance for your help.

I am using imports from tensorflow.keras.xxxxxx.
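For reference, the imports behind that tensorflow.keras.xxxxxx (and the other modules the snippet relies on) would be roughly the following; this is a reconstruction from the calls used below, so adjust it to your setup:

import os
import cv2
import numpy as np
from sklearn.metrics import classification_report
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator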

import pandas as pd
print("Image IDs and Labels (TRAIN)")
train_df = pd.read_csv(TRAIN_DF_PATH)

# Add extension to id_code to train images
train_df['id'] = train_df['id'].apply(str) + ".png"

display(train_df.head())

def preprocess_image(path, sigmaX=40):
    """
    The whole preprocessing pipeline:
    1. Read in the image
    2. Convert from BGR to RGB
    3. Resize the image to the desired size
    """
    image = cv2.imread(path)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = cv2.resize(image, (IMG_WIDTH, IMG_HEIGHT))

    return image

# Add Image augmentation to our generator
train_datagen = ImageDataGenerator(rotation_range=360,
                                   horizontal_flip=True,
                                   vertical_flip=True,
                                   validation_split=0.25,
                                   rescale=1. / 255)

# Use the dataframe to define train and validation generators
train_generator = train_datagen.flow_from_dataframe(train_df, 
                                                    x_col='id', 
                                                    y_col='label',
                                                    directory = TRAIN_IMG_PATH,
                                                    target_size=(IMG_WIDTH, IMG_HEIGHT),
                                                    batch_size=BATCH_SIZE,
                                                    class_mode='other',
                                                    preprocessing_function=preprocess_image, 
                                                    subset='training')

val_generator = train_datagen.flow_from_dataframe(train_df, 
                                                  x_col='id', 
                                                  y_col='label',
                                                  directory = TRAIN_IMG_PATH,
                                                  target_size=(IMG_WIDTH, IMG_HEIGHT),
                                                  batch_size=BATCH_SIZE,
                                                  class_mode='other',
                                                  preprocessing_function=preprocess_image, 
                                                  subset='validation')

Batch_Size  = 64
epochs      = 25

# loop over the number of models to train
for i in np.arange(0, 5):

    # initialize the optimizer and model
    print("[INFO] training model {}/{}".format(i + 1, 5))
    opt = Adam(lr=1e-5)

    conv_base = ResNet50(weights='imagenet', include_top=False, input_shape=(32, 32, 3))

    model = models.Sequential()
    model.add(conv_base)
    model.add(layers.UpSampling2D((2,2)))
    model.add(layers.UpSampling2D((2,2)))
    model.add(layers.UpSampling2D((2,2)))
    model.add(layers.Flatten())
    model.add(layers.BatchNormalization())
    model.add(layers.Dense(128, activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.BatchNormalization())
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.BatchNormalization())
    model.add(layers.Dense(10, activation='softmax'))

    early_stop = EarlyStopping('val_loss', patience=5)
    reduce_lr = ReduceLROnPlateau('val_loss', factor=0.01, patience=3, verbose=1)

    ############################################################################
    trained_models_path = './best_model_adam/'
    model_names = trained_models_path + 'epoch_{epoch:02d}_val_acc_{val_acc:.4f}_'
    model_checkpoint = ModelCheckpoint(model_names +"model_{}.hdf5".format(i), verbose=1, save_best_only=True)
    ############################################################################

    callbacks = [model_checkpoint, early_stop, reduce_lr]

    #model.compile(optimizer=optimizers.RMSprop(lr=2e-5), loss='binary_crossentropy', metrics=['acc'])
    model.compile(optimizer=Adam(lr=1e-5), loss='binary_crossentropy', metrics=['acc'])

    # train the network
    history = model.fit_generator(
                            train_generator,
                            epochs = epochs,
                            steps_per_epoch= train_df.shape[0] // Batch_Size,
                            validation_data= val_generator,
                            validation_steps = val_generator.shape[0] // Batch_Size,
                            #batch_size = Batch_Size, 
                            verbose=1,
                            callbacks = [model_checkpoint, early_stop]
                        )

    # save the model to disk
    p = ["./models/model_{}.model".format(i)]
    model.save(os.path.sep.join(p))

    # evaluate the network
    predictions = model.predict(testX, batch_size=64)
    report = classification_report(testY.argmax(axis=1), predictions.argmax(axis=1), target_names=labelNames)

    # save the classification report to file
    p = ["./output/model_{}.txt".format(i)]
    f = open(os.path.sep.join(p), "w")
    f.write(report)
    f.close()

When I run fit_generator I get this error:

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in constant(value, dtype, shape, name)
    244   """
    245   return _constant_impl(value, dtype, shape, name, verify_shape=False,
--> 246                         allow_broadcast=True)
    247 
    248 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in _constant_impl(value, dtype, shape, name, verify_shape, allow_broadcast)
    252   ctx = context.context()
    253   if ctx.executing_eagerly():
--> 254     t = convert_to_eager_tensor(value, ctx, dtype)
    255     if shape is None:
    256       return t

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
    113     return t
    114   else:
--> 115     return ops.EagerTensor(value, handle, device, dtype)
    116 
    117 

ValueError: could not convert string to float: 'horse'

You can convert your category labels to numbers and then make a new column for those numbers. scikit-learn has a builtin for this, but it's easy enough without that:

import pandas as pd

df = pd.DataFrame({'label': ['cat', 'dog', 'horse'], 'b': [1, 2, 3]})
all_labels = df.label.unique().tolist()
all_labels.sort()
label_to_number = {label: all_labels.index(label) for label in all_labels}
df['label_num'] = df.apply(lambda r: label_to_number[r.label], axis=1)

Now you can point your training at that new column (y_col='label_num'). This all assumes that integer categories are OK and you don't need one-hot encoding; if you do, then again scikit has provision for that. From here it seems like the integer categories are fine, however.
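In case you do want the scikit-learn builtin (or one-hot labels), a minimal sketch could look like this, assuming scikit-learn is installed and using tensorflow.keras.utils.to_categorical for the one-hot step:

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical

df = pd.DataFrame({'label': ['cat', 'dog', 'horse'], 'b': [1, 2, 3]})

# Integer-encode the string labels (same result as the manual mapping above)
encoder = LabelEncoder()
df['label_num'] = encoder.fit_transform(df['label'])

# Optional: one-hot encode the integer labels if your loss expects that
one_hot = to_categorical(df['label_num'], num_classes=len(encoder.classes_))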

@jeremy_rutman, Thanks! I got it working.

import pandas as pd
print("Image IDs and Labels (TRAIN)")
train_df = pd.read_csv(TRAIN_DF_PATH)

# Add extension to id_code to train images
train_df['id'] = train_df['id'].apply(str) + ".png"

all_labels = train_df['label'].unique().tolist()
all_labels.sort()
label_to_number = {label: all_labels.index(label) for label in all_labels}
train_df['label'] = train_df.apply(lambda r: label_to_number[r.label], axis=1)


display(train_df.head())
print(train_df['id'])
The model is fitting now, but for some reason my two GPU cards are not kicking in... I think lots of things got broken with TensorFlow 2.0, but that is another topic. Thanks a lot for your help.
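One follow-up note for completeness: since the labels are now integers, getting the original class names back (for example for the classification_report's target_names) is just the inverse of the label_to_number mapping, roughly:

# Invert the mapping to recover class names from predicted integer indices
number_to_label = {num: label for label, num in label_to_number.items()}
labelNames = [number_to_label[i] for i in range(len(number_to_label))]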
