Error reading a csv file: converting a column from string to float

Question

I am trying to read a csv file that contains a column, SpType, which holds string values. The resulting array ends up with dtype object, but I need it to be float. Here's the snippet:

data = pd.read_csv("/content/Star3642_balanced.csv")

X_orig = data[["Vmag", "Plx", "e_Plx", "B-V", "SpType", "Amag"]].to_numpy()

Here's the line that raises the error:

X = torch.tensor(X_orig, dtype=torch.float32)

The error reads "can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool."

I tried doing this after reading the csv file, but it didn't help:

data["SpType"] = data.SpType.astype(float)

Can someone please tell me what can be done about this?

Answer 1

Strings have to be encoded as numeric values first. The easiest way is Pandas one-hot encoding (it will create a lot of extra columns in this case, but a neural network should handle those without much trouble):

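# dtype=float keeps the dummy columns numeric (recent pandas versions return bool by default)
ohe = pd.get_dummies(data["SpType"], drop_first=True, dtype=float)
data[ohe.columns] = ohe
# Drop the original string column
data = data.drop(["SpType"], axis=1)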
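
To see the whole path from strings to a float array, here is a minimal sketch on a made-up three-row frame. The values and the reduced column set are invented for illustration; the real file has more columns. It assumes a recent pandas version, where `get_dummies` returns boolean columns unless `dtype` is given, which is why `dtype=float` is passed.

```python
import numpy as np
import pandas as pd

# Toy stand-in for Star3642_balanced.csv; values are invented for illustration.
data = pd.DataFrame({
    "Vmag": [5.99, 8.70, 6.53],
    "Plx": [13.73, 2.31, 18.80],
    "SpType": ["K5III", "B1II", "K5III"],
})

# One-hot encode the string column as floats and swap it in for the original.
ohe = pd.get_dummies(data["SpType"], drop_first=True, dtype=float)
data = pd.concat([data.drop(columns=["SpType"]), ohe], axis=1)

# Every column is numeric now, so the array is float rather than object.
X_orig = data.to_numpy(dtype=np.float32)
print(X_orig.dtype, X_orig.shape)
```

An array built this way can be passed to `torch.tensor(X_orig, dtype=torch.float32)` as in the question.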

Alternatively, you can use the sklearn encoders or the category_encoders library. With more complex encodings you may need to process the test set separately to avoid target leakage.
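
As a sketch of the sklearn route, the encoder below is fitted on the training rows only and then reused on the test rows. The two small frames and their values are hypothetical, and `handle_unknown="ignore"` is just one possible choice: it encodes categories that were not seen during fitting as all zeros.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical train/test split; values are invented for illustration.
train = pd.DataFrame({"SpType": ["K5III", "B1II", "K5III"]})
test = pd.DataFrame({"SpType": ["B1II", "M0V"]})  # "M0V" is unseen in train

# Learn the categories from the training rows only.
enc = OneHotEncoder(handle_unknown="ignore")
train_enc = enc.fit_transform(train[["SpType"]]).toarray()

# Reuse the fitted encoder on the test rows; the unseen category becomes all zeros.
test_enc = enc.transform(test[["SpType"]]).toarray()
print(enc.categories_[0], test_enc)
```

One-hot encoding never looks at the target, so it cannot leak it by itself; the same fit-on-train, transform-on-test pattern is what matters for encoders that do use the target.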

Solution 1
0 Accepted 2022-06-13 13:36:48
