Error while reading csv file: converting a column from string to float
I am trying to read a csv file that contains a column, SpType, which holds string values. My variable is being converted into an array of dtype object, but I need it to be float. Here's the snippet:
data = pd.read_csv("/content/Star3642_balanced.csv")
X_orig = data[["Vmag", "Plx", "e_Plx", "B-V", "SpType", "Amag"]].to_numpy()
Here's the line that gives me the error:
X = torch.tensor(X_orig, dtype=torch.float32)
The error reads "can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool."
I tried doing this after reading the csv file, but it didn't help:
data["SpType"] = data.SpType.astype(float)
Can someone please tell me what can be done about this?
Strings should be encoded into numeric values. The easiest way is pandas one-hot encoding (this creates a lot of extra columns in this case, but a neural network should handle them without much trouble):
ohe = pd.get_dummies(data["SpType"], drop_first=True, dtype=float)
data[ohe.columns] = ohe
data = data.drop(["SpType"], axis=1)
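To see the whole path from strings to a float array, here is a minimal sketch of the one-hot approach. The small DataFrame is a hypothetical stand-in for Star3642_balanced.csv (the values are made up), and `dtype=float` is passed because recent pandas versions return boolean dummy columns by default, which would again produce an object array when mixed with float columns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Star3642_balanced.csv; the values are made up.
data = pd.DataFrame({
    "Vmag": [5.99, 8.70, 7.21],
    "Plx": [13.73, 2.31, 4.50],
    "e_Plx": [0.58, 1.29, 0.90],
    "B-V": [1.318, -0.045, 0.610],
    "SpType": ["K5III", "B1II", "G2V"],
    "Amag": [16.68, 15.52, 15.48],
})

# One-hot encode SpType; dtype=float keeps the new columns numeric.
ohe = pd.get_dummies(data["SpType"], drop_first=True, dtype=float)

# Replace the string column with its encoded columns.
data = pd.concat([data.drop(columns=["SpType"]), ohe], axis=1)

# Every column is numeric now, so the array is float32 rather than object.
X_orig = data.to_numpy(dtype=np.float32)
print(X_orig.dtype, X_orig.shape)
```

An array like this should be accepted by `torch.tensor(X_orig, dtype=torch.float32)`, since its dtype is one of the supported types listed in the error message.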
Alternatively, you may use sklearn encoders or the category_encoders library - more complex encodings might require processing the test set separately to avoid target leakage.
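As a rough illustration of the sklearn route, the sketch below fits a `OneHotEncoder` on the training rows only and then reuses it on the test rows. The two DataFrames and the `to_features` helper are hypothetical and only there for the example; `handle_unknown="ignore"` is one possible choice for categories that appear only in the test set.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical train/test split; the values are made up.
train = pd.DataFrame({"Vmag": [5.99, 8.70, 7.21],
                      "SpType": ["K5III", "B1II", "G2V"]})
test = pd.DataFrame({"Vmag": [6.10, 9.02],
                     "SpType": ["G2V", "M0V"]})

# Fit the encoder on the training data only; categories that were not
# seen during fitting are encoded as all zeros at transform time.
enc = OneHotEncoder(handle_unknown="ignore")
enc.fit(train[["SpType"]])

def to_features(df):
    """Hypothetical helper: numeric column plus one-hot SpType, as float32."""
    onehot = enc.transform(df[["SpType"]]).toarray()
    numeric = df[["Vmag"]].to_numpy()
    return np.hstack([numeric, onehot]).astype(np.float32)

X_train = to_features(train)
X_test = to_features(test)
print(X_train.shape, X_test.shape)
```

Because the encoder is fitted once and reused, the train and test arrays get the same columns in the same order, which `pd.get_dummies` does not guarantee when it is called on each set separately.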