Pandas - Parallelizing the astype function

Question

I'm dealing with a huge dataset with lots of features. Those features are actually integers, but since they contain np.nan values, pandas assigns them the float64 dtype.

I'm casting those features to float32 by iterating over every column. It takes about 10 minutes to complete. Is there any way to speed up this operation?

The data is read from a CSV file. There are object and int64 columns in the data.

for col in float_cols:
    df[col] = df[col].astype(np.float32)

Answer 1

Use the dtype parameter of read_csv with a dictionary:

df = pd.read_csv(file, dtype=dict.fromkeys(float_cols, np.float32))
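To make the one-liner above concrete, here is a minimal, self-contained sketch. The CSV text, the column names (`id`, `name`, `a`, `b`) and the contents of `float_cols` are made up for illustration; only `read_csv`, its `dtype` parameter and `dict.fromkeys` come from the answer. The second half shows a possible alternative for a frame that is already loaded, namely a single `DataFrame.astype` call with the same mapping. That variant is not part of the original answer and its speed on a large dataset has not been measured here.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data: columns "a" and "b" hold integers with gaps.
csv_text = "id,name,a,b\n1,x,10,\n2,y,,5\n3,z,7,8\n"
float_cols = ["a", "b"]  # assumed list of columns to store as float32

# Map every target column to float32 and let the parser apply it while reading,
# so no per-column cast is needed afterwards.
dtype_map = dict.fromkeys(float_cols, np.float32)
df = pd.read_csv(io.StringIO(csv_text), dtype=dtype_map)

# Alternative for an already-loaded frame (assumption, not from the answer):
# cast all target columns in one astype call instead of a Python loop.
df_late = pd.read_csv(io.StringIO(csv_text))
df_late = df_late.astype(dtype_map)

print(df.dtypes)
```

Columns not listed in the mapping keep whatever dtype pandas infers for them, so object and int64 columns are left alone.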

1 accepted 2019-08-01 10:36:10
