AttributeError: 'DataFrame' object has no attribute 'randomSplit'
I am trying to split my data into train and test sets. The data is a Koalas dataframe. However, when I run the code below, I get this error:
AttributeError: 'DataFrame' object has no attribute 'randomSplit'
Here is the code I am using:
splits = Closed_new.randomSplit([0.7,0.3])
I also tried the usual approach of converting the Koalas dataframe to pandas and splitting it there, but that takes a long time to run in Synapse. The code is below:
from sklearn.model_selection import train_test_split
state = 12
test_size = 0.30
X_train, X_val, y_train, y_val = train_test_split(
    Closed_new, labels, test_size=test_size, random_state=state)
I'm afraid that, at the time of this question, PySpark's randomSplit does not yet have an equivalent in Koalas.
One trick is to convert the Koalas dataframe into a Spark dataframe, call randomSplit on it, and then convert the two subsets back to Koalas.
splits = Closed_new.to_spark().randomSplit([0.7, 0.3], seed=12)
df_train = splits[0].to_koalas()
df_test = splits[1].to_koalas()
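If you need this in more than one place, the same round trip can be wrapped in a small helper. This is only a sketch: the name split_koalas and its default arguments are made up for illustration, and it assumes databricks.koalas has already been imported so that Spark dataframes have a to_koalas() method.

```python
def split_koalas(kdf, weights=(0.7, 0.3), seed=12):
    """Split a Koalas dataframe by going through Spark's randomSplit.

    Hypothetical helper, not part of Koalas. Assumes `kdf` exposes
    to_spark() and that `import databricks.koalas` has already run,
    which is what adds to_koalas() to Spark dataframes.
    """
    # Koalas -> Spark, so that randomSplit is available.
    sdf = kdf.to_spark()
    # randomSplit returns one Spark dataframe per weight.
    parts = sdf.randomSplit(list(weights), seed=seed)
    # Spark -> Koalas for every subset.
    return [part.to_koalas() for part in parts]
```

Usage would look like df_train, df_test = split_koalas(Closed_new). Two things to keep in mind: randomSplit assigns rows at random, so the resulting sizes are only approximately 70/30, and as far as I know to_spark() does not carry the Koalas index over unless you pass index_col.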