AttributeError: 'DataFrame' object has no attribute 'randomSplit'

Question

I am trying to split my data into train and test sets. The data is a Koalas dataframe. However, when I run the code below, I get this error:

AttributeError: 'DataFrame' object has no attribute 'randomSplit'

Here is the code I am using:

splits = Closed_new.randomSplit([0.7,0.3])

I also tried the usual approach of converting the Koalas dataframe to pandas and then splitting it, but that takes a long time to run in Synapse. Here is the code:

from sklearn.model_selection import train_test_split

state = 12
test_size = 0.30

X_train, X_val, y_train, y_val = train_test_split(
    Closed_new, labels, test_size=test_size, random_state=state)

Answer 1

I'm afraid that, at the time of this question, PySpark's randomSplit does not yet have an equivalent in Koalas.

One trick is to convert the Koalas dataframe to a Spark dataframe, call randomSplit on it, and then convert the two subsets back to Koalas.

splits = Closed_new.to_spark().randomSplit([0.7, 0.3], seed=12)
df_train = splits[0].to_koalas()
df_test = splits[1].to_koalas()
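
Keep in mind that a split like this assigns rows at random, so the two subsets will only be approximately 70% and 30% of the data, and the seed is what makes the result repeatable. The plain-Python sketch below illustrates that idea only; it is not Spark's implementation, and `random_split` is a hypothetical helper written for this example.

```python
import random
from itertools import accumulate


def random_split(rows, weights, seed=None):
    """Conceptual stand-in for a seeded weighted split (hypothetical helper)."""
    total = sum(weights)
    # Normalize the weights so they sum to 1, then build cumulative upper bounds.
    bounds = list(accumulate(w / total for w in weights))
    bounds[-1] = 1.0  # guard against floating-point drift in the last bound
    rng = random.Random(seed)
    splits = [[] for _ in weights]
    for row in rows:
        draw = rng.random()  # uniform draw in [0, 1)
        # Place the row in the first bucket whose upper bound exceeds the draw.
        for i, upper in enumerate(bounds):
            if draw < upper:
                splits[i].append(row)
                break
    return splits


train, test = random_split(range(1000), [0.7, 0.3], seed=12)
print(len(train), len(test))  # sizes are close to 700 / 300, not exact
```

Because each row is placed independently, the counts vary slightly from one seed to another, so it is safer not to rely on an exact 70/30 row count downstream.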
