Pandas: split a DataFrame and get the remaining rows

Question

I am using this code to split a DataFrame:

df_80_split = df.sample(frac=0.8,random_state=200)

What I need is to put the remaining rows of the original into a new DataFrame, something like:

df_20_split = df - df_80_split

What is a good way to code this?

Answer 1

Using sklearn's train_test_split() is a very good way to do this, especially for large datasets.

# Import sklearn's helper for splitting data into two subsets
from sklearn.model_selection import train_test_split

# using your variable names
df_80_split, df_20_split = train_test_split(df, test_size=0.2, random_state=200)

If you also pass in the target variable, the same call splits both the features and the target into training and validation sets.

X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=200)

The documentation covers this in much more detail.
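A minimal, self-contained sketch of the features/target split mentioned above. The column names x1, x2 and label, and the toy data, are made up for illustration; only train_test_split and its test_size and random_state parameters come from the answer.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: two feature columns and one target column
df = pd.DataFrame({
    "x1": range(10),
    "x2": range(10, 20),
    "label": [0, 1] * 5,
})

features = df[["x1", "x2"]]
target = df["label"]

# Features and target are split with the same row assignment,
# so X_train lines up with y_train and X_test with y_test
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=200)

print(X_train.shape, X_test.shape)
```

Because both inputs are shuffled together, the index of each feature subset matches the index of the corresponding target subset.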

Answer 2

Assuming the index values of your DataFrame are all unique, a pure Pandas solution would be:

df_20_split = df[~df.index.isin(df_80_split.index)]

Full code:

import pandas as pd

# Just sample data
df = pd.DataFrame({'a':[*'abcdefg']*1000}).sort_values('a').reset_index(drop=True)

# Split the data
df_80_split = df.sample(frac=0.8, random_state=200)

# Get the remainder
df_20_split = df[~df.index.isin(df_80_split.index)]

Output:

>>> df_80_split.shape
(5600, 1)

>>> df_20_split.shape
(1400, 1)

>>> 5600 + 1400
7000

>>> df.shape
(7000, 1)
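As an alternative sketch (not part of the original answer), the remainder can also be taken by dropping the sampled index labels with DataFrame.drop. It relies on the same assumption of a unique index, which the snippet checks explicitly before splitting.

```python
import pandas as pd

# Same sample data as in the answer
df = pd.DataFrame({"a": list("abcdefg") * 1000}).sort_values("a").reset_index(drop=True)

# drop() removes every row carrying a given label, so duplicate labels
# would remove too many rows; check uniqueness first
assert df.index.is_unique

# Split the data
df_80_split = df.sample(frac=0.8, random_state=200)

# Get the remainder by dropping the sampled labels
df_20_split = df.drop(df_80_split.index)

print(df_80_split.shape, df_20_split.shape)
```

If the index is not unique, calling df.reset_index(drop=True) before sampling is one way to satisfy the assumption.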
