Pandas: split a DataFrame and get the remaining rows
I am using this code to split a DataFrame:
df_80_split = df.sample(frac=0.8,random_state=200)
What I need is to put the remaining rows of the original DataFrame into a new DataFrame, something like:
df_20_split = df - df_80_split
What is a good way to code this?
Using sklearn's train_test_split() is a very good approach, especially for large datasets.
#import sklearn method to split training data
from sklearn.model_selection import train_test_split
# using your variable names
df_80_split, df_20_split = train_test_split(df, test_size = 0.2, random_state = 200)
If you also pass the target variable, you can split the features and the target into training and validation sets in the same call.
X_train, X_test, y_train, y_test = train_test_split(
features, target, test_size = 0.2, random_state = 200)
The documentation has many more details.
Assuming the index values of your DataFrame are all unique, a pure pandas solution would be:
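The names features and target in the call above are not defined in the answer. A minimal sketch of how they might be built, assuming a hypothetical DataFrame with two feature columns (x1, x2) and a label column:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical sample data: two feature columns and one label column
df = pd.DataFrame({
    "x1": range(10),
    "x2": range(10, 20),
    "label": [0, 1] * 5,
})

features = df[["x1", "x2"]]  # assumed feature columns
target = df["label"]         # assumed target column

# Both inputs are split with the same row assignment
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=200)
```

Because both inputs are split together, X_train and y_train keep matching index labels, and so do X_test and y_test.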
df_20_split = df[~df.index.isin(df_80_split.index)]
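Under the same unique-index assumption, dropping the sampled index labels with DataFrame.drop should give the same remainder. A small sketch on made-up data:

```python
import pandas as pd

# Made-up data with the default unique RangeIndex
df = pd.DataFrame({"a": list("abcdefghij")})

# Sample 80% of the rows
df_80_split = df.sample(frac=0.8, random_state=200)

# Drop the sampled labels; what is left is the other 20%
df_20_split = df.drop(df_80_split.index)
```

Like the isin() version, this relies on the index being unique: with duplicate labels, drop removes every row that shares a sampled label.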
Full code:
import pandas as pd

# Just sample data
df = pd.DataFrame({'a':[*'abcdefg']*1000}).sort_values('a').reset_index(drop=True)
# Split the data
df_80_split = df.sample(frac=0.8, random_state=200)
# Get the remainder
df_20_split = df[~df.index.isin(df_80_split.index)]
Output:
>>> df_80_split.shape
(5600, 1)
>>> df_20_split.shape
(1400, 1)
>>> 5600 + 1400
7000
>>> df.shape
(7000, 1)
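If the index is not unique, the isin() approach would drop too many rows. One possible workaround, sketched here rather than taken from the answers above, is to split by row position instead of by label. It uses a NumPy permutation in place of df.sample, so the rows selected will differ from those picked by df.sample with the same seed:

```python
import numpy as np
import pandas as pd

# Made-up data with duplicate index labels
df = pd.DataFrame({"a": list("abcdefghij")},
                  index=[0, 0, 1, 1, 2, 2, 3, 3, 4, 4])

# Shuffle row positions (not labels) with a fixed seed
rng = np.random.default_rng(200)
positions = rng.permutation(len(df))

# First 80% of the shuffled positions go to one part, the rest to the other
cut = int(len(df) * 0.8)
df_80_split = df.iloc[positions[:cut]]
df_20_split = df.iloc[positions[cut:]]
```

Calling df.reset_index(drop=True) first and then using the index-based approach is another option when the original labels do not need to be kept.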