I'm trying (badly) to use sklearn's LOO functionality and what I would like to do is append each training split set into a dataframe column with a label for the split index. So using the example from the sklearn page, but slightly modified:
import numpy as np
from sklearn.model_selection import LeaveOneOut
x = np.array([1,2])
y = np.array([3,4])
coords = np.column_stack((x,y))
z = np.array([8, 12])
loo = LeaveOneOut()
loo.get_n_splits(coords)
print(loo)
LeaveOneOut()
for train_index, test_index in loo.split(coords):
print("TRAIN:", train_index, "TEST:", test_index)
XY_train, XY_test = coords[train_index], coords[test_index]
z_train, z_test = z[train_index], z[test_index]
print(XY_train, XY_test, z_train, z_test)
Which returns:
TRAIN: [1] TEST: [0]
[[2 4]] [[1 3]] [12] [8]
TRAIN: [0] TEST: [1]
[[1 3]] [[2 4]] [8] [12]
In my case I'd like to write each split value to a dataframe like this:
X Y Ztrain Ztest split
0 1 2 8 12 0
1 3 4 8 12 0
2 1 2 12 8 1
3 3 4 12 8 1
And so on.
The motivation for doing this is I want to try a jackknifing interpolation of sparse point data. Ideally I want to run an interpolation/gridder on each of the LOO training sets, and then stack them. But I am struggling to access each train set to then use in something like griddata
Any help would be appreciated, for the problem here or the approach in general.
I don't quite get the logic of your dataframe, but you can try something like below to get your dataframe:
df = []
for train_index, test_index in loo.split(coords):
x = pd.DataFrame({'XY_train':coords[train_index][0],\
'XY_test':coords[test_index][0],\
'Ztrain':z[train_index][0],\
'Ztest':z[test_index][0]})
df.append(x)
df = pd.concat(df)
df
XY_train XY_test Ztrain Ztest
0 2 1 12 8
1 4 3 12 8
0 1 2 8 12
1 3 4 8 12
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.