Write sklearn LOO splits to pandas dataframe with index as label column

Question

I'm trying (badly) to use sklearn's LOO functionality. What I would like to do is append each training split to a dataframe, with a column that labels the split index. Using the example from the sklearn page, slightly modified:

import numpy as np
from sklearn.model_selection import LeaveOneOut

x = np.array([1,2])
y = np.array([3,4])
coords = np.column_stack((x,y))
z = np.array([8, 12])
loo = LeaveOneOut()
loo.get_n_splits(coords)

print(loo)  # prints: LeaveOneOut()
for train_index, test_index in loo.split(coords):
    print("TRAIN:", train_index, "TEST:", test_index)
    XY_train, XY_test = coords[train_index], coords[test_index]
    z_train, z_test = z[train_index], z[test_index]
    print(XY_train, XY_test, z_train, z_test)

Which returns:

TRAIN: [1] TEST: [0]
[[2 4]] [[1 3]] [12] [8]
TRAIN: [0] TEST: [1]
[[1 3]] [[2 4]] [8] [12]

In my case I'd like to write each split value to a dataframe like this:

     X    Y   Ztrain    Ztest    split
0    1    2   8         12       0
1    3    4   8         12       0
2    1    2   12        8        1
3    3    4   12        8        1

And so on.

The motivation is that I want to try a jackknife interpolation of sparse point data. Ideally I would run an interpolator/gridder on each of the LOO training sets and then stack the results. But I am struggling to access each training set so that I can pass it to something like griddata.

Any help would be appreciated, for the problem here or the approach in general.

Answer 1

I don't quite get the logic of your dataframe, but you can try something like the code below to build it:

import pandas as pd

frames = []
for train_index, test_index in loo.split(coords):
    # One frame per split; the scalar z values are broadcast down the rows
    frames.append(pd.DataFrame({
        'XY_train': coords[train_index][0],
        'XY_test': coords[test_index][0],
        'Ztrain': z[train_index][0],
        'Ztest': z[test_index][0],
    }))
df = pd.concat(frames)
df

   XY_train  XY_test  Ztrain  Ztest
0         2        1      12      8
1         4        3      12      8
0         1        2       8     12
1         3        4       8     12
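
If you also want the split label from the question, one option is to wrap the loop in enumerate() and store the counter as a column. This is only a sketch under my own assumptions about the layout you want: one row per training point, with the single held-out z value repeated on every row of its split. The names frames, train_df and train_sets are mine, not anything from sklearn.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import LeaveOneOut

x = np.array([1, 2])
y = np.array([3, 4])
coords = np.column_stack((x, y))
z = np.array([8, 12])

frames = []
# enumerate() supplies the split number, which becomes the label column
for split, (train_index, test_index) in enumerate(LeaveOneOut().split(coords)):
    frames.append(pd.DataFrame({
        "X": coords[train_index, 0],   # training x coordinates
        "Y": coords[train_index, 1],   # training y coordinates
        "Ztrain": z[train_index],      # training values
        "Ztest": z[test_index][0],     # the one held-out value, broadcast
        "split": split,                # scalar label, broadcast
    }))

train_df = pd.concat(frames, ignore_index=True)
print(train_df)

# Recover each training set by its label, e.g. to interpolate one at a time
train_sets = {split: group for split, group in train_df.groupby("split")}
```

Each entry of train_sets is then an ordinary dataframe, so group[["X", "Y"]].to_numpy() and group["Ztrain"].to_numpy() give you the points and values to hand to whatever gridder you use. If that is scipy.interpolate.griddata, keep in mind that its linear and cubic methods need more than a couple of points per split, so the two-point toy data here is only good for checking the bookkeeping.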
