Iterate through slices of a NumPy array

Question

I have a pandas DataFrame, e.g.:

import numpy as np
import pandas as pd
df = pd.DataFrame({'dim1': ['a', 'a', 'b', 'b'], 'dim2': ['x', 'y', 'x', 'y'], 'val': [2, 4, 6, 8]})

This can represent an array of N dimensions; I have chosen two here for simplicity. I convert it to a NumPy array and then want to iterate over each 'slice' of that array and sum it. I have achieved this for my example, but I do not know how to generalise it to N dimensions.

Function to convert a DataFrame to a NumPy array:

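def df_to_numpy(df: pd.DataFrame) -> np.ndarray:
    # Reshape the frame's values to one axis per index level (plus one for the columns)
    try:
        # MultiIndex: one axis per level
        shape = [len(level) for level in df.index.levels]
    except AttributeError:
        # A plain Index has no .levels, so it contributes a single axis
        shape = [len(df.index)]
    ncol = df.shape[-1]
    if ncol > 1:
        # More than one column adds a trailing axis
        shape.append(ncol)
    return df.to_numpy().reshape(shape)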

Now convert using this function and unstack(). (This is not generalised yet because of the column names, but that is easy enough to do at a later point.)

arr = df_to_numpy(df.set_index(['dim1', 'dim2']).unstack())
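One possible way to drop the dependence on the column names is sketched below. It rests on several assumptions that are not in the code above: every column except a value column (assumed here to be called 'val') is a dimension, there are at least two dimension columns, and every combination of labels occurs exactly once. The helper name long_df_to_array and the variable arr_nd are made up for illustration.

```python
import numpy as np
import pandas as pd


def long_df_to_array(df: pd.DataFrame, value_col: str = "val") -> np.ndarray:
    # Hypothetical helper: treats every column except value_col as a dimension.
    dim_cols = [c for c in df.columns if c != value_col]
    # Sorting the MultiIndex puts the values in row-major order of the levels.
    values = df.set_index(dim_cols)[value_col].sort_index()
    # Assumes a complete grid: exactly one row per combination of labels.
    shape = [len(level) for level in values.index.levels]
    return values.to_numpy().reshape(shape)


df = pd.DataFrame({'dim1': ['a', 'a', 'b', 'b'],
                   'dim2': ['x', 'y', 'x', 'y'],
                   'val': [2, 4, 6, 8]})
arr_nd = long_df_to_array(df)
print(arr_nd.shape)  # (2, 2)
```

With a third dimension column the same call should return a three-dimensional array, provided the grid of labels is complete.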

Now use a loop and swapaxes() to iterate through the slices of this array

for _ in range(len(arr.shape)):
    # we now iterate through the unique groupings of this dimension
    for ii in range(arr.shape[0]):
        print('Unq grouping no.: ',ii)
        print('Sum: ', arr[ii,:].sum())
    # swap the last and first axes and repeat step
    arr = arr.swapaxes(0,len(arr.shape) - 1)

This appears to work for my example and for some higher-dimensional ones I've tried. However, it is not generalisable: for 4 dimensions, for example, the sum would be arr[ii,:,:,:]. How can I generalise this line to work for N dimensions?

Answer 1

Looking at your data frame, it feels like your headers are misleading: the dimensions are x and y. In that case your data set is not organised by dimension. If you want it to be 4-dimensional, you can keep the structure of your data frame and add two extra rows for each of a and b. Where you currently have a: x, y, you would then have a: x, y, z, k.

df = pd.DataFrame({'dim1': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'], 'dim2': ['x', 'y', 'z', 'k', 'x', 'y', 'z', 'k'], 'val': [2, 4, 6, 8, 3, 2, 1, 5]})

def df_to_numpy(df: pd.DataFrame) -> np.ndarray:
    # Reshape the frame's values to one axis per index level (plus one for the columns)
    try:
        shape = [len(level) for level in df.index.levels]
    except AttributeError:
        shape = [len(df.index)]
    ncol = df.shape[-1]
    if ncol > 1:
        shape.append(ncol)
    return df.to_numpy().reshape(shape)

arr = df_to_numpy(df.set_index(['dim1', 'dim2']).unstack())
arr
for _ in range(len(arr.shape)):
    # we now iterate through the unique groupings of this dimension
    for ii in range(arr.shape[0]):
        print('Unq grouping no.: ',ii)
        print('Sum: ', arr[ii,:].sum())
    # swap the last and first axes and repeat step
    arr = arr.swapaxes(0,len(arr.shape) - 1)
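
As a rough sketch of how the loop could be generalised to N dimensions (the helper name slice_sums is made up for illustration): NumPy treats missing trailing indices as full slices, so arr[ii], arr[ii, :] and arr[ii, ...] all select the same slice whatever the number of dimensions. Note also that repeatedly swapping the first and last axes only alternates between those two axes once there are more than two dimensions, so the middle axes are never visited. The sketch below uses np.moveaxis to bring each axis to the front in turn instead.

```python
import numpy as np


def slice_sums(arr: np.ndarray) -> dict:
    """Return {axis: [sum of each slice along that axis]} for an N-d array.

    slice_sums is a hypothetical helper name used only for this sketch.
    """
    sums = {}
    for axis in range(arr.ndim):
        # Bring the axis of interest to the front; the others keep their order.
        moved = np.moveaxis(arr, axis, 0)
        # moved[ii, ...] stands for moved[ii, :, :, ...] with any number of axes.
        sums[axis] = [moved[ii, ...].sum() for ii in range(moved.shape[0])]
    return sums


# The 2 x 2 array produced by the question's example data
arr2 = np.array([[2, 4], [6, 8]])
print(slice_sums(arr2))

# A 3-dimensional array, to check that the middle axis is visited too
arr3 = np.arange(24).reshape(2, 3, 4)
for axis, values in slice_sums(arr3).items():
    print("axis", axis, "sums", values)
```

If only the sums are needed, arr.sum(axis=tuple(a for a in range(arr.ndim) if a != axis)) should give the same numbers for a given axis without the inner loop.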
