Iterate through slices of a NumPy array

I have a pandas DataFrame, for example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'dim1': ['a', 'a', 'b', 'b'], 'dim2': ['x', 'y', 'x', 'y'], 'val': [2, 4, 6, 8]})

This could represent an array of N dimensions; I have chosen two here for simplicity. I convert it to a NumPy array and then want to iterate over each 'slice' of that array and sum it. I have achieved this for my example, but I do not know how to generalise it to N dimensions.

Function to convert a DataFrame to a NumPy array:

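def df_to_numpy(df: pd.DataFrame) -> np.ndarray:
    # Convert a DataFrame to an np.ndarray, one axis per index level
    try:
        # MultiIndex: one axis per level
        shape = [len(level) for level in df.index.levels]
    except AttributeError:
        # A plain Index has no .levels, so it contributes a single axis
        shape = [len(df.index)]
    ncol = df.shape[-1]
    if ncol > 1:
        # The columns form the last axis
        shape.append(ncol)
    return df.to_numpy().reshape(shape)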

Now convert using this function together with unstack(). (This is not generalised yet because of the column names, but that is easy enough to do at a later point.)

arr = df_to_numpy(df.set_index(['dim1', 'dim2']).unstack())
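One possible way to generalise the conversion to any number of dimension columns is sketched below. It is only an illustration under stated assumptions: the helper name long_df_to_ndarray and its parameters dim_cols and val_col are hypothetical, each combination of dimension values is assumed to appear at most once, and combinations missing from the DataFrame end up as NaN.

```python
import numpy as np
import pandas as pd

def long_df_to_ndarray(df, dim_cols, val_col):
    # Hypothetical helper: one array axis per column listed in dim_cols.
    # Sorted unique labels of each dimension column define the axis order.
    levels = [sorted(df[col].unique()) for col in dim_cols]
    # Every combination of labels, in row-major (C) order.
    full_index = pd.MultiIndex.from_product(levels, names=dim_cols)
    # Values indexed by their dimension labels (assumed unique per row).
    values = pd.Series(df[val_col].to_numpy(),
                       index=pd.MultiIndex.from_frame(df[dim_cols]))
    # Align to the full grid; absent combinations become NaN.
    values = values.reindex(full_index)
    return values.to_numpy().reshape([len(level) for level in levels])

df = pd.DataFrame({'dim1': ['a', 'a', 'b', 'b'],
                   'dim2': ['x', 'y', 'x', 'y'],
                   'val': [2, 4, 6, 8]})
arr = long_df_to_ndarray(df, ['dim1', 'dim2'], 'val')
print(arr.shape)
```

Passing a longer list of dimension columns gives an array with one axis per column, without relying on unstack() or on the resulting column names.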

Now use a loop and swapaxes() to iterate through the slices of this array

for _ in range(len(arr.shape)):
    # we now iterate through the unique groupings of this dimension
    for ii in range(arr.shape[0]):
        print('Unq grouping no.: ',ii)
        print('Sum: ', arr[ii,:].sum())
    # swap the last and first axes and repeat step
    arr = arr.swapaxes(0,len(arr.shape) - 1)

This appears to work for my example and for some higher-dimensional ones I have tried. However, it is not generalisable: for 4 dimensions, for example, the sum would be arr[ii,:,:,:].sum(). How can I generalise this line to work for N dimensions?
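As a small illustration of the indexing itself: in NumPy, trailing full slices can be omitted, so arr[ii] selects the same data as arr[ii,:,:,:] on a 4-dimensional array, and arr[ii, ...] spells the same thing with an Ellipsis. The array arr4 below is made-up example data, not taken from the question.

```python
import numpy as np

# Made-up 4-dimensional example data.
arr4 = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
ii = 1

explicit = arr4[ii, :, :, :].sum()   # one ':' per remaining axis
short = arr4[ii].sum()               # trailing ':' slices are implied
with_ellipsis = arr4[ii, ...].sum()  # Ellipsis stands for "all remaining axes"
print(explicit == short == with_ellipsis)

# np.take selects index ii along any chosen axis, not only the first one.
middle = np.take(arr4, ii, axis=2)
print(np.array_equal(middle, arr4[:, :, ii, :]))
```

Both forms work for any number of dimensions, so the line inside the loop does not need one ':' per axis.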

Looking at your DataFrame, it feels like your headers are misleading: the dimension is x and y, and what you have is an unorganised data set. So if you want four of them, you can keep the structure of your DataFrame and just add two extra rows for each of a and b. You currently have a: x, y; you can instead have a: x, y, z, k. Note that this gives dim2 four categories, so the resulting array has shape (2, 4) and is still two-dimensional.

df = pd.DataFrame({'dim1': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'], 'dim2': ['x', 'y', 'z', 'k', 'x', 'y', 'z', 'k'], 'val': [2, 4, 6, 8, 3, 2, 1, 5]})

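def df_to_numpy(df: pd.DataFrame) -> np.ndarray:
    # Convert a DataFrame to an np.ndarray, one axis per index level
    try:
        # MultiIndex: one axis per level
        shape = [len(level) for level in df.index.levels]
    except AttributeError:
        # A plain Index has no .levels, so it contributes a single axis
        shape = [len(df.index)]
    ncol = df.shape[-1]
    if ncol > 1:
        # The columns form the last axis
        shape.append(ncol)
    return df.to_numpy().reshape(shape)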

arr = df_to_numpy(df.set_index(['dim1', 'dim2']).unstack())
print(arr)
for _ in range(len(arr.shape)):
    # we now iterate through the unique groupings of this dimension
    for ii in range(arr.shape[0]):
        print('Unq grouping no.: ',ii)
        print('Sum: ', arr[ii,:].sum())
    # swap the last and first axes and repeat step
    arr = arr.swapaxes(0,len(arr.shape) - 1)
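One caveat about the loop above: swapping the first and last axes repeatedly only alternates between those two axes, so with three or more dimensions the axes in between are never brought to the front. A minimal sketch that visits every axis in turn is given below. The function name slice_sums is hypothetical, and it relies on ndarray.sum accepting a tuple of axes, so that summing over all axes except one gives one total per index along the remaining axis.

```python
import numpy as np

def slice_sums(arr):
    # Hypothetical helper: for each axis, return a 1-D array holding
    # the sum of every slice taken along that axis.
    sums = {}
    for axis in range(arr.ndim):
        # Sum over every axis except the one being iterated.
        other_axes = tuple(a for a in range(arr.ndim) if a != axis)
        sums[axis] = arr.sum(axis=other_axes)
    return sums

arr = np.array([[2, 4], [6, 8]])
for axis, totals in slice_sums(arr).items():
    for ii, total in enumerate(totals):
        print('Axis:', axis, 'Unq grouping no.:', ii, 'Sum:', total)
```

If the slices themselves are needed rather than just their sums, np.moveaxis(arr, axis, 0)[ii] returns slice ii along a given axis for any number of dimensions.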
