I have a pandas DataFrame, e.g.:
import numpy as np
import pandas as pd
df = pd.DataFrame({'dim1': ['a', 'a', 'b', 'b'], 'dim2': ['x', 'y', 'x', 'y'], 'val': [2, 4, 6, 8]})
This can represent an array of N dimensions; I have chosen two here for simplicity. I convert it to a numpy array and then want to iterate over that array and sum each 'slice' of it. I have achieved this for my example, but I do not know how to generalise it to N dimensions.
Function to convert a DataFrame to a numpy array:
def df_to_numpy(df: pd.DataFrame) -> np.ndarray:
    # Convert a DataFrame to an array with one axis per index level
    try:
        # MultiIndex: one axis per level
        shape = [len(level) for level in df.index.levels]
    except AttributeError:
        # Plain Index: a single axis
        shape = [len(df.index)]
    ncol = df.shape[-1]
    if ncol > 1:
        # The columns form the last axis
        shape.append(ncol)
    return df.to_numpy().reshape(shape)
Now convert using this function and unstack(). (This is not generalised yet because of the column names, but that is easy enough to do at a later point.)
arr = df_to_numpy(df.set_index(['dim1', 'dim2']).unstack())
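One possible way to avoid hard-coding the column names is to build the array directly from the dimension columns. The following is only a sketch under assumptions: the helper name `df_to_ndarray` and its `dims` and `value` parameters are hypothetical, every column in `dims` becomes one axis (labels in sorted order), combinations that do not occur are left as NaN, and if a combination occurs more than once the last row wins.

```python
from typing import Sequence

import numpy as np
import pandas as pd


def df_to_ndarray(df: pd.DataFrame, dims: Sequence[str], value: str = "val") -> np.ndarray:
    # Hypothetical helper: one array axis per column named in `dims`.
    codes = []
    shape = []
    for dim in dims:
        # factorize maps each label to an integer position along its axis
        dim_codes, uniques = pd.factorize(df[dim], sort=True)
        codes.append(dim_codes)
        shape.append(len(uniques))
    # Start from NaN so that missing label combinations stay visible
    arr = np.full(shape, np.nan)
    # Place each value at the position given by its label codes
    arr[tuple(codes)] = df[value].to_numpy()
    return arr


df = pd.DataFrame({'dim1': ['a', 'a', 'b', 'b'],
                   'dim2': ['x', 'y', 'x', 'y'],
                   'val': [2, 4, 6, 8]})
arr = df_to_ndarray(df, ['dim1', 'dim2'])
print(arr.shape)
```

Because the array is initialised with NaN, the result has a float dtype even when the values are integers.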
Now use a loop and swapaxes() to iterate through the slices of this array
for _ in range(len(arr.shape)):
    # we now iterate through the unique groupings of this dimension
    for ii in range(arr.shape[0]):
        print('Unq grouping no.: ', ii)
        print('Sum: ', arr[ii, :].sum())
    # swap the last and first axes and repeat step
    arr = arr.swapaxes(0, len(arr.shape) - 1)
This appears to work for my example and for some higher-dimensional ones I've tried. However, it is not generalisable: e.g. for 4 dimensions the sum would be arr[ii,:,:,:]. How can I generalise this line to work for N dimensions?
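On the indexing itself, numpy treats any trailing axes that are left out of an index as full slices, so `arr[ii]`, `arr[ii, :]` and `arr[ii, ...]` all select the same sub-array regardless of the number of dimensions. A small sketch, using an illustrative 3-D array rather than the data above:

```python
import numpy as np

# Illustrative 3-D array (an assumption for the demo, not the data from the question)
arr = np.arange(24).reshape(2, 3, 4)

for ii in range(arr.shape[0]):
    by_index = arr[ii]          # trailing axes are implied
    by_ellipsis = arr[ii, ...]  # Ellipsis expands to as many ':' as needed
    explicit = arr[ii, :, :]    # only valid because ndim == 3 here
    assert np.array_equal(by_index, by_ellipsis)
    assert np.array_equal(by_index, explicit)
    print('Unq grouping no.: ', ii)
    print('Sum: ', by_ellipsis.sum())
```

The Ellipsis form is probably the clearest way to signal "all remaining axes" in code that has to handle an arbitrary number of dimensions.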
Looking at your data frame, it feels like your headers are misleading: the dimensions are x and y. In this case you have an unorganised data set. So if you want 4 dimensions, you can keep the structure of your data frame and just add two extra rows for each of a and b. Right now you have a: x, y; then you would have a: x, y, z, k.
df = pd.DataFrame({'dim1': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'], 'dim2': ['x', 'y', 'z', 'k', 'x', 'y', 'z', 'k'], 'val': [2, 4, 6, 8, 3, 2, 1, 5]})
def df_to_numpy(df: pd.DataFrame) -> np.ndarray:
    # Convert a DataFrame to an array with one axis per index level
    try:
        # MultiIndex: one axis per level
        shape = [len(level) for level in df.index.levels]
    except AttributeError:
        # Plain Index: a single axis
        shape = [len(df.index)]
    ncol = df.shape[-1]
    if ncol > 1:
        # The columns form the last axis
        shape.append(ncol)
    return df.to_numpy().reshape(shape)
arr = df_to_numpy(df.set_index(['dim1', 'dim2']).unstack())
arr
for _ in range(len(arr.shape)):
    # we now iterate through the unique groupings of this dimension
    for ii in range(arr.shape[0]):
        print('Unq grouping no.: ', ii)
        print('Sum: ', arr[ii, :].sum())
    # swap the last and first axes and repeat step
    arr = arr.swapaxes(0, len(arr.shape) - 1)
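One caveat with the loop above: repeatedly swapping the first and last axes only alternates between those two, so for a 3-D array it visits axis 0, then axis 2, then axis 0 again, and never axis 1. A sketch of an alternative that visits every axis once is to sum over all the other axes explicitly. The function name `sums_per_axis` is hypothetical, and the array is illustrative rather than taken from the posts above:

```python
import numpy as np


def sums_per_axis(arr: np.ndarray) -> list:
    # For each axis, return one sum per position along that axis
    totals = []
    for axis in range(arr.ndim):
        # Collapse every axis except the one currently being grouped on
        other_axes = tuple(a for a in range(arr.ndim) if a != axis)
        totals.append(arr.sum(axis=other_axes))
    return totals


# Illustrative 3-D array (an assumption for the demo)
arr = np.arange(24).reshape(2, 3, 4)
totals = sums_per_axis(arr)

for axis, axis_totals in enumerate(totals):
    for ii, total in enumerate(axis_totals):
        print('Axis: ', axis, 'Unq grouping no.: ', ii, 'Sum: ', total)
```

If the slices themselves are needed rather than just their sums, `np.moveaxis(arr, axis, 0)` brings the chosen axis to the front so it can be iterated directly.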