I'm currently working with time series data and was trying to use np.array_split to split my data into batches.
To give a bit more detail, I originally had a DataFrame of size (32000, 149). I processed this data so that I could create a sliding window of size (100, 40) over it and use that in an autoregressive fashion. Each "minibatch" has 32 instances, each of size (100, 40), which gives me a total of 1,000 of these so-called minibatches.
What I want to do is divide these 1,000 minibatches by the "number of days," which in this case is 10, so that each "day-batch" contains 100 minibatches.
The code that I have is the following:
data = {}
batches = []
minibatches = []
count = 1

for idx, _ in df.iterrows():
    minibatches.append(df.iloc[idx:(idx + 100), 0:40])

    if count == 32:
        batches.append(minibatches)
        minibatches = []
        count = 0
    count += 1

num_days = len(batches) // 100  # 1000 // 100 = 10
batch_split = np.array_split(batches, num_days)

for day in range(num_days):
    data[fold][day] = batch_split[day]  # Never mind the "fold" variable here.
The error that I'm getting for the np.array_split line is:
ValueError: cannot copy sequence with size 100 to array axis with dimension 40
If I were to break down the batches variable, it's simply a list containing 1,000 lists. These 1,000 lists are the aforementioned minibatches, each of which is itself a list containing 32 DataFrames. Each of the 32 DataFrames is of size (100, 40).
I don't understand where the size 100 and dimension 40 are coming from in this setting. I'm not accessing the DataFrame values; I simply want to split the list batches.
Try making sure the collection you are passing into the function is a numpy array.
batch_split = np.array_split(np.array(batches), num_days)
My best guess at an explanation: since you are collecting everything in a plain list, which has no 'shape' attribute, NumPy ends up finding the 'shape' attribute on the DataFrames instead.
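I haven't verified the conversion above against your actual data, so in case it still raises, here is a minimal sketch that avoids NumPy entirely and splits the list with plain slicing. The helper name split_into_days is hypothetical (it is not a NumPy or pandas function), and the integers below are only stand-ins for your lists of DataFrames. The chunk sizing imitates np.array_split, where the first few chunks get one extra item when the length does not divide evenly.

```python
def split_into_days(batches, num_days):
    """Split a list into num_days consecutive chunks using plain slicing.

    Hypothetical helper: the elements are never inspected or converted,
    so it does not matter whether they are DataFrames, lists, or anything else.
    """
    # base = minimum chunk size; the first `extra` chunks get one more item
    base, extra = divmod(len(batches), num_days)
    chunks = []
    start = 0
    for day in range(num_days):
        end = start + base + (1 if day < extra else 0)
        chunks.append(batches[start:end])
        start = end
    return chunks


# Stand-in data: 1,000 integers instead of 1,000 lists of DataFrames
batches = list(range(1000))
num_days = len(batches) // 100  # 1000 // 100 = 10
batch_split = split_into_days(batches, num_days)
```

Each entry of batch_split is then an ordinary list of 100 minibatches, so the loop that assigns batch_split[day] can stay as it is.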