np.array_split gives ValueError: cannot copy sequence with size -

I'm currently working with time series data and was trying to use np.array_split to split my data into batches.

To give a bit more detail, I originally had a DataFrame of size (32000, 149). I processed this data to create a sliding window of size (100, 40) over it, which I use in an autoregressive fashion. Each "minibatch" has 32 instances, each of size (100, 40), which gives me a total of 1,000 of these so-called minibatches.
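The shape arithmetic above can be checked in a few lines. This is a minimal sketch that uses only the numbers quoted in the question and assumes, as in the loop below, that one window starts at every row. It also shows that with 32,000 rows only the first 31,901 start positions can produce a full 100-row window.

```python
# Numbers taken from the question.
n_rows = 32_000       # rows in the original DataFrame
window = 100          # rows per sliding window
per_minibatch = 32    # windows grouped into one "minibatch"

# Assumption: one window starts at every row (one per df.iterrows() step).
num_windows = n_rows
num_minibatches = num_windows // per_minibatch

# A window starting at row i covers min(window, n_rows - i) rows, so only
# start positions 0 .. n_rows - window yield a full-length window.
full_windows = n_rows - window + 1
short_windows = num_windows - full_windows

print(num_minibatches, full_windows, short_windows)  # 1000 31901 99
```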

What I want to do is divide these 1,000 batches by the "number of days," which in this case is 10, so that each separate "day-batch" contains 100 minibatches.
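For reference, np.array_split does exactly this kind of grouping when the list elements are plain scalars: it splits along the first axis into the requested number of sections. A minimal sketch, using a hypothetical stand-in list of 1,000 integers in place of the minibatches:

```python
import numpy as np

# Hypothetical stand-in: 1,000 integers in place of the 1,000 minibatches.
stand_in = list(range(1000))
num_days = len(stand_in) // 100  # 10

days = np.array_split(stand_in, num_days)
print(len(days), [len(d) for d in days])

# When the length is not a multiple of the section count, array_split
# gives the first (length % sections) sections one extra element.
uneven = [len(d) for d in np.array_split(list(range(1003)), 10)]
print(uneven)  # [101, 101, 101, 100, 100, 100, 100, 100, 100, 100]
```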

The code that I have is the following:

import numpy as np

data = {}
batches = []      # will hold the 1,000 minibatches
minibatches = []  # the 32 windows collected for the current minibatch
count = 1

for idx, _ in df.iterrows():
    # Window of 100 rows starting at row idx, first 40 columns
    minibatches.append(df.iloc[idx:(idx + 100), 0:40])
    if count == 32:
        batches.append(minibatches)
        minibatches = []
        count = 0
    count += 1

num_days = len(batches) // 100  # 1000 // 100 = 10
batch_split = np.array_split(batches, num_days)

for day in range(num_days):
    data[fold][day] = batch_split[day]  # Never mind the "fold" variable here.

The error that I'm getting for the np.array_split line is:

ValueError: cannot copy sequence with size 100 to array axis with dimension 40

To break down the batches variable: it's simply a list containing 1,000 lists. These 1,000 lists are the aforementioned minibatches, and each one is a list of 32 DataFrames. Each of those 32 DataFrames is of size (100, 40).
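One way to check the (100, 40) claim: assuming df has a default RangeIndex (so the idx yielded by iterrows() is also a row position), the windows that start in the last 99 rows run past the end of the frame and come back shorter than 100 rows. The sketch below is a hypothetical, scaled-down stand-in (64 random rows, 10-row windows, 8 windows per minibatch, and a len() check instead of the count variable) that builds batches the same way and lists the short windows:

```python
import numpy as np
import pandas as pd

# Hypothetical, scaled-down stand-in for the question's data: 64 rows x 149
# columns, windows of 10 rows over the first 40 columns, 8 windows per minibatch.
n_rows, window, per_minibatch = 64, 10, 8
df = pd.DataFrame(np.random.rand(n_rows, 149))

batches, minibatches = [], []
for idx in range(len(df)):  # same positions iterrows() yields on a RangeIndex
    # Same slicing as the question; near the end this returns fewer rows.
    minibatches.append(df.iloc[idx:idx + window, 0:40])
    if len(minibatches) == per_minibatch:
        batches.append(minibatches)
        minibatches = []

# Collect the shapes of all windows that came back shorter than `window`.
short = [w.shape for mb in batches for w in mb if w.shape[0] < window]
print(len(batches), len(short), short[0], short[-1])  # 8 9 (9, 40) (1, 40)
```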

I don't understand where the size 100 and the dimension 40 are coming from in this setting. I'm not accessing the DataFrame values; I simply want to split the list batches.
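np.array_split does in fact touch the DataFrame values. As far as the NumPy source shows, when its input is a plain list (which has no shape), it takes the length with len() and then converts the whole nested list to an ndarray before slicing it. A minimal sketch with small, equally shaped, hypothetical DataFrames shows that the pieces come back as 4-D ndarrays rather than lists of DataFrames:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: 6 minibatches, each a list of 4 DataFrames of shape
# (5, 3). All shapes are equal so that NumPy can build a regular array.
batches = [[pd.DataFrame(np.zeros((5, 3))) for _ in range(4)] for _ in range(6)]

pieces = np.array_split(batches, 2)
print(len(pieces), type(pieces[0]).__name__, pieces[0].shape)  # 2 ndarray (3, 4, 5, 3)
```

The DataFrames' own row and column counts become the last two axes, which is how their sizes can end up in an error message about "array axis" dimensions.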

Try making sure the collection you pass into the function is a NumPy array:

batch_split = np.array_split(np.array(batches), num_days)

My best guess at an explanation: since you are collecting everything in a list, which has no shape attribute, NumPy looks inside it and picks up the shapes of the DataFrames instead.
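If the goal is only to group the list, one option that sidesteps NumPy's conversion entirely is to slice the Python list directly. This is a sketch, not part of the original answer: split_list is a hypothetical helper that mimics np.array_split's section sizes but leaves the DataFrames untouched, so it also works when some windows are shorter than the rest.

```python
import pandas as pd


def split_list(items, sections):
    """Split a list into `sections` consecutive chunks with the same sizes
    np.array_split would use, without converting the elements to an array.
    (Hypothetical helper for illustration.)"""
    if sections <= 0:
        raise ValueError("sections must be positive")
    size, extra = divmod(len(items), sections)
    chunks, start = [], 0
    for i in range(sections):
        # The first `extra` chunks get one extra element, as in np.array_split.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


# Usage with hypothetical, deliberately ragged DataFrames.
batches = [[pd.DataFrame({"x": range(n % 5 + 1)})] for n in range(23)]
days = split_list(batches, 4)
print([len(d) for d in days])  # [6, 6, 6, 5]
```

With the question's data this would be split_list(batches, num_days), giving 10 lists of 100 minibatches, each still holding its 32 DataFrames.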
