
Having trouble with Array Appending

I feel kind of silly being stuck for so long on something that seems so simple, but since I'm about to put my head through the wall, I figured I'd ask for some help.

I have a loop that splits my data into smaller subsets and loops through each one. On each iteration it produces a y_test and a y_pred array. Their size varies, but the shape is always (X,). To plot the two arrays against each other, I just assigned them to an empty DataFrame and used matplotlib to plot.

Now I'd also like to keep a running total of y_pred and y_test so I can see a plot of the entire data set.

What I've tried:

Initially, I tried creating another empty DataFrame outside my loop and appending the arrays to the end of its columns, but I found that appending arrays to a DataFrame was not possible.

Then I thought I'd append to an empty array each time through the loop and convert it to a DataFrame at the end to plot, but I'm not having much luck there either. And if I understand correctly, np.append creates a new array of the appended data every time I append? I wasn't sure if this would get memory intensive.

What is the best way to do this?

Here is my code (I tried to remove a lot of the lines that weren't necessary to the problem to make it easier to follow):

continuous_results = pd.DataFrame()
tscv = TimeSeriesSplit(n_splits=self.no_splits)
for train_index, test_index in tqdm(tscv.split(X)):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]

    self.regressor.fit(X_train, y_train.ravel())

    # predict y values
    y_pred = self.regressor.predict(X_test)

    # plot y_pred vs y_test
    y_df = pd.DataFrame()
    y_pred = y_pred.reshape(len(y_pred), )
    y_test = y_test.reshape(len(y_test), )
    y_df['y_pred'] = y_pred
    y_df['y_test'] = y_test

    # failed attempts at continuous dataframe
    continuous_results = continuous_results['Model'].append(y_pred[:,:])
    continuous_results = continuous_results['Actual'].append(y_test)

    y_df.plot()

It is possible to create DataFrames from NumPy arrays and vice versa:

import numpy as np
import pandas as pd

# If you already have data as an array
data = np.random.random((10,5))
# Create a dataframe from a numpy array
df = pd.DataFrame(data)
# Create a numpy array from a dataframe
as_array = df.to_numpy()
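
Applied to the question's case of two 1-D arrays, the same conversion might look like the sketch below. The values in `y_pred` and `y_test` are made-up stand-ins for one split's output, not data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for one split's predictions and true values
y_pred = np.array([0.10, 0.40, 0.35])
y_test = np.array([0.00, 0.50, 0.30])

# A dict of equal-length 1-D arrays becomes one named column per key
y_df = pd.DataFrame({"y_pred": y_pred, "y_test": y_test})

# Back to a 2-D array: one row per sample, one column per key
as_array = y_df.to_numpy()
```

Building the DataFrame from a dict in one step avoids creating an empty DataFrame and assigning the columns one at a time.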

If you want or have to loop, you can do this with both NumPy arrays and DataFrames. It is more efficient to construct a NumPy array from a list than to concatenate arrays in a loop:

# Looping - arrays can handle n dimensions
data = []
for i in range(10):
    row = np.random.random((1,1,1,1,1))
    # Add a second dimension
    row = row[:,np.newaxis]
    # Remove the second dimension
    row = row[:,-1]
    # A list can hold anything
    data.append(row)
# Construct an array from a list of arrays
array = np.array(data)
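
For the running total in the question, where each split yields 1-D arrays of a different length, a rough sketch of the same list-first idea follows. The split sizes and random values are placeholders. It uses `np.concatenate` instead of `np.array` because chunks of unequal length cannot be stacked into a regular array.

```python
import numpy as np

rng = np.random.default_rng(0)

# Plain Python lists are cheap to grow; each element is one split's array
all_pred, all_test = [], []
for n in (3, 5, 2):  # hypothetical split sizes
    y_pred = rng.random(n)  # stand-in for regressor.predict(X_test)
    y_test = rng.random(n)  # stand-in for the true values of this split
    all_pred.append(y_pred)
    all_test.append(y_test)

# Join all chunks end to end once, after the loop
pred_total = np.concatenate(all_pred)
test_total = np.concatenate(all_test)
```

Calling `np.append` inside the loop would copy the whole accumulated array on every iteration, whereas this version copies the data only once at the end.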

DataFrames can do this as well, but a DataFrame row can only have one dimension.

# looping - dataframes can work with only one dimension per row
data = []
for i in range(10):
    data.append(np.random.random(5))
# Construct a DataFrame from a list of values
df = pd.DataFrame(data)

In order to append to an existing DataFrame, a Series or a DataFrame needs to be created from the data first.

df = pd.DataFrame()
for i in range(10):
    n = np.random.random(1)
    # To add to a DataFrame, first create a Series (a row or a column) or a DataFrame
    row = pd.Series(n, name=i)
    # DataFrame.append was removed in pandas 2.0; pd.concat adds the Series,
    # turned into a one-row DataFrame, to the "bottom" of another DataFrame
    df = pd.concat([df, row.to_frame().T])
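
Growing a DataFrame inside the loop copies its contents on every pass, so for the question's continuous results it may be simpler to collect one small DataFrame per split and join them once at the end. This is only a sketch: the column names 'Model' and 'Actual' are taken from the question, while the split sizes and random values are placeholders.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

frames = []
for n in (3, 5, 2):  # hypothetical split sizes
    y_pred = rng.random(n)  # stand-in for the model's predictions
    y_test = rng.random(n)  # stand-in for the true values
    # One small DataFrame per split, kept in a list for now
    frames.append(pd.DataFrame({"Model": y_pred, "Actual": y_test}))

# Stack all splits vertically once; ignore_index renumbers the rows 0..N-1
continuous_results = pd.concat(frames, ignore_index=True)
```

Calling `continuous_results.plot()` afterwards would draw both columns over the whole data set, in the same way `y_df.plot()` does for a single split.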

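Notice: The technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.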
