Append DataFrame to multi-indexed DataFrame

I have a DataFrame with a three-level index that looks like this:

                                               stat1             stat2
sample                        env  run                                                  
sample1                       0    0          36.214             71
                                   1          31.808             71
                                   2          28.376             71
                                   3          20.585             71
sample2                       0    0           2.059             29
                                   1           2.070             29
                                   2           2.038             29

This represents a process that runs on different samples of data. The process is run multiple times in different environments, which qualifies the results.

It may sound simple, but I am having trouble adding a new environment result, which comes as a DataFrame:

            stat1          stat2
run                                                  
0           0.686             29
1           0.660             29
2           0.663             29

This should be indexed under df.loc[["sample1", 1]]. I have tried this:

df.loc[["sample1", 1]] = result

I also tried DataFrame.append. The first just raises a KeyError, and the second does not seem to modify the DataFrame at all.

What am I missing here?

Edit: when using append like df.loc["sample"].append(result), the problem is that it messes up the multi-index. It is transformed into a single index where the former multi-index is merged into tuples, like (0, 0) or (0, 1), where (0, 1) stands for environment 0, run 1, and so on; and the index of the appended DataFrame (a range index representing each run) becomes the new, unwanted index.

The core of the issue here is the difference in the indexes. One way to overcome this is to change result's index so that it also carries the two outer levels to be set (sample and env), then use concat to append the DataFrame. See the example below:

In [68]: result.index = list(zip(["sample1"]*len(result), [1]*len(result), result.index))

In [69]: df = pd.concat([df,result])
         df
Out[69]: 
                  stat1  stat2
sample  env run               
sample1 0   0    36.214     71
            1    31.808     71
            2    28.376     71
            3    20.585     71
sample2 0   0     2.059     29
            1     2.070     29
            2     2.038     29
sample1 1   0     0.686     29
            1     0.660     29
            2     0.663     29
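The same idea can be sketched as a self-contained script. This is only an illustration, not the answerer's exact code: it rebuilds both frames from the values shown in the question, swaps the zip-based tuples for pd.MultiIndex.from_product so the level names carry over, and adds a sort_index call so the new rows sit under sample1. The names labelled, combined and new_env are made up for this example.

```python
import pandas as pd

# Rebuild the question's frame (values copied from the tables above).
index = pd.MultiIndex.from_tuples(
    [("sample1", 0, 0), ("sample1", 0, 1), ("sample1", 0, 2), ("sample1", 0, 3),
     ("sample2", 0, 0), ("sample2", 0, 1), ("sample2", 0, 2)],
    names=["sample", "env", "run"],
)
df = pd.DataFrame(
    {"stat1": [36.214, 31.808, 28.376, 20.585, 2.059, 2.070, 2.038],
     "stat2": [71, 71, 71, 71, 29, 29, 29]},
    index=index,
)

# The new environment result, indexed by run only.
result = pd.DataFrame(
    {"stat1": [0.686, 0.660, 0.663], "stat2": [29, 29, 29]},
    index=pd.RangeIndex(3, name="run"),
)

# Give a copy of result the same three levels as df:
# a fixed sample, a fixed env, and the existing run values.
labelled = result.copy()
labelled.index = pd.MultiIndex.from_product(
    [["sample1"], [1], result.index], names=df.index.names
)

# concat returns a new frame; sort_index places the new rows under sample1.
combined = pd.concat([df, labelled]).sort_index()

# A tuple (not a list) addresses the (sample, env) pair.
new_env = combined.loc[("sample1", 1)]
print(new_env)
```

Note that df itself is left untouched here; the result has to be assigned to a name (or back to df) to be kept.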

Edit: Once the index is changed, you can even use append (it returns a new DataFrame rather than modifying df in place):

In [21]: result.index = list(zip(["sample1"]*len(result), [1]*len(result), result.index))

In [22]: df.append(result)
Out[22]: 
                  stat1  stat2
sample  env run               
sample1 0   0    36.214     71
            1    31.808     71
            2    28.376     71
            3    20.585     71
sample2 0   0     2.059     29
            1     2.070     29
            2     2.038     29
sample1 1   0     0.686     29
            1     0.660     29
            2     0.663     29
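DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions pd.concat is the way to do this. As a hedged alternative to rewriting result's index by hand, the keys and names arguments of pd.concat can prepend the outer levels. The sketch below uses a shortened df (only the sample2 rows from the question) to stay brief; labelled and combined are illustrative names.

```python
import pandas as pd

# A shortened stand-in for the question's frame (sample2 rows only).
df = pd.DataFrame(
    {"stat1": [2.059, 2.070, 2.038], "stat2": [29, 29, 29]},
    index=pd.MultiIndex.from_product(
        [["sample2"], [0], range(3)], names=["sample", "env", "run"]
    ),
)

# The new environment result, indexed by run only.
result = pd.DataFrame(
    {"stat1": [0.686, 0.660, 0.663], "stat2": [29, 29, 29]},
    index=pd.RangeIndex(3, name="run"),
)

# keys= prepends the tuple as two outer levels; names= labels those new levels.
labelled = pd.concat([result], keys=[("sample1", 1)], names=["sample", "env"])

# pd.concat takes the place of df.append(labelled); it does not modify df.
combined = pd.concat([df, labelled])
print(combined)
```

As with append, the combined frame is a new object, which is why calling append without assigning the return value appeared to do nothing.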
