How to concatenate pandas DataFrames with automatic keys?

Question

Following on from an earlier question

I have

df1 = pd.DataFrame(
    [
    {'a': 1},
    {'a': 2},
    {'a': 3},
    ]
)

df2 = pd.DataFrame(
    [
    {'a': 4},
    {'a': 5},
    ]
)

And I want a single frame in which each input gets its own outer index key.

I accepted an answer too soon; it told me to do

pd.concat([df1, df2], keys=[1,2])

which gives the correct result, but [1,2] is hardcoded.

I also want this to be incremental, meaning given

df3

and

df4 = pd.DataFrame(
    [
    {'a': 6},
    {'a': 7},
    ]
)

I want the concatenation to give the combined frame, with df4 under the next key, using the same function.

How can I achieve this correctly?

EDIT: A concession: I can manage with only the incrementing function. It doesn't have to work with single-level DataFrames, but it would be nice if it did.

Answer 1

IIUC,

import pandas as pd

def split_list_by_multiindex(l):
    # Separate frames that already carry a MultiIndex from plain ones.
    l_multi, l_not_multi = [], []
    for df in l:
        if isinstance(df.index, pd.MultiIndex):
            l_multi.append(df)
        else:
            l_not_multi.append(df)

    return l_multi, l_not_multi

def get_start_key(df):
    # Last outer-level key of an already keyed frame.
    return df.index.get_level_values(0)[-1]

def concat_starting_by_key(l, key):
    # Key the plain frames with consecutive integers starting at `key`.
    return pd.concat(l, keys=range(key, key+len(l))) \
        if len(l) > 1 else set_multiindex_in_df(l[0], key)

def set_multiindex_in_df(df, key):
    # Wrap a single plain frame's index in a two-level MultiIndex.
    return df.set_axis(pd.MultiIndex.from_product(([key], df.index)))


def myconcat(l):
    # Keyed frames come first; plain frames continue the numbering after them.
    l_multi, l_not_multi = split_list_by_multiindex(l)
    return pd.concat([*l_multi, 
                      concat_starting_by_key(l_not_multi, 
                                              get_start_key(l_multi[-1]) + 1)
                     ]) if l_multi else concat_starting_by_key(l_not_multi, 1)

Examples

l1 = [df1, df2]

print(myconcat(l1))

     a
1 0  1
  1  2
  2  3
2 0  4
  1  5

l2 = [myconcat(l1), df4]

print(myconcat(l2))

     a
1 0  1
  1  2
  2  3
2 0  4
  1  5
3 0  6
  1  7

myconcat([df4, myconcat([df1, df2]), df1, df2])

     a
1 0  1
  1  2
  2  3
2 0  4
  1  5
3 0  6
  1  7
4 0  1
  1  2
  2  3
5 0  4
  1  5

Note

This assumes that concatenating the DataFrames in the l_multi list yields a result that is already ordered by its outer key, because the next key is taken from the last row of the last frame in that list.
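The ordering assumption can be relaxed by taking the largest outer key instead of the last one. The following is only a sketch under the assumption that the outer keys are integers; `next_key` is a hypothetical helper, not part of the answer above.

```python
import pandas as pd

def next_key(frames, default=1):
    # Hypothetical helper: largest outer-level key among the already keyed
    # frames plus one, or `default` when none of them has a MultiIndex.
    keyed = [f for f in frames if isinstance(f.index, pd.MultiIndex)]
    if not keyed:
        return default
    return max(f.index.get_level_values(0).max() for f in keyed) + 1

df1 = pd.DataFrame({'a': [1, 2, 3]})
df2 = pd.DataFrame({'a': [4, 5]})

keyed = pd.concat([df1, df2], keys=[1, 2])
# Reverse the row order so the last outer key is no longer the largest.
shuffled = keyed.iloc[::-1]

last_key = shuffled.index.get_level_values(0)[-1]  # what get_start_key would see
safe_key = next_key([shuffled])                    # max-based alternative
print(last_key, safe_key)
```

With the reversed frame, the last-row lookup would reuse an existing key, while the max-based lookup moves past every key already present.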

Answer 2

My approach was to nest two pd.concat calls, the second one to create a MultiIndex DataFrame from a single-index one.

import pandas as pd

df = pd.DataFrame(
    [
    {'a': 1},
    {'a': 2},
    {'a': 3},
    ]
)

df2 = pd.DataFrame(
    [
    {'a': 4},
    {'a': 5},
    ]
)

df = pd.concat([df, df2], keys=df.index.get_level_values(0))
In[2]: df
Out[2]:
     a
0 0  1
  1  2
  2  3
1 0  4
  1  5

And to merge a new DataFrame:

df3 = pd.DataFrame(
    [
    {'a': 6},
    {'a': 7},
    ]
)

In[3]: pd.concat([df, pd.concat([df3,], keys=(max(df.index.get_level_values(0))+1,))])
Out[3]: 
     a
0 0  1
  1  2
  2  3
1 0  4
  1  5
2 0  6
  1  7
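The incremental step can be wrapped in a small function so the same call handles the first frame and every later one. This is a sketch only: `append_keyed` and its `first_key` parameter are hypothetical names, and it assumes the outer index level holds integer keys.

```python
import pandas as pd

def append_keyed(acc, new, first_key=0):
    # Hypothetical helper wrapping the nested-concat idea.
    # `acc` is None or a frame whose outer index level holds integer keys.
    if acc is None:
        return pd.concat([new], keys=[first_key])
    key = acc.index.get_level_values(0).max() + 1
    # The inner concat gives `new` a MultiIndex; the outer one appends it.
    return pd.concat([acc, pd.concat([new], keys=[key])])

frames = [
    pd.DataFrame({'a': [1, 2, 3]}),
    pd.DataFrame({'a': [4, 5]}),
    pd.DataFrame({'a': [6, 7]}),
]

result = None
for frame in frames:
    result = append_keyed(result, frame)

print(result)
```

Passing `first_key=1` would start the numbering at 1, as in the question.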

EDIT: Following the comment from ansev saying that this method was inefficient, I ran a simple test. This is the output:

In[5]: %timeit pd.concat([df, pd.concat([df3,], keys=(max(df.index.get_level_values(0))+1,))])
Out[5]: 1.99 ms ± 98.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Compared to his method:

In[6]: %timeit [myconcat(l1), df3]
Out[6]: 1.92 ms ± 96.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Answer 3

This is how I solved it

import pandas as pd

df1 = pd.DataFrame(
    [
    {'a': 1},
    {'a': 2},
    {'a': 3},
    ]
)

df2 = pd.DataFrame(
    [
    {'a': 4},
    {'a': 5},
    ]
)

df = pd.concat([df1, df2])  # DataFrame.append was removed in pandas 2.0

df['from'] = df.index == 0
df['from'] = df['from'].cumsum()
df = df[['from', 'a']]

print(df)
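If the key is wanted as an outer index level rather than a column, the 'from' column can be moved into the index. The sketch below assumes every input frame has a default index starting at 0, since that is what the `index == 0` test relies on; the name `keyed` is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3]})
df2 = pd.DataFrame({'a': [4, 5]})

df = pd.concat([df1, df2])
# A new source frame starts wherever the original index resets to 0.
df['from'] = (df.index == 0).cumsum()

# Append 'from' to the index, then swap it into the outer position.
keyed = df.set_index('from', append=True).swaplevel(0, 1)
print(keyed)
```

Frames with a custom index, or an index that does not restart at 0, would need a different marker for where each frame begins.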

3 answers

Answer 1: 1, accepted, 2020-10-25 18:41:26
Answer 2: 1, 2020-10-26 00:01:18
Answer 3: 0, 2020-10-25 18:39:02
