Concatenating multiple Pandas DataFrames

Question

I have a large number of DataFrames whose names share the prefix df_, like this:

df_1
df_x
df_ab
.
.
.
df_1a
df_2b

Of course, I can do final_df = pd.concat([df_1, df_x, df_ab, ... df_1a, df_2b], axis = 1)

The issue is that although the prefix df_ will always be there, the rest of each DataFrame's name keeps changing and follows no pattern. So I have to constantly update the list of DataFrames passed to pd.concat to create final_df, which is cumbersome.

Question: is there any way to tell Python to concatenate all DataFrames defined in the namespace whose names start with df_ (and only those) and create final_df, or at least to return a list of all such DataFrames that I can then manually feed into pd.concat?

Answer 1

You could do something like this, using the built-in function globals():

def concat_all(prefix='df_'):
    dfs = [df for name, df in globals().items() if name.startswith(prefix)
           and isinstance(df, pd.DataFrame)]
    return pd.concat(dfs, axis=1)

Logic:

1. Filter the global namespace down to DataFrames whose names start with prefix.
2. Put these in a list, since concat is documented as taking a sequence or mapping of objects.
3. Call concat() with axis=1, which places the DataFrames side by side as columns.
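If you only want the list of matching DataFrames rather than the concatenated result, the same filter can be split out. The following is a minimal sketch, not part of the original answer: the helper name frames_with_prefix and its namespace parameter are hypothetical, and the example passes a hand-built dictionary where you would normally pass globals() from module level. Names are sorted so the column order is predictable.

```python
import pandas as pd


def frames_with_prefix(namespace, prefix="df_"):
    """Return {name: DataFrame} for each DataFrame in `namespace`
    whose name starts with `prefix`, ordered by name.

    `namespace` is any name-to-object mapping, e.g. globals().
    """
    return {
        name: obj
        for name, obj in sorted(namespace.items())
        if name.startswith(prefix) and isinstance(obj, pd.DataFrame)
    }


# Hand-built stand-in for a module namespace (illustrative data only).
example_namespace = {
    "df_b": pd.DataFrame({"b": [3, 4]}),
    "df_a": pd.DataFrame({"a": [1, 2]}),
    "other_df": pd.DataFrame({"x": [0, 0]}),  # wrong prefix, skipped
    "df_s": pd.Series([1, 2]),                # not a DataFrame, skipped
}

matched = frames_with_prefix(example_namespace)

# Inspect the matched names first, then concatenate column-wise.
print(list(matched))
final_df = pd.concat(list(matched.values()), axis=1)
print(final_df)
```

Returning the names along with the objects makes it easy to check what was picked up before concatenating.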

Example:

import pandas as pd

df_1 = pd.DataFrame([[0, 1], [2, 3]])
df_2 = pd.DataFrame([[4, 5], [6, 7]])
other_df = df_1.copy() * 2  # ignore this
s_1 = pd.Series([1, 2, 3, 4])  # and this

final_df = concat_all()
final_df

   0  1  0  1
0  0  1  4  5
1  2  3  6  7

Always use globals() with caution. It returns a dictionary of the entire module namespace, so any other DataFrame whose name happens to start with the prefix will be picked up too.
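If you control the code that creates the DataFrames, one way to avoid scanning the namespace at all is to keep them in a dictionary from the start. This is a sketch of an alternative, not what the question asked for; the dictionary name frames and its keys are made up for illustration.

```python
import pandas as pd

# Collect DataFrames in one container instead of separate df_* variables.
frames = {}
frames["1"] = pd.DataFrame([[0, 1], [2, 3]])
frames["2"] = pd.DataFrame([[4, 5], [6, 7]])

# Dictionaries preserve insertion order, so the columns follow the
# order in which the DataFrames were added.
final_df = pd.concat(list(frames.values()), axis=1)
print(final_df)
```

Adding or renaming a DataFrame then only changes a dictionary key, and the concat call never needs updating.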

You need globals() rather than locals() because the dictionary is being read inside a function. There, locals() would hold only the function's own local names, not the module-level DataFrames.
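The difference is easy to check directly. In this small sketch (the function show_scopes and the variable module_level_value are hypothetical names used only for the demonstration), the function records what locals() contains at the start of the call and whether a module-level name is reachable through globals().

```python
module_level_value = 42  # defined at module scope


def show_scopes(prefix="df_"):
    # At this point the only local name is the parameter `prefix`.
    local_names = sorted(locals())
    # Module-level names are looked up through globals() instead.
    sees_module_name = "module_level_value" in globals()
    return local_names, sees_module_name


local_names, sees_module_name = show_scopes()
print(local_names)
print(sees_module_name)
```

When run as a script, the first value lists only the function's parameter, which is why a prefix filter over locals() inside the function would find no DataFrames.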
