Concatenating multiple pandas DataFrames
I have a large number of DataFrames whose names share the prefix df_ and look like this:
df_1
df_x
df_ab
.
.
.
df_1a
df_2b
Of course, I can do:

    final_df = pd.concat([df_1, df_x, df_ab, ... df_1a, df_2b], axis = 1)

The issue is that although the prefix df_ will always be there, the rest of each DataFrame's name keeps changing and follows no pattern. So I have to constantly update the list of DataFrames passed to pd.concat to create final_df, which is cumbersome.
Question: is there any way to tell Python to concatenate all DataFrames defined in the namespace whose names start with df_ (and only those) and create final_df, or at least to return a list of all such DataFrames that I can then manually feed into pd.concat?

You could do something like this, using the built-in function globals():

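    def concat_all(prefix='df_'):
        # Keep only module-level DataFrames whose names start with the prefix
        dfs = [df for name, df in globals().items()
               if name.startswith(prefix) and isinstance(df, pd.DataFrame)]
        # Place the selected DataFrames side by side
        return pd.concat(dfs, axis=1)
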
Logic:
1. Filter globals() down to the DataFrames whose names start with prefix.
2. concat() them along axis 1, that is, side by side as columns.

Example:

    import pandas as pd

    df_1 = pd.DataFrame([[0, 1], [2, 3]])
    df_2 = pd.DataFrame([[4, 5], [6, 7]])
    other_df = df_1.copy() * 2  # ignore this
    s_1 = pd.Series([1, 2, 3, 4])  # and this

    final_df = concat_all()
    final_df

       0  1  0  1
    0  0  1  4  5
    1  2  3  6  7

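For the "at least return a list" part of the question, a minimal sketch is to separate the lookup from the concatenation. The helper name frames_with_prefix and its namespace parameter are hypothetical, introduced only for illustration; passing the dictionary in explicitly (for example globals()) makes it visible which namespace is being searched.

```python
import pandas as pd


def frames_with_prefix(namespace, prefix="df_"):
    """Return {name: DataFrame} for the matching entries of a namespace dict.

    Hypothetical helper: `namespace` is any mapping of names to objects,
    for example the result of globals().
    """
    return {
        name: obj
        # Iterate over a copy so the mapping cannot change size mid-loop
        for name, obj in list(namespace.items())
        if name.startswith(prefix) and isinstance(obj, pd.DataFrame)
    }


# Stand-in namespace for the demo; in a script you would pass globals()
demo_namespace = {
    "df_1": pd.DataFrame([[0, 1], [2, 3]]),
    "df_2": pd.DataFrame([[4, 5], [6, 7]]),
    "other_df": pd.DataFrame([[9, 9], [9, 9]]),  # wrong prefix, skipped
    "df_note": "not a DataFrame",                # wrong type, skipped
}

found = frames_with_prefix(demo_namespace)
print(sorted(found))  # names that matched

# pd.concat raises ValueError on an empty list, so guard before concatenating
if found:
    final_df = pd.concat(list(found.values()), axis=1)
    print(final_df.shape)
```

Returning the names along with the DataFrames lets you check what was picked up before you concatenate anything.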
Always use globals() with caution. It gives you a dictionary of the entire module namespace.
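One way to sidestep globals() altogether, sketched here under the assumption that you control the code that creates the DataFrames, is to store them in a dictionary rather than in separate df_ variables. The name frames and its keys are made up for illustration.

```python
import pandas as pd

# Hypothetical registry: one dict instead of many df_* variables
frames = {}
frames["1"] = pd.DataFrame({"a": [0, 2]})
frames["x"] = pd.DataFrame({"b": [1, 3]})
frames["ab"] = pd.DataFrame({"c": [4, 6]})

# Every value was added on purpose, so no name filtering is needed
final_df = pd.concat(list(frames.values()), axis=1)
print(final_df.columns.tolist())
```

With this layout the list passed to pd.concat never has to be maintained by hand, whatever the keys turn out to be.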
You need globals() rather than locals() because the dictionary is being used inside a function. At that point locals() would hold only the function's own local names, not the module-level DataFrames.
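A small check of the globals() versus locals() point, as a sketch: the names df_demo and visible_names are invented for this demo, and the code is assumed to run at module level (as a script or in a notebook cell).

```python
import pandas as pd

df_demo = pd.DataFrame({"a": [1, 2]})  # module-level name, made up for this demo


def visible_names(prefix="df_"):
    # Take both snapshots outside any comprehension, which may have its own scope
    local_ns = dict(locals())
    global_ns = dict(globals())
    in_locals = [name for name in local_ns if name.startswith(prefix)]
    in_globals = [name for name in global_ns if name.startswith(prefix)]
    return in_locals, in_globals


in_locals, in_globals = visible_names()
print(in_locals)   # prefixed names among the function's locals
print(in_globals)  # prefixed names in the module namespace
```

Inside the function, the local namespace contains only the parameter prefix, so nothing there matches, while the module namespace includes df_demo.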