Join Pandas Dataframes according to array

I am attempting to join several dataframes together. The names of these dataframes are stored in another dataframe called companies, which is displayed below.

>>> companies
  Symbols
0    TUES
1    DRAM
2    NTRS
3    PCBK
4    CRIS
5    PERY
6    IRDM
7   GNCMA
8    IBOC

My aim would be to do something like this: joined=TUES.join(DRAM) then joined=joined.join(NTRS) and so on, down the list. How might I be able to reference elements of the Symbols column of the dataframe companies in order to achieve this?

Many thanks in advance!

You can define an empty DataFrame and append all other dataframes to it. See the example below:

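import pandas as pd

# DataFrame.append was removed in pandas 2.0; pd.concat does the same row-wise append
combined_df = pd.DataFrame()
for df in other_dataframes:
    combined_df = pd.concat([combined_df, df])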
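
Note that appending stacks the frames as rows rather than joining them column-wise. As a minimal sketch of the same idea, assuming two small hypothetical frames in place of other_dataframes, a single pd.concat call over the whole list avoids re-copying the accumulated data on every loop iteration:

```python
import pandas as pd

# Hypothetical stand-ins for the frames to combine (not from the question)
other_dataframes = [
    pd.DataFrame({"price": [1.0, 2.0]}),
    pd.DataFrame({"price": [3.0, 4.0]}),
]

# One concat call over the whole list; axis=0 (the default) stacks rows
combined_df = pd.concat(other_dataframes, ignore_index=True)
print(combined_df.shape)
```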

Use pd.concat, it is designed for merging lists of dfs:

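So for your example, look up the DataFrame for each name in the column, collect them in a list, and then concat:

# Symbols holds names (strings), so look each DataFrame up first (here via module-level globals())
joined = pd.concat([globals()[name] for name in companies['Symbols']], axis=1)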
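
Since the Symbols column stores names as strings while pd.concat needs the DataFrame objects themselves, one option is to keep the frames in a dictionary keyed by symbol. The sketch below assumes a hypothetical dictionary called frames and made-up column names such as TUES_close. Neither appears in the question.

```python
import pandas as pd

# Hypothetical per-symbol frames, keyed by the names stored in companies['Symbols']
frames = {
    "TUES": pd.DataFrame({"TUES_close": [1.0, 2.0, 3.0]}),
    "DRAM": pd.DataFrame({"DRAM_close": [4.0, 5.0, 6.0]}),
    "NTRS": pd.DataFrame({"NTRS_close": [7.0, 8.0, 9.0]}),
}
companies = pd.DataFrame({"Symbols": ["TUES", "DRAM", "NTRS"]})

# Look up each DataFrame by name, then concatenate side by side on the shared index
joined = pd.concat([frames[name] for name in companies["Symbols"]], axis=1)
print(list(joined.columns))
```

A dictionary keeps the lookup explicit and fails with a KeyError if a symbol has no matching frame.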

Example:

In [4]:

import pandas as pd   
import numpy as np
df = pd.DataFrame({'a':np.random.randn(5), 'b':np.random.randn(5)})
df1 = pd.DataFrame({'c':np.random.randn(5), 'd':np.random.randn(5)})
df2 = pd.DataFrame({'e':np.random.randn(5), 'f':np.random.randn(5)})
df_list=[df,df2,df1]
df_list

Out[4]:

[          a         b
 0  0.143116  1.205407
 1 -0.430869  1.429313
 2  0.059810  0.430131
 3  2.554849 -1.450640
 4 -1.127638  0.715323

 [5 rows x 2 columns],           e         f
 0  0.658967  1.150672
 1  0.813355 -0.252577
 2  0.885928  0.970844
 3  0.519375 -1.929081
 4 -0.217152  0.907535

 [5 rows x 2 columns],           c         d
 0 -1.375885  1.422697
 1 -0.870040  0.135527
 2 -0.696600  1.954966
 3  0.494035 -0.727816
 4 -0.367156 -0.216115

 [5 rows x 2 columns]]

In [8]:
# now concatenate the list of dfs, by column
pd.concat(df_list,axis=1)

Out[8]:

          a         b         e         f         c         d
0  0.143116  1.205407  0.658967  1.150672 -1.375885  1.422697
1 -0.430869  1.429313  0.813355 -0.252577 -0.870040  0.135527
2  0.059810  0.430131  0.885928  0.970844 -0.696600  1.954966
3  2.554849 -1.450640  0.519375 -1.929081  0.494035 -0.727816
4 -1.127638  0.715323 -0.217152  0.907535 -0.367156 -0.216115

[5 rows x 6 columns]
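
If you would rather keep the chained join from the question (joined=TUES.join(DRAM), then joined=joined.join(NTRS), and so on), functools.reduce can fold DataFrame.join over a list. This is only a sketch with three hypothetical frames whose column names do not overlap. With overlapping column names, join would need lsuffix/rsuffix.

```python
from functools import reduce

import pandas as pd

# Hypothetical frames with distinct column names and a shared default index
frames = [
    pd.DataFrame({"a": [1, 2, 3]}),
    pd.DataFrame({"b": [4, 5, 6]}),
    pd.DataFrame({"c": [7, 8, 9]}),
]

# Equivalent to frames[0].join(frames[1]).join(frames[2]); join aligns on the index
joined = reduce(lambda left, right: left.join(right), frames)
print(list(joined.columns))
```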
