Pandas: extracting columns from multiple DataFrames into a new DataFrame based on a common column name

Question

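I have 4 datasets imported from Excel containing the total_budget of schools for 2013, 2014, 2015 and 2016. All of the datasets share a common column holding each school's ID code (column LAESTAB).

I want a new dataset with the common column LAESTAB on the left (the values are the same across the 4 datasets), followed by the columns total2013, total2014, total2015 and total2016 (each taken from a different dataset).

I also want to get rid of the rest of the data, including any school IDs that are not present in all of the datasets.

I'll try to illustrate this with an example:

Here is a sample of one of the Excel datasets: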

>>> print(cuts2016.head())

   LA_code local_authority_name    UPIN  URN  LAESTAB  \
0      201       City of London  500000  0.0  2013614   
1      202               Camden  500005  0.0  2022095   
2      202               Camden  500007  0.0  2022219   
3      202               Camden  500012  0.0  2022502   
4      202               Camden  500014  0.0  2022603   

                                 School Name Academy?    Phase Provider Type  \
0  Sir John Cass's Foundation Primary School       No  Primary        School   
1                     Carlton Primary School       No  Primary        School   
2                       Fleet Primary School       No  Primary        School   
3                        Rhyl Primary School       No  Primary        School   
4                    Torriano Primary School       No  Primary        School   


   MFG protection (+ve) or capping/scaling (-ve)  total2016  \
0                                          35000    1659000   
1                                          68000    1956000   
2                                         -10000    1059000   
3                                          97000    2234000   
4                                              0    2284000

Another Excel dataset, for 2015:

print(cuts2015.head())
   LA_code local_authority_name  UPIN     URN  LAESTAB  \
0      201       City of London   NaN  100000  2013614   
1      202               Camden   NaN  100008  2022019   
2      202               Camden   NaN  100009  2022036   
3      202               Camden   NaN  100010  2022065   
4      202               Camden   NaN  100011  2022078   

                                 school_name    Phase Provider Type  \
0  Sir John Cass's Foundation Primary School  Primary        School   
1                      Argyle Primary School  Primary        School   
2                    Beckford Primary School  Primary        School   
3                   Brecknock Primary School  Primary        School   
4                  Brookfield Primary School  Primary        School   

  Basic Entitlement Total Funding Deprivation Total Funding total_pre_MFG  \
0                       1,206,000                   215,000     1,644,000   
1                       1,333,000                   367,000     2,068,000   
2                       1,482,000                   359,000     2,221,000   
3                       1,234,000                   348,000     1,974,000   
4                       1,436,000                   256,000     2,028,000   

  MFG protection (+ve) or capping/scaling (-ve) total2015  \
0                                             0   1644000   
1                                        25,000   2093000   
2                                             0   2221000   
3                                        72,000   2046000   
4                                       -58,000   1970000

The final result I need looks like this (it should also show total2014 and total2013):

LAESTAB  total2016    total2015   etc...\
2013614  1956000      1644000      
2022019  1059000      2093000 
2022036  2234000      2221000 
2022065  2284000      1970000 
...

I have already tried 'reduce', as shown below, but it returns 0 rows × 66 columns.

from functools import reduce  # needed on Python 3

dataframe_list = [cuts2013, cuts2014, cuts2015, cuts2016]
df_final = reduce(lambda left, right: pd.merge(left, right, on='LAESTAB'), dataframe_list)

Answer 1

Merge the DataFrames SQL-style on the LAESTAB column, then drop whichever columns you don't need from data_merged.

import pandas as pd
data_merged = pd.merge(cuts2016,cuts2015,on = "LAESTAB")
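
To end up with only LAESTAB and the yearly totals, one option is to trim each DataFrame to those two columns before merging. The following is a minimal sketch that uses small made-up frames in place of the Excel data; the values and the helper name `merge_totals` are illustrative assumptions, not part of the original answer. The default inner join keeps only the school IDs that are present in every frame.

```python
from functools import reduce

import pandas as pd

# Toy stand-ins for the Excel imports; the values are made up for illustration.
cuts2015 = pd.DataFrame({
    "LAESTAB": [2013614, 2022019, 2022036],
    "school_name": ["A", "B", "C"],
    "total2015": [1644000, 2093000, 2221000],
})
cuts2016 = pd.DataFrame({
    "LAESTAB": [2013614, 2022036, 2022095],
    "School Name": ["A", "C", "D"],
    "total2016": [1659000, 2234000, 1956000],
})


def merge_totals(frames_and_columns, key="LAESTAB"):
    """Keep only `key` plus one total column per frame, then inner-join them."""
    trimmed = [df[[key, col]] for df, col in frames_and_columns]
    return reduce(lambda left, right: pd.merge(left, right, on=key), trimmed)


df_final = merge_totals([(cuts2015, "total2015"), (cuts2016, "total2016")])
print(df_final)
```

The same call should extend to four frames by adding `(cuts2013, "total2013")` and `(cuts2014, "total2014")` to the list.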

For more information on merging, see the following links:

http://chrisalbon.com/python/pandas_join_merge_dataframe.html

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html
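
The question reports that chaining the merges produced 0 rows. The thread does not say why; one thing that may be worth checking (an assumption, not a confirmed diagnosis) is whether the LAESTAB values actually match across frames, for example because one sheet was read as text and another as integers. A small sketch with made-up data and a hypothetical helper `common_keys`:

```python
import pandas as pd

# Made-up frames: one holds integer IDs, the other the same IDs as strings.
left = pd.DataFrame({"LAESTAB": [2013614, 2022019], "total2015": [1, 2]})
right = pd.DataFrame({"LAESTAB": ["2013614", "2022019"], "total2016": [3, 4]})


def common_keys(frames, key="LAESTAB"):
    """Return the set of key values that appear in every frame."""
    key_sets = [set(df[key]) for df in frames]
    return set.intersection(*key_sets)


# Compare the key dtypes, then count the keys shared by all frames.
print([df["LAESTAB"].dtype for df in (left, right)])
print(len(common_keys([left, right])))  # no shared keys while the types differ

# Converting the key to a single type makes the values comparable again.
right["LAESTAB"] = right["LAESTAB"].astype("int64")
print(len(common_keys([left, right])))
```

If the count of shared keys is zero for the real data, an inner merge across all frames would be expected to come back empty.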

Answer 2

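One approach is to use merge, as Mainul Islam pointed out. With that approach you have to perform 3 merge operations to combine the 4 DataFrames. Alternatively, you can concatenate all 4 DataFrames and perform a groupby operation.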

dataframe_list = [cuts2013, cuts2014, cuts2015, cuts2016]
total = pd.concat(dataframe_list)
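# Select the total columns with a list (double brackets); tuple-style selection is not supported by current pandas
total = total.groupby('LAESTAB')[['total2013', 'total2014', 'total2015', 'total2016']].sum().reset_index()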
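
Note that a plain sum after concatenation keeps schools that are missing from some years, because an all-missing group sums to 0 by default. The sketch below uses made-up frames and assumes that each school appears at most once per yearly frame and that the total columns are numeric. It passes `min_count=1` so that missing years stay NaN and can then be dropped:

```python
import pandas as pd

# Toy stand-ins for two of the yearly frames; the values are illustrative.
cuts2015 = pd.DataFrame({"LAESTAB": [2013614, 2022019], "total2015": [1644000, 2093000]})
cuts2016 = pd.DataFrame({"LAESTAB": [2013614, 2022095], "total2016": [1659000, 1956000]})

total_cols = ["total2015", "total2016"]
stacked = pd.concat([cuts2015, cuts2016])

totals = (
    stacked.groupby("LAESTAB")[total_cols]
    .sum(min_count=1)  # a school with no value for a year stays NaN
    .dropna()          # drop schools missing from any year
    .astype("int64")   # totals became floats because of the NaNs
    .reset_index()
)
print(totals)
```

Only LAESTAB 2013614 survives here, since it is the only ID present in both toy frames.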
