Pandas: extract columns from multiple dataframes to a new dataframe based on common column name
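I have 4 datasets imported from Excel that contain the total_budget of schools for 2013, 2014, 2015 and 2016. All of the datasets share a common column holding each school's ID code (column LAESTAB).
I want a new dataset with the common column LAESTAB on the left (the values are the same across the 4 datasets), followed by the columns total2013, total2014, total2015 and total2016 (each from a different dataset).
I also want to get rid of the rest of the data, including any school IDs that are not present in all of the datasets.
I will try to illustrate this with an example:
Here is a sample of one of the Excel datasets: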
>>> print(cuts2016.head())
LA_code local_authority_name UPIN URN LAESTAB \
0 201 City of London 500000 0.0 2013614
1 202 Camden 500005 0.0 2022095
2 202 Camden 500007 0.0 2022219
3 202 Camden 500012 0.0 2022502
4 202 Camden 500014 0.0 2022603
School Name Academy? Phase Provider Type \
0 Sir John Cass's Foundation Primary School No Primary School
1 Carlton Primary School No Primary School
2 Fleet Primary School No Primary School
3 Rhyl Primary School No Primary School
4 Torriano Primary School No Primary School
MFG protection (+ve) or capping/scaling (-ve) total2016 \
0 35000 1659000
1 68000 1956000
2 -10000 1059000
3 97000 2234000
4 0 2284000
Another Excel dataset, for 2015:
>>> print(cuts2015.head())
LA_code local_authority_name UPIN URN LAESTAB \
0 201 City of London NaN 100000 2013614
1 202 Camden NaN 100008 2022019
2 202 Camden NaN 100009 2022036
3 202 Camden NaN 100010 2022065
4 202 Camden NaN 100011 2022078
school_name Phase Provider Type \
0 Sir John Cass's Foundation Primary School Primary School
1 Argyle Primary School Primary School
2 Beckford Primary School Primary School
3 Brecknock Primary School Primary School
4 Brookfield Primary School Primary School
Basic Entitlement Total Funding Deprivation Total Funding total_pre_MFG \
0 1,206,000 215,000 1,644,000
1 1,333,000 367,000 2,068,000
2 1,482,000 359,000 2,221,000
3 1,234,000 348,000 1,974,000
4 1,436,000 256,000 2,028,000
MFG protection (+ve) or capping/scaling (-ve) total2015 \
0 0 1644000
1 25,000 2093000
2 0 2221000
3 72,000 2046000
4 -58,000 1970000
The final result I need looks like this (total2014 and total2013 should be shown as well):
LAESTAB total2016 total2015 etc...\
2013614 1956000 1644000
2022019 1059000 2093000
2022036 2234000 2221000
2022065 2284000 1970000
...
I have tried 'reduce' as shown below, but it returns 0 rows × 66 columns.
from functools import reduce  # reduce is not a builtin in Python 3
import pandas as pd
dataframe_list = [cuts2013, cuts2014, cuts2015, cuts2016]
df_final = reduce(lambda left, right: pd.merge(left, right, on='LAESTAB'), dataframe_list)
Merge the dataframes SQL-style on the LAESTAB column, then drop whatever columns you don't need from data_merged.
import pandas as pd
data_merged = pd.merge(cuts2016,cuts2015,on = "LAESTAB")
For more information about merging, see the following links:
http://chrisalbon.com/python/pandas_join_merge_dataframe.html
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html
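The merge approach can be extended to all four years. The following is only a minimal sketch: the small frames and their values are made up for illustration (the real ones come from Excel), and it assumes each frame has a LAESTAB column plus a totalYYYY column as described in the question. Each frame is first cut down to those two columns, so the merge does not carry along (and suffix) the other columns, and an inner join keeps only the school IDs present in every frame.

```python
from functools import reduce

import pandas as pd

# Hypothetical stand-ins for the four Excel imports; values are invented.
cuts2013 = pd.DataFrame({"LAESTAB": [2013614, 2022019, 2022036],
                         "total2013": [10, 20, 30], "Phase": ["Primary"] * 3})
cuts2014 = pd.DataFrame({"LAESTAB": [2013614, 2022019, 2022065],
                         "total2014": [11, 21, 41], "Phase": ["Primary"] * 3})
cuts2015 = pd.DataFrame({"LAESTAB": [2013614, 2022019, 2022036],
                         "total2015": [12, 22, 32], "Phase": ["Primary"] * 3})
cuts2016 = pd.DataFrame({"LAESTAB": [2022019, 2013614, 2022095],
                         "total2016": [23, 13, 53], "Phase": ["Primary"] * 3})

frames = [cuts2013, cuts2014, cuts2015, cuts2016]

# Keep only the key column and that year's total from each frame.
slim = [
    df[["LAESTAB", f"total{year}"]]
    for year, df in zip(range(2013, 2017), frames)
]

# Chain inner merges: only IDs found in all four frames survive.
df_final = reduce(
    lambda left, right: pd.merge(left, right, on="LAESTAB", how="inner"),
    slim,
)
print(df_final)
```

The question does not show why the original attempt returned 0 rows. One possibility (an assumption, not something the post confirms) is that LAESTAB is stored with a different dtype in some of the frames, which checking `df.dtypes` on each frame would reveal.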
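One way is to use merge, as Mainul Islam pointed out. With that approach you have to perform 3 merge operations to combine the 4 dataframes. Alternatively, you can concatenate all 4 dataframes and then do a groupby.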
dataframe_list = [cuts2013, cuts2014, cuts2015, cuts2016]
total = pd.concat(dataframe_list)
# Select the year columns with a list (tuple-style selection fails in recent pandas)
total = total.groupby('LAESTAB')[['total2013', 'total2014', 'total2015', 'total2016']].sum().reset_index()
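A plain `sum()` after `concat` keeps every school, because a year with no data simply sums to 0. The sketch below is one way to also drop schools that are missing from any year, as the question asks. The frames and values are invented for illustration, and it assumes each LAESTAB appears at most once per frame (otherwise duplicates would be added together). Passing `min_count=1` makes a year with no data come out as NaN instead of 0, so `dropna()` can remove those schools.

```python
import pandas as pd

year_cols = ["total2013", "total2014", "total2015", "total2016"]

# Hypothetical stand-ins for the four Excel imports; values are invented.
cuts2013 = pd.DataFrame({"LAESTAB": [2013614, 2022019, 2022036],
                         "total2013": [10, 20, 30]})
cuts2014 = pd.DataFrame({"LAESTAB": [2013614, 2022019, 2022065],
                         "total2014": [11, 21, 41]})
cuts2015 = pd.DataFrame({"LAESTAB": [2013614, 2022019, 2022036],
                         "total2015": [12, 22, 32]})
cuts2016 = pd.DataFrame({"LAESTAB": [2022019, 2013614, 2022095],
                         "total2016": [23, 13, 53]})

# Stack the frames; columns a frame lacks are filled with NaN.
stacked = pd.concat([cuts2013, cuts2014, cuts2015, cuts2016], ignore_index=True)

total = (
    stacked.groupby("LAESTAB")[year_cols]
    .sum(min_count=1)   # all-NaN groups stay NaN instead of becoming 0
    .dropna()           # drop schools missing from any year
    .reset_index()
)

# The NaN padding made the totals float; convert back to integers.
total[year_cols] = total[year_cols].astype(int)
print(total)
```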