Pandas: Accessing multiple columns under different top-level column labels in a DataFrame with MultiIndex columns
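I'm having trouble working out how to index the table headers. I want to scrape the table and write it to a CSV file, so I need the columns that fall under ResidualMaturity and Last, but I can only get the table's top-level headers. I tried df[('Yield', 'Last')], but that only returns that one column, not both.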
import pandas as pd
import requests
url = 'http://www.worldgovernmentbonds.com/country/japan/'
r = requests.get(url)
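df_list = pd.read_html(r.text, flavor='html5lib')  # one DataFrame per <table> on the page
df = df_list[4]  # the yield table; its columns form a two-level MultiIndex
yc = df[["ResidualMaturity", "Yield"]]  # selects by top-level (level 0) labels only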
print(yc)
Current output:
ResidualMaturity Yield
ResidualMaturity Last Chg 1M Chg 6M
0 1 month -0.114% +9.0 bp +7.4 bp
1 3 months -0.109% 0.0 bp -1.9 bp
2 6 months -0.119% -0.3 bp -1.9 bp
3 9 months -0.119% +10.0 bp +9.9 bp
4 1 year -0.125% -0.7 bp +0.9 bp
5 2 years -0.121% +0.9 bp +1.3 bp
6 3 years -0.113% +2.2 bp +2.7 bp
7 4 years -0.094% +2.6 bp +2.1 bp
8 5 years -0.082% +2.3 bp +1.8 bp
9 6 years -0.056% +3.4 bp +0.4 bp
10 7 years -0.029% +5.1 bp -0.4 bp
11 8 years 0.007% +5.6 bp -0.7 bp
12 9 years 0.052% +5.6 bp -1.3 bp
13 10 years 0.087% +4.7 bp -1.2 bp
14 15 years 0.288% +4.3 bp -2.4 bp
15 20 years 0.460% +3.7 bp -1.5 bp
16 30 years 0.689% +3.5 bp +1.6 bp
17 40 years 0.757% +3.5 bp +7.3 bp
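The two header lines in the output above are the two levels of the column MultiIndex. A minimal sketch, using a hypothetical hand-built stand-in for yc (labels copied from the printed header, rows from its first two lines) so it runs without fetching the page, shows that every column label is a (level 0, level 1) tuple and that a single tuple such as ('Yield', 'Last') selects just one column:

```python
import pandas as pd

# Hypothetical stand-in for yc: header labels and the first two rows are
# copied from the printed output; the real frame comes from read_html.
columns = pd.MultiIndex.from_tuples([
    ("ResidualMaturity", "ResidualMaturity"),
    ("Yield", "Last"),
    ("Yield", "Chg 1M"),
    ("Yield", "Chg 6M"),
])
yc = pd.DataFrame(
    [["1 month", "-0.114%", "+9.0 bp", "+7.4 bp"],
     ["3 months", "-0.109%", "0.0 bp", "-1.9 bp"]],
    columns=columns,
)

# Every column label is a (level 0, level 1) tuple.
for label in yc.columns:
    print(label)

# Level 1 holds the names the question wants to select by.
print(list(yc.columns.get_level_values(1)))

# A single full tuple returns one column as a Series.
print(type(yc[("Yield", "Last")]).__name__)  # Series
```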
Desired output:
ResidualMaturity Last
0 1 month -0.114%
1 3 months -0.109%
2 6 months -0.119%
3 9 months -0.119%
4 1 year -0.125%
5 2 years -0.121%
6 3 years -0.113%
7 4 years -0.094%
8 5 years -0.082%
9 6 years -0.056%
10 7 years -0.029%
11 8 years 0.007%
12 9 years 0.052%
13 10 years 0.087%
14 15 years 0.288%
15 20 years 0.460%
16 30 years 0.689%
17 40 years 0.757%
Use pd.IndexSlice together with .loc:
idx = pd.IndexSlice
yc.loc[:, idx[:, ['ResidualMaturity', 'Last']]]
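A minimal, self-contained version of this selection, run on a hypothetical stand-in for yc built from the question's printed labels and first two rows (the real yc comes from read_html on the live page):

```python
import pandas as pd

# Hypothetical stand-in for yc (labels and rows copied from the question).
columns = pd.MultiIndex.from_tuples([
    ("ResidualMaturity", "ResidualMaturity"),
    ("Yield", "Last"),
    ("Yield", "Chg 1M"),
    ("Yield", "Chg 6M"),
])
yc = pd.DataFrame(
    [["1 month", "-0.114%", "+9.0 bp", "+7.4 bp"],
     ["3 months", "-0.109%", "0.0 bp", "-1.9 bp"]],
    columns=columns,
)

idx = pd.IndexSlice
# ':' keeps every level-0 label; the list filters on level-1 labels.
result = yc.loc[:, idx[:, ["ResidualMaturity", "Last"]]]
print(result)
print(result.columns.tolist())
# [('ResidualMaturity', 'ResidualMaturity'), ('Yield', 'Last')]
```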
Alternatively, use .loc with axis=1, like this:
idx = pd.IndexSlice
yc.loc(axis=1)[idx[:, ['ResidualMaturity', 'Last']]]
Used this way, pd.IndexSlice lets us specify the level-1 column labels without having to specify the level-0 labels.
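To see why this works, it helps to look at what idx[...] actually builds. A minimal sketch, again on a hypothetical stand-in for yc with the question's labels, showing that the key is a plain tuple with one entry per column level, and that the .loc[:, key] and .loc(axis=1)[key] spellings above pick the same columns:

```python
import pandas as pd

# IndexSlice returns whatever is inside the brackets, so this key is an
# ordinary tuple with one entry per column level: all level-0 labels
# (the ':' becomes slice(None)) and a list of level-1 labels.
idx = pd.IndexSlice
key = idx[:, ["ResidualMaturity", "Last"]]
print(key)  # (slice(None, None, None), ['ResidualMaturity', 'Last'])

# Hypothetical stand-in for yc (labels and rows copied from the question).
columns = pd.MultiIndex.from_tuples([
    ("ResidualMaturity", "ResidualMaturity"),
    ("Yield", "Last"),
    ("Yield", "Chg 1M"),
    ("Yield", "Chg 6M"),
])
yc = pd.DataFrame(
    [["1 month", "-0.114%", "+9.0 bp", "+7.4 bp"],
     ["3 months", "-0.109%", "0.0 bp", "-1.9 bp"]],
    columns=columns,
)

# The two spellings from the answer select the same columns.
via_columns = yc.loc[:, key]
via_axis = yc.loc(axis=1)[key]
print(via_axis.equals(via_columns))  # True
```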
Result:
ResidualMaturity Yield
ResidualMaturity Last
0 1 month -0.110%
1 3 months -0.109%
2 6 months -0.119%
3 9 months -0.115%
4 1 year -0.125%
5 2 years -0.120%
6 3 years -0.113%
7 4 years -0.094%
8 5 years -0.084%
9 6 years -0.057%
10 7 years -0.031%
11 8 years 0.005%
12 9 years 0.050%
13 10 years 0.086%
14 15 years 0.287%
15 20 years 0.461%
16 30 years 0.689%
17 40 years 0.757%
If you don't want to show the level-0 column index:
idx = pd.IndexSlice
yc.loc(axis=1)[idx[:, ['ResidualMaturity', 'Last']]].droplevel(0, axis=1)
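A minimal sketch of the droplevel step on a hypothetical stand-in for yc (labels and first two rows copied from the question), showing that after dropping level 0 the columns become ordinary single-level string labels:

```python
import pandas as pd

# Hypothetical stand-in for yc (labels and rows copied from the question).
columns = pd.MultiIndex.from_tuples([
    ("ResidualMaturity", "ResidualMaturity"),
    ("Yield", "Last"),
    ("Yield", "Chg 1M"),
    ("Yield", "Chg 6M"),
])
yc = pd.DataFrame(
    [["1 month", "-0.114%", "+9.0 bp", "+7.4 bp"],
     ["3 months", "-0.109%", "0.0 bp", "-1.9 bp"]],
    columns=columns,
)

idx = pd.IndexSlice
# Select by level-1 labels, then drop the top header row (level 0).
flat = yc.loc(axis=1)[idx[:, ["ResidualMaturity", "Last"]]].droplevel(0, axis=1)

print(flat.columns.tolist())  # ['ResidualMaturity', 'Last']
print(flat)
```

Once the columns are flat, plain labels such as flat["Last"] work as usual.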
Result:
ResidualMaturity Last
0 1 month -0.110%
1 3 months -0.109%
2 6 months -0.119%
3 9 months -0.115%
4 1 year -0.125%
5 2 years -0.120%
6 3 years -0.113%
7 4 years -0.094%
8 5 years -0.084%
9 6 years -0.057%
10 7 years -0.031%
11 8 years 0.005%
12 9 years 0.050%
13 10 years 0.086%
14 15 years 0.287%
15 20 years 0.461%
16 30 years 0.689%
17 40 years 0.757%
Here is what I got:
import pandas as pd
import requests
url = 'http://www.worldgovernmentbonds.com/country/japan/'
r = requests.get(url)
df_list = pd.read_html(r.text, flavor='html5lib')
df = df_list[4]
# Take the 2nd and 3rd columns by position, then drop the top header level
yc = df[df.columns[1:3]].droplevel(0, axis=1)
print(yc)
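df.columns[1:3] picks columns by position, so it relies on ResidualMaturity and Last being the second and third columns of the scraped table. A label-based alternative, sketched here on a hypothetical stand-in shaped like yc from the question, passes a list of full (level 0, level 1) tuples. Unlike the single tuple in df[('Yield', 'Last')], a list returns a DataFrame containing both columns:

```python
import pandas as pd

# Hypothetical stand-in shaped like yc (labels and rows copied from the question).
columns = pd.MultiIndex.from_tuples([
    ("ResidualMaturity", "ResidualMaturity"),
    ("Yield", "Last"),
    ("Yield", "Chg 1M"),
    ("Yield", "Chg 6M"),
])
yc = pd.DataFrame(
    [["1 month", "-0.114%", "+9.0 bp", "+7.4 bp"],
     ["3 months", "-0.109%", "0.0 bp", "-1.9 bp"]],
    columns=columns,
)

# Name each wanted column by its full tuple; a list of tuples keeps
# both columns, whatever their position in the table.
wanted = [("ResidualMaturity", "ResidualMaturity"), ("Yield", "Last")]
yc_flat = yc[wanted].droplevel(0, axis=1)
print(yc_flat)
```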
Output:
ResidualMaturity Last
0 1 month -0.110%
1 3 months -0.109%
2 6 months -0.119%
3 9 months -0.115%
4 1 year -0.125%
5 2 years -0.120%
6 3 years -0.113%
7 4 years -0.094%
8 5 years -0.084%
9 6 years -0.057%
10 7 years -0.031%
11 8 years 0.005%
12 9 years 0.050%
13 10 years 0.086%
14 15 years 0.287%
15 20 years 0.461%
16 30 years 0.689%
17 40 years 0.757%
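Since the question's goal is a CSV file, the flattened frame can be written with DataFrame.to_csv. A minimal sketch, using a hypothetical two-row frame copied from the output above and an in-memory buffer in place of a real file (a path such as "japan_yields.csv" would be a hypothetical choice); index=False keeps the row numbers out of the file:

```python
import io

import pandas as pd

# Hypothetical flat frame: first two rows copied from the output above.
flat = pd.DataFrame({
    "ResidualMaturity": ["1 month", "3 months"],
    "Last": ["-0.110%", "-0.109%"],
})

# StringIO stands in for a file path here; pass a filename to write to disk.
buffer = io.StringIO()
flat.to_csv(buffer, index=False)
print(buffer.getvalue())
```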