pandas - get values from MultiIndex columns

I have the following dataframe df:

H,Nu,City,Code,Code2
0.965392,15,Madrid,es,es
0.920614,15,Madrid,it,es
0.726219,16,Madrid,tn,es
0.739119,17,Madrid,fr,es
0.789923,55,Dublin,mt,en
0.699239,57,Dublin,en,en
0.890462,68,Dublin,ar,en
0.746863,68,Dublin,pt,en
0.789923,55,Milano,it,it
0.699239,57,Milano,es,it
0.890462,68,Milano,ar,it
0.746863,68,Milano,pt,it
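
For anyone who wants to reproduce the frame, here is a minimal sketch that loads the CSV text above with pandas. The variable name csv_text is only illustrative; the data itself is copied unchanged from the question.

```python
import io

import pandas as pd

# CSV text copied from the question; csv_text is an illustrative name.
csv_text = """H,Nu,City,Code,Code2
0.965392,15,Madrid,es,es
0.920614,15,Madrid,it,es
0.726219,16,Madrid,tn,es
0.739119,17,Madrid,fr,es
0.789923,55,Dublin,mt,en
0.699239,57,Dublin,en,en
0.890462,68,Dublin,ar,en
0.746863,68,Dublin,pt,en
0.789923,55,Milano,it,it
0.699239,57,Milano,es,it
0.890462,68,Milano,ar,it
0.746863,68,Milano,pt,it
"""

# Parse the in-memory text as if it were a file.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```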

I would like to add a new column HCode that holds, for each City, the H value of the row whose Code equals that city's Code2, so that the resulting dataframe looks like this:

H,Nu,City,Code,Code2,HCode
0.965392,15,Madrid,es,es,0.965392
0.920614,15,Madrid,it,es,0.965392
0.726219,16,Madrid,tn,es,0.965392
0.739119,17,Madrid,fr,es,0.965392
0.789923,55,Dublin,mt,en,0.699239
0.699239,57,Dublin,en,en,0.699239
0.890462,68,Dublin,ar,en,0.699239
0.746863,68,Dublin,pt,en,0.699239
0.789923,55,Milano,it,it,0.789923
0.699239,57,Milano,es,it,0.789923
0.890462,68,Milano,ar,it,0.789923
0.746863,68,Milano,pt,it,0.789923

So far I have tried grouping by City and Code2, but without success.

You can groupby on 'City' and 'Code2', call first on the 'H' column and reset the index, which results in the following:

In [172]:
gp = df.groupby(['City','Code2'])['H'].first().reset_index()
gp

Out[172]:
     City Code2         H
0  Dublin    en  0.789923
1  Madrid    es  0.965392
2  Milano    it  0.789923

Then perform a left merge against your original df and select the 'H_y' column (the '_y' suffix is added because both frames have an 'H' column, so the names clash), and ffill it:

In [173]:
df['HCode'] = df.merge(gp, left_on=['City', 'Code'], right_on=['City', 'Code2'], how='left')['H_y'].ffill()
df

Out[173]:
           H  Nu    City Code Code2     HCode
0   0.965392  15  Madrid   es    es  0.965392
1   0.920614  15  Madrid   it    es  0.965392
2   0.726219  16  Madrid   tn    es  0.965392
3   0.739119  17  Madrid   fr    es  0.965392
4   0.789923  55  Dublin   mt    en  0.965392
5   0.699239  57  Dublin   en    en  0.789923
6   0.890462  68  Dublin   ar    en  0.789923
7   0.746863  68  Dublin   pt    en  0.789923
8   0.789923  55  Milano   it    it  0.789923
9   0.699239  57  Milano   es    it  0.789923
10  0.890462  68  Milano   ar    it  0.789923
11  0.746863  68  Milano   pt    it  0.789923

The result of the merge on its own (a default inner join, without ffill), to show what it produces:

In [165]:
df.merge(gp, left_on=['City', 'Code'], right_on=['City', 'Code2'])['H_y']

Out[165]:
0    0.965392
1    0.789923
2    0.789923
Name: H_y, dtype: float64
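
To see what ffill is filling in, it may help to look at the left merge before the fill is applied. The following is only a sketch on a trimmed-down copy of the data (two cities, two rows each, only the columns involved); the names small, merged and filled are illustrative and not part of the answer.

```python
import pandas as pd

# Trimmed-down, illustrative copy of the question's data.
small = pd.DataFrame({
    "H": [0.965392, 0.920614, 0.789923, 0.699239],
    "City": ["Madrid", "Madrid", "Dublin", "Dublin"],
    "Code": ["es", "it", "mt", "en"],
    "Code2": ["es", "es", "en", "en"],
})

# Same grouping as above: first H per (City, Code2).
gp = small.groupby(["City", "Code2"])["H"].first().reset_index()

# Left merge keeps every row of `small`; rows whose (City, Code)
# has no match in gp get NaN in the clashing 'H_y' column.
merged = small.merge(gp, left_on=["City", "Code"],
                     right_on=["City", "Code2"], how="left")
print(merged["H_y"])

# ffill copies the last non-NaN value downwards, regardless of City,
# so a value can cross from one city's rows into the next city's rows.
filled = merged["H_y"].ffill()
print(filled)
```

In this sketch the first Dublin row has no match of its own, so after ffill it holds the Madrid value. The same effect is visible in row 4 of Out[173].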

EDIT

OK, IIUC you can group by 'City' as before, but then filter for the rows where 'Code2' equals 'Code' and use the result as a lookup to merge against:

In [200]:
gp = df.groupby('City')
mask = gp.apply(lambda x: x['Code2'] == x['Code'])
lookup = df.loc[mask[mask].reset_index(level=0).index]
lookup

Out[200]:
          H  Nu    City Code Code2
5  0.699239  57  Dublin   en    en
0  0.965392  15  Madrid   es    es
8  0.789923  55  Milano   it    it

In [202]:
df['HCode'] = df.merge(lookup, left_on=['City', 'Code'], right_on=['City', 'Code2'], how='left')['H_y'].ffill()
df

Out[202]:
           H  Nu    City Code Code2     HCode
0   0.965392  15  Madrid   es    es  0.965392
1   0.920614  15  Madrid   it    es  0.965392
2   0.726219  16  Madrid   tn    es  0.965392
3   0.739119  17  Madrid   fr    es  0.965392
4   0.789923  55  Dublin   mt    en  0.965392
5   0.699239  57  Dublin   en    en  0.699239
6   0.890462  68  Dublin   ar    en  0.699239
7   0.746863  68  Dublin   pt    en  0.699239
8   0.789923  55  Milano   it    it  0.789923
9   0.699239  57  Milano   es    it  0.789923
10  0.890462  68  Milano   ar    it  0.789923
11  0.746863  68  Milano   pt    it  0.789923
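
Note that in Out[202] row 4 (Dublin, mt) still shows 0.965392 rather than the 0.699239 the question asks for, because ffill carries the last Madrid value forward into the first Dublin row. One possible way around this is to skip the merge and ffill and map each City directly to its matching H value. This is a sketch rather than the approach used above, and it assumes that every City has exactly one row where Code equals Code2; csv_text and lookup are illustrative names.

```python
import io

import pandas as pd

# Data copied from the question; csv_text is an illustrative name.
csv_text = """H,Nu,City,Code,Code2
0.965392,15,Madrid,es,es
0.920614,15,Madrid,it,es
0.726219,16,Madrid,tn,es
0.739119,17,Madrid,fr,es
0.789923,55,Dublin,mt,en
0.699239,57,Dublin,en,en
0.890462,68,Dublin,ar,en
0.746863,68,Dublin,pt,en
0.789923,55,Milano,it,it
0.699239,57,Milano,es,it
0.890462,68,Milano,ar,it
0.746863,68,Milano,pt,it
"""
df = pd.read_csv(io.StringIO(csv_text))

# Rows where Code equals Code2, reduced to a Series of H indexed by City.
# Assumption: exactly one such row per City, so the index is unique.
lookup = df.loc[df["Code"] == df["Code2"]].set_index("City")["H"]

# Look up each row's City in that Series; no forward fill is involved,
# so a value cannot leak from one city into another.
df["HCode"] = df["City"].map(lookup)
print(df)
```

If a City could have several rows with Code equal to Code2, the lookup index would no longer be unique and the duplicates would need to be resolved first, for example with drop_duplicates on 'City'.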
