pandas - get values from multiindex columns
I have the following dataframe df:
H,Nu,City,Code,Code2
0.965392,15,Madrid,es,es
0.920614,15,Madrid,it,es
0.726219,16,Madrid,tn,es
0.739119,17,Madrid,fr,es
0.789923,55,Dublin,mt,en
0.699239,57,Dublin,en,en
0.890462,68,Dublin,ar,en
0.746863,68,Dublin,pt,en
0.789923,55,Milano,it,it
0.699239,57,Milano,es,it
0.890462,68,Milano,ar,it
0.746863,68,Milano,pt,it
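For anyone who wants to reproduce this, the sample above can be loaded roughly as follows. This is only a minimal sketch: the name csv_text is made up for illustration, and the data is the CSV shown above pasted as a string.

```python
import io

import pandas as pd

# The sample data from above, pasted verbatim as CSV text.
csv_text = """H,Nu,City,Code,Code2
0.965392,15,Madrid,es,es
0.920614,15,Madrid,it,es
0.726219,16,Madrid,tn,es
0.739119,17,Madrid,fr,es
0.789923,55,Dublin,mt,en
0.699239,57,Dublin,en,en
0.890462,68,Dublin,ar,en
0.746863,68,Dublin,pt,en
0.789923,55,Milano,it,it
0.699239,57,Milano,es,it
0.890462,68,Milano,ar,it
0.746863,68,Milano,pt,it
"""

# Parse the text as if it were a file on disk.
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)
print(df.dtypes)
```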
I would like to add a new column HCode that, for each City, holds the H value of the row whose Code matches that city's Code2 string, so that the resulting dataframe appears as:
H,Nu,City,Code,Code2,HCode
0.965392,15,Madrid,es,es,0.965392
0.920614,15,Madrid,it,es,0.965392
0.726219,16,Madrid,tn,es,0.965392
0.739119,17,Madrid,fr,es,0.965392
0.789923,55,Dublin,mt,en,0.699239
0.699239,57,Dublin,en,en,0.699239
0.890462,68,Dublin,ar,en,0.699239
0.746863,68,Dublin,pt,en,0.699239
0.789923,55,Milano,it,it,0.789923
0.699239,57,Milano,es,it,0.789923
0.890462,68,Milano,ar,it,0.789923
0.746863,68,Milano,pt,it,0.789923
So far I have tried grouping by City and Code2, but with no results.
You can groupby on 'City' and 'Code2', call first on the 'H' column of each group and reset the index, resulting in the following:
In [172]:
gp = df.groupby(['City','Code2'])['H'].first().reset_index()
gp
Out[172]:
     City Code2         H
0  Dublin    en  0.789923
1  Madrid    es  0.965392
2  Milano    it  0.789923
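Note that first simply takes the first H value in each group, regardless of Code. That is why Dublin shows 0.789923 (the 'mt' row) rather than 0.699239 (the 'en' row). A small sketch to check this, with the sample data rebuilt inline; the names dublin_first and dublin_match are only for illustration:

```python
import pandas as pd

# Sample data from the question, rebuilt inline so the snippet runs on its own.
df = pd.DataFrame({
    "H": [0.965392, 0.920614, 0.726219, 0.739119, 0.789923, 0.699239,
          0.890462, 0.746863, 0.789923, 0.699239, 0.890462, 0.746863],
    "Nu": [15, 15, 16, 17, 55, 57, 68, 68, 55, 57, 68, 68],
    "City": ["Madrid"] * 4 + ["Dublin"] * 4 + ["Milano"] * 4,
    "Code": ["es", "it", "tn", "fr", "mt", "en", "ar", "pt", "it", "es", "ar", "pt"],
    "Code2": ["es"] * 4 + ["en"] * 4 + ["it"] * 4,
})

gp = df.groupby(["City", "Code2"])["H"].first().reset_index()

# H picked by first() for Dublin: the first row of the group (Code 'mt').
dublin_first = gp.loc[gp["City"] == "Dublin", "H"].iloc[0]

# H of the Dublin row whose Code is 'en', which is what the question asks for.
dublin_match = df.loc[(df["City"] == "Dublin") & (df["Code"] == "en"), "H"].iloc[0]

print(dublin_first, dublin_match)
```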
Then perform a left merge of this against your original df and select the 'H_y' column (the '_y' suffix is added because both frames have a column named 'H'). Rows that find no match get NaN, so ffill the result:
In [173]:
df['HCode'] = df.merge(gp, left_on=['City', 'Code'], right_on=['City', 'Code2'], how='left')['H_y'].ffill()
df
Out[173]:
           H  Nu    City Code Code2     HCode
0   0.965392  15  Madrid   es    es  0.965392
1   0.920614  15  Madrid   it    es  0.965392
2   0.726219  16  Madrid   tn    es  0.965392
3   0.739119  17  Madrid   fr    es  0.965392
4   0.789923  55  Dublin   mt    en  0.965392
5   0.699239  57  Dublin   en    en  0.789923
6   0.890462  68  Dublin   ar    en  0.789923
7   0.746863  68  Dublin   pt    en  0.789923
8   0.789923  55  Milano   it    it  0.789923
9   0.699239  57  Milano   es    it  0.789923
10  0.890462  68  Milano   ar    it  0.789923
11  0.746863  68  Milano   pt    it  0.789923
Here is the result of the merge on its own (as a default inner merge, without ffill) to show what it produces:
In [165]:
df.merge(gp, left_on=['City', 'Code'], right_on=['City', 'Code2'])['H_y']
Out[165]:
0 0.965392
1 0.789923
2 0.789923
Name: H_y, dtype: float64
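To see why only three rows come back and where the 'H_y' name comes from, it may help to inspect the left merge before selecting a column. The sketch below rebuilds the sample data inline; the name merged is just for illustration.

```python
import pandas as pd

# Sample data from the question, rebuilt inline so the snippet runs on its own.
df = pd.DataFrame({
    "H": [0.965392, 0.920614, 0.726219, 0.739119, 0.789923, 0.699239,
          0.890462, 0.746863, 0.789923, 0.699239, 0.890462, 0.746863],
    "Nu": [15, 15, 16, 17, 55, 57, 68, 68, 55, 57, 68, 68],
    "City": ["Madrid"] * 4 + ["Dublin"] * 4 + ["Milano"] * 4,
    "Code": ["es", "it", "tn", "fr", "mt", "en", "ar", "pt", "it", "es", "ar", "pt"],
    "Code2": ["es"] * 4 + ["en"] * 4 + ["it"] * 4,
})

gp = df.groupby(["City", "Code2"])["H"].first().reset_index()

# Left merge keeps all 12 rows of df; rows without a (City, Code) match get NaN.
merged = df.merge(gp, left_on=["City", "Code"], right_on=["City", "Code2"], how="left")

# Overlapping non-key columns get the default '_x' / '_y' suffixes.
print(merged.columns.tolist())

# Only the rows where Code equals the city's Code2 find a partner.
print(merged["H_y"].notna().sum())
```

Only the three rows where Code equals Code2 get a value in 'H_y'; the other nine are NaN, which is what the ffill in the previous step fills in.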
EDIT
OK, IIUC you can group by 'City' as before, but then keep only the rows in each group where 'Code2' equals 'Code', and use these as the lookup to merge against:
In [200]:
gp = df.groupby('City')
mask = gp.apply(lambda x: x['Code2'] == x['Code'])
lookup = df.loc[mask[mask].reset_index(level=0).index]
lookup
Out[200]:
          H  Nu    City Code Code2
5  0.699239  57  Dublin   en    en
0  0.965392  15  Madrid   es    es
8  0.789923  55  Milano   it    it
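As far as I can tell, the groupby/apply here only serves to find the rows where Code equals Code2, so a plain boolean mask should select the same rows. A sketch with the sample data rebuilt inline (note it returns the rows in their original order 0, 5, 8 rather than sorted by city):

```python
import pandas as pd

# Sample data from the question, rebuilt inline so the snippet runs on its own.
df = pd.DataFrame({
    "H": [0.965392, 0.920614, 0.726219, 0.739119, 0.789923, 0.699239,
          0.890462, 0.746863, 0.789923, 0.699239, 0.890462, 0.746863],
    "Nu": [15, 15, 16, 17, 55, 57, 68, 68, 55, 57, 68, 68],
    "City": ["Madrid"] * 4 + ["Dublin"] * 4 + ["Milano"] * 4,
    "Code": ["es", "it", "tn", "fr", "mt", "en", "ar", "pt", "it", "es", "ar", "pt"],
    "Code2": ["es"] * 4 + ["en"] * 4 + ["it"] * 4,
})

# Keep only the rows whose Code matches the city's Code2.
lookup = df[df["Code"] == df["Code2"]]
print(lookup)
```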
In [202]:
df['HCode'] = df.merge(lookup, left_on=['City', 'Code'], right_on=['City', 'Code2'], how='left')['H_y'].ffill()
df
Out[202]:
           H  Nu    City Code Code2     HCode
0   0.965392  15  Madrid   es    es  0.965392
1   0.920614  15  Madrid   it    es  0.965392
2   0.726219  16  Madrid   tn    es  0.965392
3   0.739119  17  Madrid   fr    es  0.965392
4   0.789923  55  Dublin   mt    en  0.965392
5   0.699239  57  Dublin   en    en  0.699239
6   0.890462  68  Dublin   ar    en  0.699239
7   0.746863  68  Dublin   pt    en  0.699239
8   0.789923  55  Milano   it    it  0.789923
9   0.699239  57  Milano   es    it  0.789923
10  0.890462  68  Milano   ar    it  0.789923
11  0.746863  68  Milano   pt    it  0.789923
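One caveat: in the output above, row 4 (Dublin, mt) still shows 0.965392, because ffill carries the last Madrid value forward until the first Dublin match at row 5. The desired output in the question has 0.699239 there. A possible way around this, sketched below and not part of the approach above, is to skip the merge and ffill and map each City directly to the H of its matching row. It assumes there is exactly one row per City where Code equals Code2; the name h_by_city is only for illustration.

```python
import pandas as pd

# Sample data from the question, rebuilt inline so the snippet runs on its own.
df = pd.DataFrame({
    "H": [0.965392, 0.920614, 0.726219, 0.739119, 0.789923, 0.699239,
          0.890462, 0.746863, 0.789923, 0.699239, 0.890462, 0.746863],
    "Nu": [15, 15, 16, 17, 55, 57, 68, 68, 55, 57, 68, 68],
    "City": ["Madrid"] * 4 + ["Dublin"] * 4 + ["Milano"] * 4,
    "Code": ["es", "it", "tn", "fr", "mt", "en", "ar", "pt", "it", "es", "ar", "pt"],
    "Code2": ["es"] * 4 + ["en"] * 4 + ["it"] * 4,
})

# Rows where Code equals the city's Code2 (assumed: exactly one per City).
lookup = df[df["Code"] == df["Code2"]]

# Series indexed by City, holding the H value of the matching row.
h_by_city = lookup.set_index("City")["H"]

# Look up each row's City; no ffill needed, so nothing leaks across cities.
df["HCode"] = df["City"].map(h_by_city)
print(df)
```

With the sample data this gives 0.965392 for every Madrid row, 0.699239 for every Dublin row and 0.789923 for every Milano row. A City without any matching row would get NaN in HCode.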