Using one pandas dataframe to populate a new column in another pandas dataframe
I have two dataframes. The first is df_states and the second is state_lookup.
df_states
          state  code     score
0         Texas     0  0.753549
1  Pennsylvania     0  0.998119
2    California     1  0.125751
3         Texas     2  0.125751
state_lookup
          state  code_0  code_1  code_2
0         Texas    2014    2015    2019
1  Pennsylvania    2015    2016    2017
2    California    2014    2015    2019
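I want to create a new column named "year" in df_states, filled in from the state_lookup table based on the "code" column: a code of N selects column code_N for that row's state. For example, if Texas has code = 0, then according to the state_lookup df the year should be 2014; if Texas has code = 2, the year should be 2019.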
The final result should look like this:
df_states
          state  code     score  year
0         Texas     0  0.753549  2014
1  Pennsylvania     0  0.998119  2015
2    California     1  0.125751  2015
3         Texas     2  0.125751  2019
I tried using a for loop to iterate over each row, but I couldn't get it to work. How would you accomplish this?
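For comparison, the row-by-row loop the question describes can be made to work by indexing state_lookup by state and reading the cell in column code_<code> for each row. This is a minimal sketch assuming the two frames shown above (sample data typed in from the question's tables); the name lookup_by_state is just an illustrative variable.

```python
import pandas as pd

# Sample data typed in from the tables in the question
df_states = pd.DataFrame({
    "state": ["Texas", "Pennsylvania", "California", "Texas"],
    "code": [0, 0, 1, 2],
    "score": [0.753549, 0.998119, 0.125751, 0.125751],
})
state_lookup = pd.DataFrame({
    "state": ["Texas", "Pennsylvania", "California"],
    "code_0": [2014, 2015, 2014],
    "code_1": [2015, 2016, 2015],
    "code_2": [2019, 2017, 2019],
})

# Index the lookup table by state so a cell can be addressed as (state, "code_N")
lookup_by_state = state_lookup.set_index("state")

# Plain for loop: for each row, read column "code_<code>" of that row's state
years = []
for state, code in zip(df_states["state"], df_states["code"]):
    years.append(lookup_by_state.at[state, f"code_{code}"])
df_states["year"] = years

print(df_states)
```

This mirrors the question's attempt; on large frames a reshape-and-merge approach is usually preferable because it avoids a Python-level loop.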
You can first use wide_to_long on the state_lookup df so that you can then perform a merge:
import pandas as pd

s = pd.wide_to_long(state_lookup, stubnames="code", sep="_", i="state", j="year", suffix=r"\d+").reset_index()
# wide_to_long puts the suffix (0, 1, 2) in the j column and the years under "code", so swap the names
s.columns = ["state", "code", "year"]
print(df_states.merge(s, on=["state", "code"], how="left"))
          state  code     score  year
0         Texas     0  0.753549  2014
1  Pennsylvania     0  0.998119  2015
2    California     1  0.125751  2015
3         Texas     2  0.125751  2019
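To see why the column rename is needed, here is a self-contained sketch (sample data typed in from the question's tables) that inspects the long table before merging. It names the j column "suffix" and renames by dict instead of positionally; both choices are illustrative, not part of the answer above.

```python
import pandas as pd

df_states = pd.DataFrame({
    "state": ["Texas", "Pennsylvania", "California", "Texas"],
    "code": [0, 0, 1, 2],
    "score": [0.753549, 0.998119, 0.125751, 0.125751],
})
state_lookup = pd.DataFrame({
    "state": ["Texas", "Pennsylvania", "California"],
    "code_0": [2014, 2015, 2014],
    "code_1": [2015, 2016, 2015],
    "code_2": [2019, 2017, 2019],
})

# Long format: one row per (state, suffix); the years end up in the "code" stub column
long_lookup = pd.wide_to_long(
    state_lookup, stubnames="code", sep="_", i="state", j="suffix", suffix=r"\d+"
).reset_index()
print(long_lookup.head())  # columns: state, suffix, code (the "code" column holds years)

# Rename by name so the meaning is explicit: suffix -> code, stub values -> year
long_lookup = long_lookup.rename(columns={"suffix": "code", "code": "year"})

# Left merge keeps every row of df_states in its original order
result = df_states.merge(long_lookup, on=["state", "code"], how="left")
print(result)
```

wide_to_long casts numeric suffixes to integers, which is why the merge key lines up with the integer code column in df_states.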
Another option uses melt. Load the dataframes:
import pandas as pd

df_states = pd.DataFrame({'state': ['Texas', 'Pennsylvania', 'California', 'Texas'], 'code': [0, 0, 1, 2], 'score': [0.753549, 0.998119, 0.125751, 0.125751]})
state_lookup = pd.DataFrame({'state': ['Texas', 'Pennsylvania', 'California'], 'code_0': [2014, 2015, 2014], 'code_1': [2015, 2016, 2015], 'code_2': [2019, 2017, 2019]})
First use melt to turn the code_ columns into rows:
melted_lookup = pd.melt(state_lookup,
                        id_vars=['state'],
                        value_vars=[col for col in state_lookup.columns if col.startswith('code_')],
                        var_name='new_code',
                        value_name='year')
Then merge the two dataframes:
# build the matching column name ("code_0", "code_1", ...) for each row
df_states['new_code'] = "code_" + df_states.code.astype('str')
df_states = pd.merge(df_states, melted_lookup, how='left', on=['new_code', 'state'])
#           state  code     score  new_code  year
# 0         Texas     0  0.753549    code_0  2014
# 1  Pennsylvania     0  0.998119    code_0  2015
# 2    California     1  0.125751    code_1  2015
# 3         Texas     2  0.125751    code_2  2019
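If you would rather not add the helper new_code column to df_states, one variant (not part of the answer above) is to strip the "code_" prefix from the melted column and merge on the integer code directly. A minimal sketch with the same sample data:

```python
import pandas as pd

df_states = pd.DataFrame({
    "state": ["Texas", "Pennsylvania", "California", "Texas"],
    "code": [0, 0, 1, 2],
    "score": [0.753549, 0.998119, 0.125751, 0.125751],
})
state_lookup = pd.DataFrame({
    "state": ["Texas", "Pennsylvania", "California"],
    "code_0": [2014, 2015, 2014],
    "code_1": [2015, 2016, 2015],
    "code_2": [2019, 2017, 2019],
})

# Every non-id column (code_0, code_1, code_2) becomes a row: state, code, year
melted = state_lookup.melt(id_vars="state", var_name="code", value_name="year")

# "code_0" -> 0, "code_1" -> 1, ... so the key matches df_states["code"]
melted["code"] = melted["code"].str.replace("code_", "", regex=False).astype("int64")

result = df_states.merge(melted, on=["state", "code"], how="left")
print(result)
```

This leaves df_states untouched and produces only the year column, at the cost of assuming every lookup column follows the code_<N> naming pattern.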
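Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.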