How can I quickly create a new DataFrame column based on values from another DataFrame?
I want to create a new column, df_cau2['continent']. First, here are my two DataFrames:
country_continent
                Continent
Country
Afghanistan          Asia
Albania            Europe
Algeria            Africa
American Samoa    Oceania
and
df_cau2
date home_team away_team home_score away_score tournament city country neutral
0 1872-11-30 Scotland England 0 0 Friendly Glasgow Scotland False
1 1873-03-08 England Scotland 4 2 Friendly London England False
2 1874-03-07 Scotland England 2 1 Friendly Glasgow Scotland False
To create the new continent column, I use apply on df_cau2 like this:
def same_continent(home, away):
    if country_continent.loc[home].Continent == country_continent.loc[away].Continent:
        return country_continent.loc[home].Continent
    return 'None'

df_cau2['continent'] = df_cau2.apply(lambda x: same_continent(x['home_team'], x['away_team']), axis=1)
df_cau2.head()
With 39,480 rows in df_cau2, this code runs too slowly. How can I change it to run faster? I am thinking about using np.select, but I don't know how to use it in this case.
This is the result I want:
date home_team away_team home_score away_score tournament city country neutral continent
7611 1970-09-11 Iran Turkey 1 1 Friendly Teheran Iran False None
31221 2009-03-11 Nepal Pakistan 1 0 Friendly Kathmandu Nepal False Asia
32716 2010-11-17 Colombia Peru 1 1 Friendly Bogotá Colombia False South America
Thanks
IIUC, you want to set the continent column only if the teams in the home_team and away_team columns are on the same continent:
home_continent = df1['home_team'].map(df2.squeeze())
away_continent = df1['away_team'].map(df2.squeeze())
m = home_continent == away_continent
df1.loc[m, 'continent'] = home_continent.loc[m]
print(df1)
# Output
home_team away_team continent
0 Canada England NaN
1 France Spain Europe
2 China Japan Asia
Setup of an MRE (minimal reproducible example):
import pandas as pd

df1 = pd.DataFrame({'home_team': ['Canada', 'France', 'China'],
                    'away_team': ['England', 'Spain', 'Japan']})
print(df1)
# Lookup table indexed by country, with a single Continent column
df2 = pd.DataFrame({'Country': ['Canada', 'China', 'England',
                                'France', 'Japan', 'Spain'],
                    'Continent': ['North America', 'Asia', 'Europe',
                                  'Europe', 'Asia', 'Europe']}).set_index('Country')
print(df2)
# Output df1
home_team away_team
0 Canada England
1 France Spain
2 China Japan
# Output df2
Continent
Country
Canada North America
China Asia
England Europe
France Europe
Japan Asia
Spain Europe
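Since the question mentions np.select and expects the string 'None' rather than NaN for mismatched continents, the mapping above can be combined with np.select. This is only a minimal sketch on the toy data from the MRE; the variable names lookup and same are introduced here for illustration and do not come from the question.

```python
import numpy as np
import pandas as pd

# Toy data copied from the MRE above
df1 = pd.DataFrame({'home_team': ['Canada', 'France', 'China'],
                    'away_team': ['England', 'Spain', 'Japan']})
df2 = pd.DataFrame({'Country': ['Canada', 'China', 'England',
                                'France', 'Japan', 'Spain'],
                    'Continent': ['North America', 'Asia', 'Europe',
                                  'Europe', 'Asia', 'Europe']}).set_index('Country')

# Series indexed by country, used as a vectorized lookup table
lookup = df2['Continent']
home_continent = df1['home_team'].map(lookup)
away_continent = df1['away_team'].map(lookup)

# True where both teams map to the same continent.
# Teams missing from the lookup map to NaN, and NaN == NaN is False.
same = (home_continent == away_continent).to_numpy()

# One condition, one choice, and 'None' as the fallback string
df1['continent'] = np.select(
    [same],
    [home_continent.to_numpy(dtype=object)],
    default='None',
)
print(df1['continent'].tolist())  # ['None', 'Europe', 'Asia']
```

With a single condition, np.where(same, home_continent, 'None') would do the same job; np.select mainly pays off when there are several conditions to check in order.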
Consider a merge with the continent lookup data frame to create home and away continent columns. Since you will then have both continents, assign the new shared continent column conditionally with numpy.where:
import numpy as np

df_cau2 = (
    df_cau2.merge(
        country_continent.reset_index(),
        left_on="home_team",
        right_on="Country",
        how="left"
    ).merge(
        country_continent.reset_index(),
        left_on="away_team",
        right_on="Country",
        how="left",
        # Disambiguate the Country/Continent columns from the two merges
        suffixes=("_home", "_away")
    )
)

# Keep the continent only where home and away continents match
df_cau2["shared_continent"] = np.where(
    df_cau2["Continent_home"].eq(df_cau2["Continent_away"]),
    df_cau2["Continent_home"],
    np.nan
)
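For a version that can be run end to end, here is a sketch of the same merge idea on small toy data. It assumes the lookup frame is indexed by Country with one Continent column, as in the question; the toy values and the names lookup, home, away and merged are illustrative only. It differs slightly from the code above in that it renames the lookup columns before merging, so no suffixes are needed.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the question's frames (values are illustrative only)
df_cau2 = pd.DataFrame({
    "home_team": ["Canada", "France", "China"],
    "away_team": ["England", "Spain", "Japan"],
})
country_continent = pd.DataFrame(
    {"Continent": ["North America", "Asia", "Europe", "Europe", "Asia", "Europe"]},
    index=pd.Index(
        ["Canada", "China", "England", "France", "Japan", "Spain"], name="Country"
    ),
)

# Turn the index into a column, then make one renamed copy per side
lookup = country_continent.reset_index()
home = lookup.rename(columns={"Country": "home_team", "Continent": "Continent_home"})
away = lookup.rename(columns={"Country": "away_team", "Continent": "Continent_away"})

# Left merges keep every match row, in its original order
merged = (
    df_cau2
    .merge(home, on="home_team", how="left")
    .merge(away, on="away_team", how="left")
)

# Keep the continent only where both sides agree, otherwise NaN
merged["shared_continent"] = np.where(
    merged["Continent_home"].eq(merged["Continent_away"]),
    merged["Continent_home"],
    np.nan,
)

# Drop the helper columns once the shared column exists
merged = merged.drop(columns=["Continent_home", "Continent_away"])
print(merged)
```

Here the first row (Canada vs England) ends up with NaN, while the other two rows get Europe and Asia respectively.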