Is there a fast way to create a new DataFrame column based on the values of another DataFrame?

Question

I want to create a new column, df_cau2['continent']. First, here are my two DataFrames:

country_continent

                Continent
Country
Afghanistan          Asia
Albania            Europe
Algeria            Africa
American Samoa    Oceania

and df_cau2

         date home_team away_team  home_score  away_score tournament     city   country  neutral
0  1872-11-30  Scotland   England           0           0   Friendly  Glasgow  Scotland    False
1  1873-03-08   England  Scotland           4           2   Friendly   London   England    False
2  1874-03-07  Scotland   England           2           1   Friendly  Glasgow  Scotland    False

To create the new column continent, I use apply on df_cau2 like this:


def same_continent(home,away):
    if country_continent.loc[home].Continent == country_continent.loc[away].Continent:
        return country_continent.loc[home].Continent
    return 'None'

df_cau2['continent']=df_cau2.apply(lambda x: same_continent(x['home_team'],x['away_team']),axis=1)
df_cau2.head()

With 39480 rows in df_cau2, this code runs too slowly. How can I change it to run faster? I am thinking about using np.select, but I don't know how to use it in this case.

This is the result I want:

             date home_team away_team  home_score  away_score tournament       city   country  neutral      continent
7611   1970-09-11      Iran    Turkey           1           1   Friendly    Teheran      Iran    False           None
31221  2009-03-11     Nepal  Pakistan           1           0   Friendly  Kathmandu     Nepal    False           Asia
32716  2010-11-17  Colombia      Peru           1           1   Friendly     Bogotá  Colombia    False  South America

Thanks

Answer 1

IIUC, you want to set the continent column only if the teams in the home_team and away_team columns are on the same continent:

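# Look up each team's continent; df2.squeeze() turns the one-column frame into a Series
home_continent = df1['home_team'].map(df2.squeeze())
away_continent = df1['away_team'].map(df2.squeeze())
# True where both teams are on the same continent
m = home_continent == away_continent
# Rows where m is False are left as NaN
df1.loc[m, 'continent'] = home_continent.loc[m]
print(df1)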

# Output
  home_team away_team continent
0    Canada   England       NaN
1    France     Spain    Europe
2     China     Japan      Asia

Setting up an MRE (minimal reproducible example):

import pandas as pd

df1 = pd.DataFrame({'home_team': ['Canada', 'France', 'China'],
                    'away_team': ['England', 'Spain', 'Japan']})
print(df1)

df2 = pd.DataFrame({'Country': ['Canada', 'China', 'England',
                                'France', 'Japan', 'Spain'],
                    'Continent': ['North America', 'Asia', 'Europe',
                                  'Europe', 'Asia', 'Europe']}).set_index('Country')
print(df2)

# Output df1
  home_team away_team
0    Canada   England
1    France     Spain
2     China     Japan

# Output df2
             Continent
Country               
Canada   North America
China             Asia
England         Europe
France          Europe
Japan             Asia
Spain           Europe
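
The map approach above leaves NaN where the continents differ, while the question's expected output shows the string 'None'. A minimal sketch of one way to get that, assuming Series.where is acceptable here. The sample rows are hypothetical (continent values chosen to mirror the question's expected output), and unlike the original apply, a team missing from the lookup yields 'None' instead of raising KeyError:

```python
import pandas as pd

# Hypothetical sample data shaped like the question's two frames
country_continent = pd.DataFrame(
    {"Continent": ["Asia", "Europe", "Asia", "Asia",
                   "South America", "South America"]},
    index=pd.Index(["Iran", "Turkey", "Nepal", "Pakistan",
                    "Colombia", "Peru"], name="Country"),
)
df_cau2 = pd.DataFrame({
    "home_team": ["Iran", "Nepal", "Colombia"],
    "away_team": ["Turkey", "Pakistan", "Peru"],
})

# Series indexed by country name, used as a vectorized lookup table
lookup = country_continent["Continent"]
home = df_cau2["home_team"].map(lookup)
away = df_cau2["away_team"].map(lookup)

# Keep the home continent where both sides match, else the string 'None'
df_cau2["continent"] = home.where(home == away, "None")
print(df_cau2)
```

Each map call is a single vectorized lookup over the whole column, so the per-row .loc lookups of the apply version are avoided.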

Answer 2

Consider a merge with the continent lookup DataFrame to create home and away continent columns. Since you will then have both continents, assign the new shared continent column conditionally with numpy.where:

import numpy as np

df_cau2 = (
    df_cau2.merge(
        country_continent.reset_index(),
        left_on = "home_team",
        right_on = "Country",
        how = "left"
    ).merge(
        country_continent.reset_index(),
        left_on = "away_team",
        right_on = "Country",
        how = "left",
        # the second merge repeats Country/Continent, so both copies get suffixed
        suffixes = ["_home", "_away"]
    )
)

# keep the home continent where both sides match, NaN otherwise
df_cau2["shared_continent"] = np.where(
    df_cau2["Continent_home"].eq(df_cau2["Continent_away"]),
    df_cau2["Continent_home"],
    np.nan
)
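
The question also asked about np.select. np.where covers a two-way choice; np.select takes a list of conditions and a matching list of choices, which helps once there are more than two outcomes. A rough sketch under assumptions: the sample data, the made-up team "Atlantis", and the extra "Unknown" label for teams missing from the lookup are all illustrative and not part of the original question:

```python
import numpy as np
import pandas as pd

# Hypothetical lookup Series: country name -> continent
lookup = pd.Series({
    "Iran": "Asia", "Turkey": "Europe", "Nepal": "Asia",
    "Pakistan": "Asia", "Colombia": "South America", "Peru": "South America",
}, name="Continent")

# "Atlantis" is deliberately absent from the lookup
matches = pd.DataFrame({
    "home_team": ["Iran", "Nepal", "Colombia", "Atlantis"],
    "away_team": ["Turkey", "Pakistan", "Peru", "Peru"],
})

home = matches["home_team"].map(lookup)
away = matches["away_team"].map(lookup)

# Conditions are checked in order; the first one that is True wins
conditions = [
    home.isna() | away.isna(),  # at least one team not found in the lookup
    home == away,               # both teams on the same continent
]
choices = ["Unknown", home]

# Rows matching no condition fall back to the default
matches["continent"] = np.select(conditions, choices, default="None")
print(matches)
```

With only the "same continent or not" distinction, np.where or Series.where is enough; np.select mainly pays off when further cases are added.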


2 answers

Answer 1
1, accepted, 2022-01-15 16:21:51

Answer 2
0, 2022-01-15 18:29:27
