簡體   English   中英

根據另一個 dataframe 的值創建新列 dataframe 運行速度快嗎?

[英]create new column of dataframe base on value of another dataframe run fast?

我想為我的df_cau2['continent']創建一個新列。 首先是我的 r 2 df:

country_continent
    Continent
Country 
Afghanistan Asia
Albania Europe
Algeria Africa
American Samoa  Oceania

and
df_cau2 
    date    home_team   away_team   home_score  away_score  tournament  city    country neutral
0   1872-11-30  Scotland    England 0   0   Friendly    Glasgow Scotland    False
1   1873-03-08  England Scotland    4   2   Friendly    London  England False
2   1874-03-07  Scotland    England 2   1   Friendly    Glasgow Scotland    False

創建新列continent我使用申請 df_cau2 像這樣:


def same_continent(home,away):
    if country_continent.loc[home].Continent == country_continent.loc[away].Continent:
        return country_continent.loc[home].Continent
    return 'None'

df_cau2['continent']=df_cau2.apply(lambda x: same_continent(x['home_team'],x['away_team']),axis=1)
df_cau2.head()

使用 39480 行 df_cau2,此代碼運行速度太慢,如何更改我的代碼以使其運行得更快? 我正在考慮使用np.select但在這種情況下我不知道如何使用它。

這是我想要的結果:

date    home_team   away_team   home_score  away_score  tournament  city    country neutral continent
7611    1970-09-11  Iran    Turkey  1   1   Friendly    Teheran Iran    False   None
31221   2009-03-11  Nepal   Pakistan    1   0   Friendly    Kathmandu   Nepal   False   Asia
32716   2010-11-17  Colombia    Peru    1   1   Friendly    Bogotá  Colombia    False   South America

謝謝

IIUC,僅當home_teamaway_team列位於同一大洲時,您才想設置continent列:

home_continent = df1['home_team'].map(df2.squeeze())
away_continent = df1['away_team'].map(df2.squeeze())
m = home_continent == away_continent
df1.loc[m, 'continent'] = home_continent.loc[m]
print(df1)

# Output
  home_team away_team continent
0    Canada   England       NaN
1    France     Spain    Europe
2     China     Japan      Asia

設置 MRE

df1 = pd.DataFrame({'home_team': ['Canada', 'France', 'China'],
                    'away_team': ['England', 'Spain', 'Japan']})
print(df1)

df2 = pd.DataFrame({'Country': ['Canada', 'China', 'England',
                                'France', 'Japan', 'Spain'],
                    'Continent': ['North America', 'Asia', 'Europe',
                                  'Europe', 'Asia', 'Europe']}).set_index('Country')
print(df2)

# Output df1
  home_team away_team
0    Canada   England
1    France     Spain
2     China     Japan

# Output df2
             Continent
Country               
Canada   North America
China             Asia
England         Europe
France          Europe
Japan             Asia
Spain           Europe

考慮merge大陸查找數據框以創建主大陸和客大陸列。 並且由於您將讓兩個大陸有條件地使用numpy.where分配新的共享大陸列:

df_cau2 = (
    df.cau2.merge(
        country_continent.reset_index(),
        left_on = "home_team",
        right_on = "Country",
        how = "left"
    ).merge(
        country_continent.reset_index(),
        left_on = "away_team",
        right_on = "Country",
        how = "left",
        suffixes = ["_home", "_away"]
    )
)

df_cau2["shared_continent"] = np.where(
    df_cau2["Continent_home"].eq(df_cau2["Continent_away"],
    df_cau2["Continent_home"],
    np.nan
)

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM