
Create a new DataFrame column based on values from another DataFrame, fast?

I want to create a new column, df_cau2['continent']. First, here are my two DataFrames:

country_continent:

                Continent
Country
Afghanistan     Asia
Albania         Europe
Algeria         Africa
American Samoa  Oceania

and df_cau2:

   date        home_team  away_team  home_score  away_score  tournament  city     country   neutral
0  1872-11-30  Scotland   England    0           0           Friendly    Glasgow  Scotland  False
1  1873-03-08  England    Scotland   4           2           Friendly    London   England   False
2  1874-03-07  Scotland   England    2           1           Friendly    Glasgow  Scotland  False

To create the new continent column, I use apply on df_cau2 like this:


def same_continent(home,away):
    if country_continent.loc[home].Continent == country_continent.loc[away].Continent:
        return country_continent.loc[home].Continent
    return 'None'

df_cau2['continent']=df_cau2.apply(lambda x: same_continent(x['home_team'],x['away_team']),axis=1)
df_cau2.head()

With 39,480 rows in df_cau2, this code runs too slowly. How can I change it to run faster? I am thinking about using np.select, but I don't know how to apply it in this case.

This is the result I want:

       date        home_team  away_team  home_score  away_score  tournament  city       country   neutral  continent
7611   1970-09-11  Iran       Turkey     1           1           Friendly    Teheran    Iran      False    None
31221  2009-03-11  Nepal      Pakistan   1           0           Friendly    Kathmandu  Nepal     False    Asia
32716  2010-11-17  Colombia   Peru       1           1           Friendly    Bogotá     Colombia  False    South America

Thanks

IIUC, you want to set the continent column only where home_team and away_team are in the same continent:

home_continent = df1['home_team'].map(df2.squeeze())
away_continent = df1['away_team'].map(df2.squeeze())
m = home_continent == away_continent
df1.loc[m, 'continent'] = home_continent.loc[m]
print(df1)

# Output
  home_team away_team continent
0    Canada   England       NaN
1    France     Spain    Europe
2     China     Japan      Asia

Setup for a minimal reproducible example (MRE):

import pandas as pd

df1 = pd.DataFrame({'home_team': ['Canada', 'France', 'China'],
                    'away_team': ['England', 'Spain', 'Japan']})
print(df1)

df2 = pd.DataFrame({'Country': ['Canada', 'China', 'England',
                                'France', 'Japan', 'Spain'],
                    'Continent': ['North America', 'Asia', 'Europe',
                                  'Europe', 'Asia', 'Europe']}).set_index('Country')
print(df2)

# Output df1
  home_team away_team
0    Canada   England
1    France     Spain
2     China     Japan

# Output df2
             Continent
Country               
Canada   North America
China             Asia
England         Europe
France          Europe
Japan             Asia
Spain           Europe
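If the string 'None' from the question's desired output is preferred over NaN, one possible variant uses Series.where instead of a masked assignment. This is only a minimal sketch that rebuilds the toy data above; the extra 'Atlantis' row and the `lookup` name are assumptions added for illustration, to show what happens when a team is missing from the lookup table.

```python
import pandas as pd

# Toy data from the MRE, plus one hypothetical team that is not in the lookup.
df1 = pd.DataFrame({'home_team': ['Canada', 'France', 'China', 'Atlantis'],
                    'away_team': ['England', 'Spain', 'Japan', 'Spain']})
df2 = pd.DataFrame({'Country': ['Canada', 'China', 'England',
                                'France', 'Japan', 'Spain'],
                    'Continent': ['North America', 'Asia', 'Europe',
                                  'Europe', 'Asia', 'Europe']}).set_index('Country')

lookup = df2['Continent']  # Series indexed by country name

# Vectorized lookups: one pass per column instead of one .loc call per row.
home_continent = df1['home_team'].map(lookup)
away_continent = df1['away_team'].map(lookup)

# Keep the home continent where both sides match, otherwise the string 'None'.
# Teams missing from the lookup map to NaN, which never compares equal,
# so those rows also end up as 'None'.
df1['continent'] = home_continent.where(home_continent == away_continent, 'None')
print(df1)
```

Note that the original apply version would raise a KeyError on a team missing from country_continent, whereas map silently yields NaN, so it may be worth checking for unmatched names separately.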

Consider merging the continent lookup DataFrame twice to create home and away continent columns. Once both continent columns are present, assign the new shared continent column conditionally with numpy.where:

import numpy as np
import pandas as pd

# Left-join the lookup twice: once on the home team, once on the away team.
df_cau2 = (
    df_cau2.merge(
        country_continent.reset_index(),
        left_on="home_team",
        right_on="Country",
        how="left"
    ).merge(
        country_continent.reset_index(),
        left_on="away_team",
        right_on="Country",
        how="left",
        suffixes=("_home", "_away")
    )
)

# Keep the continent only where home and away continents match.
df_cau2["shared_continent"] = np.where(
    df_cau2["Continent_home"].eq(df_cau2["Continent_away"]),
    df_cau2["Continent_home"],
    np.nan
)
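To try the merge approach end to end, here is a small self-contained sketch on toy data. The names `matches`, `lookup`, and `merged` are hypothetical, the helper Country columns are dropped after the joins, and the fallback is the string 'None' (as in the question's desired output) rather than np.nan; all of these are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical toy stand-ins for df_cau2 and country_continent.
matches = pd.DataFrame({'home_team': ['Canada', 'France', 'China'],
                        'away_team': ['England', 'Spain', 'Japan']})
lookup = pd.DataFrame({'Country': ['Canada', 'China', 'England',
                                   'France', 'Japan', 'Spain'],
                       'Continent': ['North America', 'Asia', 'Europe',
                                     'Europe', 'Asia', 'Europe']})

merged = (
    matches
    # First join adds plain 'Country' and 'Continent' columns for the home team.
    .merge(lookup, left_on='home_team', right_on='Country', how='left')
    # Second join collides on those names, so the suffixes are applied.
    .merge(lookup, left_on='away_team', right_on='Country', how='left',
           suffixes=('_home', '_away'))
    # The join keys are no longer needed once the continents are attached.
    .drop(columns=['Country_home', 'Country_away'])
)

# Shared continent where both sides agree, otherwise the string 'None'.
merged['shared_continent'] = np.where(
    merged['Continent_home'].eq(merged['Continent_away']),
    merged['Continent_home'],
    'None',
)
print(merged)
```

Because both joins are left joins against a lookup with unique country names, the row count and order of the match table are preserved.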
