Pandas: More efficient way to update a column in pandas dataframe without a for loop

I have a pandas dataframe in which I want to update the value of a column based on another column in the dataframe. I was using the following code to update it before:

# Note: the original code used .ix, which was removed in pandas 1.0; .loc is the label-based equivalent
for i1, col1 in dfMod.iterrows():
    if col1['day'] == "MONDAY":
        dfMod.loc[i1, 'weekIndex'] = 1
    elif col1['day'] == "TUESDAY":
        dfMod.loc[i1, 'weekIndex'] = 2
    elif col1['day'] == "WEDNESDAY":
        dfMod.loc[i1, 'weekIndex'] = 3
    elif col1['day'] == "THURSDAY":
        dfMod.loc[i1, 'weekIndex'] = 4
    elif col1['day'] == "FRIDAY":
        dfMod.loc[i1, 'weekIndex'] = 5
    elif col1['day'] == "SATURDAY":
        dfMod.loc[i1, 'weekIndex'] = 6
    else:
        dfMod.loc[i1, 'weekIndex'] = 7

However, the dataframe has 300,000 rows and the loop takes forever to run. Is there a better way to update the column?

Use map with a dict:

d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, "THURSDAY":4, 
     "FRIDAY":5, "SATURDAY":6, "SUNDAY":7}

dfMod["weekIndex"] = dfMod["day"].map(d)

Sample:

import pandas as pd

dfMod = pd.DataFrame({'day':['TUESDAY','THURSDAY','FRIDAY','SATURDAY','MONDAY','SUNDAY']})

d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, 
     "THURSDAY":4, "FRIDAY":5, "SATURDAY":6, "SUNDAY":7}

dfMod["weekIndex"] = dfMod["day"].map(d)
print(dfMod)
        day  weekIndex
0   TUESDAY          2
1  THURSDAY          4
2    FRIDAY          5
3  SATURDAY          6
4    MONDAY          1
5    SUNDAY          7
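
One difference from the loop in the question is worth noting: its else branch assigns 7 to any value that is not MONDAY through SATURDAY, while map leaves NaN for values missing from the dict. The sketch below shows one way to reproduce the fallback, assuming that behaviour is wanted. The "HOLIDAY" value is a made-up example, not data from the question.

```python
import pandas as pd

# Hypothetical sample: "HOLIDAY" is not a key in the mapping
dfMod = pd.DataFrame({"day": ["MONDAY", "SUNDAY", "HOLIDAY"]})

d = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3, "THURSDAY": 4,
     "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}

# map() alone leaves NaN for values that are missing from the dict
mapped = dfMod["day"].map(d)

# Fill the gaps with 7 to mirror the loop's else branch, then restore ints
dfMod["weekIndex"] = mapped.fillna(7).astype(int)
print(dfMod)
```

Whether a silent fallback to 7 is desirable depends on the data; leaving the NaN in place makes unexpected values easier to spot.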

Timings on 300k rows: map is about 6 times faster than the apply solution:

dfMod = pd.DataFrame({'day':['TUESDAY','THURSDAY','FRIDAY','SATURDAY','MONDAY','SUNDAY']})
#300k rows
dfMod = pd.concat([dfMod]*50000).reset_index(drop=True)

d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, "THURSDAY":4, 
     "FRIDAY":5, "SATURDAY":6, "SUNDAY":7}

In [92]: %timeit dfMod["weekIndex"] = dfMod["day"].map(d)
10 loops, best of 3: 22.7 ms per loop

In [93]: %timeit dfMod["weekIndex1"] = dfMod["day"].apply(lambda x: d[x])
10 loops, best of 3: 141 ms per loop
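
The %timeit lines above need IPython. For a plain Python script, a rough equivalent can be sketched with the standard timeit module. The repeat count of 10 is an arbitrary choice, and the measured numbers will vary by machine and pandas version, so none are shown here.

```python
import timeit

import pandas as pd

days = ["TUESDAY", "THURSDAY", "FRIDAY", "SATURDAY", "MONDAY", "SUNDAY"]
dfMod = pd.DataFrame({"day": days * 50000})  # 300k rows

d = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3, "THURSDAY": 4,
     "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}

# Both approaches should agree before their speed is compared
by_map = dfMod["day"].map(d)
by_apply = dfMod["day"].apply(lambda x: d[x])
assert by_map.equals(by_apply)

# Time 10 runs of each approach and report the average per run
t_map = timeit.timeit(lambda: dfMod["day"].map(d), number=10)
t_apply = timeit.timeit(lambda: dfMod["day"].apply(lambda x: d[x]), number=10)
print(f"map:   {t_map / 10 * 1000:.1f} ms per loop")
print(f"apply: {t_apply / 10 * 1000:.1f} ms per loop")
```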

Try the apply method:

daysOfWeek = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, "THURSDAY":4, "FRIDAY":5, "SATURDAY":6, "SUNDAY":7}

dfMod["weekIndex"] = dfMod["day"].apply(lambda x: daysOfWeek[x])
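
Note that indexing the dict with daysOfWeek[x] raises a KeyError if the column holds a value that is not a key. A minimal sketch of a more forgiving variant follows, assuming the fallback of 7 from the question's else branch is wanted. "HOLIDAY" is a made-up value used only to trigger the error.

```python
import pandas as pd

daysOfWeek = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3, "THURSDAY": 4,
              "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}

# Hypothetical sample containing one value that is not in the dict
dfMod = pd.DataFrame({"day": ["MONDAY", "HOLIDAY"]})

try:
    dfMod["day"].apply(lambda x: daysOfWeek[x])
    raised = False
except KeyError:
    raised = True  # plain indexing fails on the unknown value

# dict.get with a default of 7 mirrors the else branch of the original loop
dfMod["weekIndex"] = dfMod["day"].apply(lambda x: daysOfWeek.get(x, 7))
print(dfMod)
```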

Please use @jezrael's answer, as it is idiomatic.
This is purely for demonstration, an attempt to provide useful information about other pandas tools that could be used.

Setup
using @jezrael's example

dfMod = pd.DataFrame({'day':['TUESDAY','THURSDAY','FRIDAY','SATURDAY','MONDAY','SUNDAY']})

d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, 
     "THURSDAY":4, "FRIDAY":5, "SATURDAY":6, "SUNDAY":7}

Alternate solution

dfMod.join(pd.Series(d, name='weekIndex'), on='day')

        day  weekIndex
0   TUESDAY          2
1  THURSDAY          4
2    FRIDAY          5
3  SATURDAY          6
4    MONDAY          1
5    SUNDAY          7
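
Keep in mind that join returns a new dataframe and leaves dfMod unchanged, so the result has to be assigned to be kept. A short sketch of that step, using the same setup (the variable name joined is only illustrative):

```python
import pandas as pd

dfMod = pd.DataFrame({'day': ['TUESDAY', 'THURSDAY', 'FRIDAY',
                              'SATURDAY', 'MONDAY', 'SUNDAY']})

d = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3,
     "THURSDAY": 4, "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}

# Look up each 'day' value in the index of the Series built from the dict
joined = dfMod.join(pd.Series(d, name='weekIndex'), on='day')

# The original frame is untouched until the result is assigned back
assert 'weekIndex' not in dfMod.columns
dfMod = joined
print(dfMod)
```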
