
Pandas: More efficient way to update a column in pandas dataframe without a for loop

I have a pandas dataframe in which I want to update the value of a column based on another column in the dataframe. I was using the following code to update it before:

for i1, col1 in dfMod.iterrows():
    # .loc is used here because .ix was removed in pandas 1.0
    if col1['day'] == "MONDAY":
        dfMod.loc[i1, 'weekIndex'] = 1
    elif col1['day'] == "TUESDAY":
        dfMod.loc[i1, 'weekIndex'] = 2
    elif col1['day'] == "WEDNESDAY":
        dfMod.loc[i1, 'weekIndex'] = 3
    elif col1['day'] == "THURSDAY":
        dfMod.loc[i1, 'weekIndex'] = 4
    elif col1['day'] == "FRIDAY":
        dfMod.loc[i1, 'weekIndex'] = 5
    elif col1['day'] == "SATURDAY":
        dfMod.loc[i1, 'weekIndex'] = 6
    else:
        dfMod.loc[i1, 'weekIndex'] = 7

However, the dataframe has 300,000 rows and the loop takes forever to run. Is there a better way to update the column?

Use Series.map with a dict:

d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, "THURSDAY":4, 
     "FRIDAY":5, "SATURDAY":6, "SUNDAY":7}

dfMod["weekIndex"] = dfMod["day"].map(d)

Sample:

import pandas as pd

dfMod = pd.DataFrame({'day':['TUESDAY','THURSDAY','FRIDAY','SATURDAY','MONDAY','SUNDAY']})

d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, 
     "THURSDAY":4, "FRIDAY":5, "SATURDAY":6, "SUNDAY":7}

dfMod["weekIndex"] = dfMod["day"].map(d)
print(dfMod)
        day  weekIndex
0   TUESDAY          2
1  THURSDAY          4
2    FRIDAY          5
3  SATURDAY          6
4    MONDAY          1
5    SUNDAY          7
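
One difference from the loop in the question is worth noting: its else branch assigns 7 to every value that is not MONDAY through SATURDAY, whereas map with a dict returns NaN for any value missing from the dict. A minimal sketch of how the fallback could be reproduced, assuming that behaviour is wanted ("HOLIDAY" is a hypothetical value, not taken from the question's data):

```python
import pandas as pd

d = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3, "THURSDAY": 4,
     "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}

# "HOLIDAY" is a made-up value that is absent from the dict.
dfMod = pd.DataFrame({"day": ["MONDAY", "SUNDAY", "HOLIDAY"]})

# map leaves NaN wherever the key is missing from d.
mapped = dfMod["day"].map(d)

# Fill the gaps with 7 (the loop's else branch) and restore an integer dtype.
dfMod["weekIndex"] = mapped.fillna(7).astype(int)
print(dfMod)
```

If unknown values should be flagged rather than silently treated as Sunday, leaving the NaN in place and checking mapped.isna() is probably the safer choice.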

Timings on 300k rows: map is about 6 times faster than the apply solution:

dfMod = pd.DataFrame({'day':['TUESDAY','THURSDAY','FRIDAY','SATURDAY','MONDAY','SUNDAY']})
#300k rows
dfMod = pd.concat([dfMod]*50000).reset_index(drop=True)

d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, "THURSDAY":4, 
     "FRIDAY":5, "SATURDAY":6, "SUNDAY":7}

In [92]: %timeit dfMod["weekIndex"] = dfMod["day"].map(d)
10 loops, best of 3: 22.7 ms per loop

In [93]: %timeit dfMod["weekIndex1"] = dfMod["day"].apply(lambda x: d[x])
10 loops, best of 3: 141 ms per loop
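
The %timeit lines above only work inside IPython. As a rough sketch, the same comparison can be run as a plain script with the standard timeit module; the repetition count of 3 is an arbitrary choice, and absolute numbers will vary by machine and pandas version:

```python
import timeit

import pandas as pd

days = ["TUESDAY", "THURSDAY", "FRIDAY", "SATURDAY", "MONDAY", "SUNDAY"]
dfMod = pd.DataFrame({"day": days * 50000})  # 300k rows

d = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3, "THURSDAY": 4,
     "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}

# Time each approach over a few repetitions and report the per-call average.
n = 3
t_map = timeit.timeit(lambda: dfMod["day"].map(d), number=n) / n
t_apply = timeit.timeit(lambda: dfMod["day"].apply(lambda x: d[x]), number=n) / n

print(f"map:   {t_map * 1000:.1f} ms per call")
print(f"apply: {t_apply * 1000:.1f} ms per call")
```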

Try the apply method:

daysOfWeek = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, "THURSDAY":4, "FRIDAY":5, "SATURDAY":6, "SUNDAY":7}

dfMod["weekIndex"] = dfMod["day"].apply(lambda x: daysOfWeek[x])
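
Note that indexing the dict directly inside the lambda raises a KeyError as soon as the column contains a value that is not in the dict. A small sketch of one way around that, assuming the question's "everything else is 7" rule should apply ("HOLIDAY" is a hypothetical value used only for illustration):

```python
import pandas as pd

daysOfWeek = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3, "THURSDAY": 4,
              "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}

# "HOLIDAY" is a made-up value that is absent from the dict.
dfMod = pd.DataFrame({"day": ["TUESDAY", "HOLIDAY"]})

# Direct indexing fails on the unknown value.
try:
    dfMod["day"].apply(lambda x: daysOfWeek[x])
    raised = False
except KeyError:
    raised = True

# dict.get with a default of 7 mirrors the else branch of the original loop.
dfMod["weekIndex"] = dfMod["day"].apply(lambda x: daysOfWeek.get(x, 7))
print(dfMod)
```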

Please use @jezrael's answer as it is idiomatic.
This is purely for demonstration and an attempt to provide useful information about other pandas tools that could be used.

setup
using @jezrael's given example

dfMod = pd.DataFrame({'day':['TUESDAY','THURSDAY','FRIDAY','SATURDAY','MONDAY','SUNDAY']})

d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, 
     "THURSDAY":4, "FRIDAY":5, "SATURDAY":6, "SUNDAY":7}

alternate solution

dfMod.join(pd.Series(d, name='weekIndex'), on='day')

        day  weekIndex
0   TUESDAY          2
1  THURSDAY          4
2    FRIDAY          5
3  SATURDAY          6
4    MONDAY          1
5    SUNDAY          7
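
Keep in mind that join returns a new DataFrame and leaves dfMod unchanged, so the result has to be assigned to keep the column. A minimal sketch using the same setup (the name result is just for illustration):

```python
import pandas as pd

dfMod = pd.DataFrame({'day': ['TUESDAY', 'THURSDAY', 'FRIDAY',
                              'SATURDAY', 'MONDAY', 'SUNDAY']})

d = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3,
     "THURSDAY": 4, "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}

# The Series index holds the day names; join matches it against dfMod['day'].
result = dfMod.join(pd.Series(d, name='weekIndex'), on='day')

# dfMod itself still has only the 'day' column.
print(list(dfMod.columns))
print(result)
```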
