I have a pandas dataframe in which I want to update the value of a column based on another column in the dataframe. I was using the following code to update it before:
for i1, col1 in dfMod.iterrows():
    if col1['day'] == "MONDAY":
        dfMod.ix[i1,'weekIndex'] = 1
    elif col1['day'] == "TUESDAY":
        dfMod.ix[i1,'weekIndex'] = 2
    elif col1['day'] == "WEDNESDAY":
        dfMod.ix[i1,'weekIndex'] = 3
    elif col1['day'] == "THURSDAY":
        dfMod.ix[i1,'weekIndex'] = 4
    elif col1['day'] == "FRIDAY":
        dfMod.ix[i1,'weekIndex'] = 5
    elif col1['day'] == "SATURDAY":
        dfMod.ix[i1,'weekIndex'] = 6
    else:
        dfMod.ix[i1,'weekIndex'] = 7
However, the dataframe has 300,000 rows and the loop takes forever to run. Is there a better way to update the column?
You need map with a dict:
d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, "THURSDAY":4,
"FRIDAY":5, "SATURDAY":6, "SUNDAY":7}
dfMod["weekIndex"] = dfMod["day"].map(d)
Sample:
import pandas as pd

dfMod = pd.DataFrame({'day':['TUESDAY','THURSDAY','FRIDAY','SATURDAY','MONDAY','SUNDAY']})
d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3,
"THURSDAY":4, "FRIDAY":5, "SATURDAY":6, "SUNDAY":7}
dfMod["weekIndex"] = dfMod["day"].map(d)
print(dfMod)
        day  weekIndex
0   TUESDAY          2
1  THURSDAY          4
2    FRIDAY          5
3  SATURDAY          6
4    MONDAY          1
5    SUNDAY          7
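One difference from the loop in the question is worth noting: the loop assigns 7 to anything that is not MONDAY through SATURDAY, whereas map with a dict leaves NaN for keys it cannot find. If that fallback matters, something like the following sketch may help. The three-row frame and the "HOLIDAY" value are made up for illustration, and "SUNDAY" is left out of the dict on purpose so the fallback handles it.

```python
import pandas as pd

# Hypothetical sample data; "HOLIDAY" is deliberately absent from the mapping.
df = pd.DataFrame({"day": ["MONDAY", "SUNDAY", "HOLIDAY"]})

# SUNDAY is omitted so that it falls through to the default, like the loop's else branch.
d = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3, "THURSDAY": 4,
     "FRIDAY": 5, "SATURDAY": 6}

# map leaves NaN where the key is missing; fillna supplies the "else" value,
# and astype(int) restores an integer dtype after the NaN-induced float upcast.
df["weekIndex"] = df["day"].map(d).fillna(7).astype(int)
print(df)
```

If every value in the column is guaranteed to be a key of the dict, the plain map call is enough and the fillna step can be dropped.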
Timings on 300k rows: map is about 6 times faster than the apply solution:
dfMod = pd.DataFrame({'day':['TUESDAY','THURSDAY','FRIDAY','SATURDAY','MONDAY','SUNDAY']})
#300k rows
dfMod = pd.concat([dfMod]*50000).reset_index(drop=True)
d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, "THURSDAY":4,
"FRIDAY":5, "SATURDAY":6, "SUNDAY":7}
In [92]: %timeit dfMod["weekIndex"] = dfMod["day"].map(d)
10 loops, best of 3: 22.7 ms per loop
In [93]: %timeit dfMod["weekIndex1"] = dfMod["day"].apply(lambda x: d[x])
10 loops, best of 3: 141 ms per loop
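The %timeit lines above only work inside IPython or Jupyter. A rough equivalent for a plain Python script, using the standard-library timeit module, might look like the sketch below. The variable names (base, df, t_map, t_apply) and the repeat count of 5 are arbitrary choices, and the numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

base = pd.DataFrame({"day": ["TUESDAY", "THURSDAY", "FRIDAY",
                             "SATURDAY", "MONDAY", "SUNDAY"]})
# Repeat the 6 sample rows 50,000 times to get 300k rows.
df = pd.concat([base] * 50000).reset_index(drop=True)

d = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3, "THURSDAY": 4,
     "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}

runs = 5  # arbitrary repeat count; the average per run is reported
t_map = timeit.timeit(lambda: df["day"].map(d), number=runs) / runs
t_apply = timeit.timeit(lambda: df["day"].apply(lambda x: d[x]), number=runs) / runs

print(f"map:   {t_map * 1000:.1f} ms per run")
print(f"apply: {t_apply * 1000:.1f} ms per run")
```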
Try the apply method:
daysOfWeek = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, "THURSDAY":4, "FRIDAY":5, "SATURDAY":6, "SUNDAY":7}
dfMod["weekIndex"] = dfMod["day"].apply(lambda x: daysOfWeek[x])
Please use @jezrael's answer as it is idiomatic.
This is purely for demonstration and an attempt to provide useful information about other pandas tools that could be used.
Setup, using @jezrael's example:
dfMod = pd.DataFrame({'day':['TUESDAY','THURSDAY','FRIDAY','SATURDAY','MONDAY','SUNDAY']})
d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3,
"THURSDAY":4, "FRIDAY":5, "SATURDAY":6, "SUNDAY":7}
Alternate solution:
dfMod.join(pd.Series(d, name='weekIndex'), on='day')
        day  weekIndex
0   TUESDAY          2
1  THURSDAY          4
2    FRIDAY          5
3  SATURDAY          6
4    MONDAY          1
5    SUNDAY          7
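Note that join returns a new DataFrame and does not modify dfMod in place, so the result has to be assigned to keep it. A small sketch, assuming the same sample data and dict as above (the names joined and mapped are just for illustration), that also checks the join result against map:

```python
import pandas as pd

dfMod = pd.DataFrame({"day": ["TUESDAY", "THURSDAY", "FRIDAY",
                              "SATURDAY", "MONDAY", "SUNDAY"]})
d = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3,
     "THURSDAY": 4, "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}

# join looks up each value of the 'day' column in the index of the Series
# built from the dict, and returns a new frame with the extra column.
joined = dfMod.join(pd.Series(d, name="weekIndex"), on="day")

# The same lookup done with map, for comparison.
mapped = dfMod["day"].map(d)

print(joined["weekIndex"].tolist() == mapped.tolist())
```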