Pandas: More efficient way to update a column in pandas dataframe without a for loop
I have a pandas dataframe in which I want to update the value of one column based on another column in the same dataframe. I was previously using the following code to update it:
# The original used .ix, which was removed in pandas 1.0; .loc is the label-based equivalent.
for i1, col1 in dfMod.iterrows():
    if col1['day'] == "MONDAY":
        dfMod.loc[i1, 'weekIndex'] = 1
    elif col1['day'] == "TUESDAY":
        dfMod.loc[i1, 'weekIndex'] = 2
    elif col1['day'] == "WEDNESDAY":
        dfMod.loc[i1, 'weekIndex'] = 3
    elif col1['day'] == "THURSDAY":
        dfMod.loc[i1, 'weekIndex'] = 4
    elif col1['day'] == "FRIDAY":
        dfMod.loc[i1, 'weekIndex'] = 5
    elif col1['day'] == "SATURDAY":
        dfMod.loc[i1, 'weekIndex'] = 6
    else:
        dfMod.loc[i1, 'weekIndex'] = 7
However, the dataframe has 300,000 rows and this takes forever to run. Is there a better way to update the column?
You need map with a dict:
d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, "THURSDAY":4,
"FRIDAY":5, "SATURDAY":6, "SUNDAY":7}
dfMod["weekIndex"] = dfMod["day"].map(d)
Sample:
import pandas as pd
dfMod = pd.DataFrame({'day':['TUESDAY','THURSDAY','FRIDAY','SATURDAY','MONDAY','SUNDAY']})
d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3,
"THURSDAY":4, "FRIDAY":5, "SATURDAY":6, "SUNDAY":7}
dfMod["weekIndex"] = dfMod["day"].map(d)
print (dfMod)
        day  weekIndex
0   TUESDAY          2
1  THURSDAY          4
2    FRIDAY          5
3  SATURDAY          6
4    MONDAY          1
5    SUNDAY          7
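One difference from the loop in the question is worth noting: the loop's else branch assigns 7 to every value that is not Monday through Saturday, while map leaves NaN for any value missing from the dict. A minimal sketch of reproducing the else behaviour, assuming that fallback is actually wanted (the "HOLIDAY" value and the df name are made up for illustration):

```python
import pandas as pd

# Hypothetical data containing a value that is not a known weekday.
df = pd.DataFrame({"day": ["MONDAY", "SUNDAY", "HOLIDAY"]})

# Only Monday..Saturday are listed, mirroring the if/elif chain in the question.
d = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3,
     "THURSDAY": 4, "FRIDAY": 5, "SATURDAY": 6}

# map() yields NaN for keys missing from the dict; fillna(7) plays the
# role of the loop's else branch, and astype(int) restores integer dtype.
df["weekIndex"] = df["day"].map(d).fillna(7).astype(int)
print(df)
```

If unknown values should instead be flagged rather than silently set to 7, leaving the NaN in place and checking with isna() may be the better choice.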
Timings on 300k rows: map is about 6 times faster than the apply solution:
dfMod = pd.DataFrame({'day':['TUESDAY','THURSDAY','FRIDAY','SATURDAY','MONDAY','SUNDAY']})
#300k rows
dfMod = pd.concat([dfMod]*50000).reset_index(drop=True)
d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, "THURSDAY":4,
"FRIDAY":5, "SATURDAY":6, "SUNDAY":7}
In [92]: %timeit dfMod["weekIndex"] = dfMod["day"].map(d)
10 loops, best of 3: 22.7 ms per loop
In [93]: %timeit dfMod["weekIndex1"] = dfMod["day"].apply(lambda x: d[x])
10 loops, best of 3: 141 ms per loop
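The %timeit lines above only work inside IPython. As a rough sketch, a comparable measurement in a plain Python script could use the standard timeit module; the variable names t_map and t_apply and the repeat count of 5 are arbitrary choices here, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

d = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3, "THURSDAY": 4,
     "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}

# Same construction as above: 6 rows repeated 50,000 times gives 300k rows.
df = pd.DataFrame({"day": ["TUESDAY", "THURSDAY", "FRIDAY",
                           "SATURDAY", "MONDAY", "SUNDAY"]})
df = pd.concat([df] * 50000).reset_index(drop=True)

runs = 5  # arbitrary repeat count for this sketch
t_map = timeit.timeit(lambda: df["day"].map(d), number=runs)
t_apply = timeit.timeit(lambda: df["day"].apply(lambda x: d[x]), number=runs)

print(f"map:   {t_map / runs * 1000:.1f} ms per call")
print(f"apply: {t_apply / runs * 1000:.1f} ms per call")
```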
Try the apply method:
daysOfWeek = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, "THURSDAY":4, "FRIDAY":5, "SATURDAY":6, "SUNDAY":7}
dfMod["weekIndex"] = dfMod["day"].apply(lambda x: daysOfWeek[x])
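Note that daysOfWeek[x] raises a KeyError as soon as the column contains a value that is not in the dict. A small sketch of one way around that, assuming a fallback of 7 is acceptable as in the question's else branch (the "FUNDAY" value and the df name are invented for the example):

```python
import pandas as pd

daysOfWeek = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3, "THURSDAY": 4,
              "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}

# Hypothetical column with one value that is not a dict key.
df = pd.DataFrame({"day": ["MONDAY", "FUNDAY"]})

# Plain indexing fails on the unknown value.
try:
    df["day"].apply(lambda x: daysOfWeek[x])
    raised = False
except KeyError:
    raised = True

# dict.get with a default never raises; unknown values fall back to 7.
df["weekIndex"] = df["day"].apply(lambda x: daysOfWeek.get(x, 7))
print(df)
```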
Please use @jezrael's answer, as it is idiomatic.
This is purely for demonstration, and an attempt to provide useful information about other pandas tools that could be used.
Setup
Using @jezrael's given example:
dfMod = pd.DataFrame({'day':['TUESDAY','THURSDAY','FRIDAY','SATURDAY','MONDAY','SUNDAY']})
d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3,
"THURSDAY":4, "FRIDAY":5, "SATURDAY":6, "SUNDAY":7}
Alternate solution
dfMod.join(pd.Series(d, name='weekIndex'), on='day')
        day  weekIndex
0   TUESDAY          2
1  THURSDAY          4
2    FRIDAY          5
3  SATURDAY          6
4    MONDAY          1
5    SUNDAY          7
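Keep in mind that join returns a new dataframe and leaves dfMod unchanged, so the result has to be assigned to keep the new column. A minimal sketch using the same setup (the name result is just a placeholder):

```python
import pandas as pd

dfMod = pd.DataFrame({'day': ['TUESDAY', 'THURSDAY', 'FRIDAY',
                              'SATURDAY', 'MONDAY', 'SUNDAY']})
d = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3,
     "THURSDAY": 4, "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}

# The Series is indexed by day name; on='day' matches dfMod's 'day'
# column against that index, and the Series name becomes the new column.
result = dfMod.join(pd.Series(d, name='weekIndex'), on='day')

print(result)
print('weekIndex' in dfMod.columns)  # dfMod itself is not modified
```

Reassigning with dfMod = dfMod.join(...) would give the same end state as the map assignment.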