Is there a way to make changing a DataFrame in a loop faster?
for index, row in df.iterrows():
    print(index)
    name = row['name']
    new_name = get_name(name)
    row['new_name'] = new_name
    df.loc[index] = row
In this piece of code, my testing shows that the last line is what makes it really slow. It basically inserts a new column row by row.
Maybe I should store all the 'new_name' values in a list and update the df outside of the loop?
Use Series.apply to run the function on each value of the column; it is faster than iterrows:
df['new_name'] = df['name'].apply(get_name)
If you want to improve performance further, you need to change the function itself where possible, but that depends on what the function does.
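As a rough illustration of what "changing the function" can mean: the question does not show get_name, so the version below is a purely hypothetical stand-in that strips whitespace and title-cases a string. Under that assumption, the same logic can be written with pandas string methods that work on the whole column at once instead of calling a Python function per value.

```python
import pandas as pd

# Hypothetical stand-in for get_name; the real function is not shown in the question.
def get_name(name: str) -> str:
    return name.strip().title()

df = pd.DataFrame({"name": ["  alice smith", "BOB JONES ", "carol white"]})

# Element-wise: get_name is called once per value in the column.
df["new_name_apply"] = df["name"].apply(get_name)

# Column-wise: the same logic expressed with the .str accessor,
# so no per-value Python function call is needed.
df["new_name"] = df["name"].str.strip().str.title()

print(df["new_name"].tolist())
```

Whether such a rewrite is possible, and how much it helps, depends entirely on what the real get_name does.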
df['new_name'] = df.apply(lambda row: get_name(row['name']), axis=1)
.apply isn't a best practice, but I am not sure there is a better option here.
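For comparison, here is a minimal sketch of the idea raised in the question: collect the results in a list and assign the column once, outside any row-by-row update. get_name is again a hypothetical stand-in (here it just upper-cases the string). The second option assumes that names repeat and that get_name always returns the same result for the same input, so it only needs to run once per distinct value.

```python
import pandas as pd

# Hypothetical stand-in for get_name; the real function is not shown in the question.
def get_name(name: str) -> str:
    return name.upper()

df = pd.DataFrame({"name": ["ann", "bob", "ann", "cy"]})

# Option 1: build the full list first, then assign the new column in one step.
df["new_name"] = [get_name(n) for n in df["name"]]

# Option 2: call get_name once per distinct name, then map the results back.
# Only valid if get_name gives the same output for the same input.
lookup = {n: get_name(n) for n in df["name"].unique()}
df["new_name_mapped"] = df["name"].map(lookup)

print(df)
```

Neither option was timed here; which one is faster in practice depends on get_name and on how many duplicate names the column contains.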