Is there a faster way to modify a DataFrame in a loop?

Question

    for index, row in df.iterrows():
        print(index)

        name = row['name']
        new_name = get_name(name)
        row['new_name'] = new_name

        df.loc[index] = row

In this piece of code, my testing shows that the last line makes it really slow. It basically inserts a new column row by row. Maybe I should store all the 'new_name' values in a list and update the df outside of the loop?
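The idea raised at the end of the question can be sketched roughly as follows. The body of `get_name` is not shown in the post, so the version here is a hypothetical stand-in, and the sample data is made up for illustration.

```python
import pandas as pd


def get_name(name):
    # Hypothetical stand-in: the real get_name is not shown in the question.
    return name.strip().title()


# Made-up sample data for illustration.
df = pd.DataFrame({"name": [" alice ", "BOB", "carol"]})

# Collect the results in a plain list instead of writing to df inside the loop.
new_names = []
for name in df["name"]:
    new_names.append(get_name(name))

# A single column assignment after the loop replaces the per-row df.loc writes.
df["new_name"] = new_names
```

This keeps the loop but avoids touching the DataFrame on every iteration, which is the step the question identifies as slow.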

Answer 1

Use Series.apply to run the function on each value of the column; it is faster than iterrows:

    df['new_name'] = df['name'].apply(get_name)

If you want to improve performance further, you need to change the function itself where possible, but whether that works depends on what the function does.
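As an illustration of "changing the function", here is a minimal sketch that assumes `get_name` only strips whitespace and title-cases the string. That assumption is hypothetical, since the post never shows the function. When the logic is this simple, pandas string methods can replace the per-value Python call.

```python
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({"name": [" alice ", "BOB", "carol"]})

# Assumption: get_name(name) is equivalent to name.strip().title().
# The .str accessor applies the same steps to the whole column at once.
df["new_name"] = df["name"].str.strip().str.title()
```

If the real function does something that has no column-wise equivalent (for example a network lookup), this rewrite does not apply and `Series.apply` remains the simpler choice.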

Answer 2

    # Row-wise apply (axis=1): pass each row's 'name' value to get_name.
    df['new_name'] = df.apply(lambda row: get_name(row['name']), axis=1)

.apply isn't best practice, but I'm not sure there is a better option here.
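One possible alternative, offered only as a sketch: if many rows share the same name and `get_name` always returns the same result for the same input, the function can be called once per distinct value and the results mapped back onto the column. Both conditions are assumptions, and the `get_name` body and sample data below are hypothetical.

```python
import pandas as pd


def get_name(name):
    # Hypothetical stand-in: the real get_name is not shown in the thread.
    return name.upper()


# Made-up sample data with repeated names.
df = pd.DataFrame({"name": ["ann", "bob", "ann", "ann", "bob"]})

# Call get_name once per distinct name rather than once per row.
lookup = {name: get_name(name) for name in df["name"].unique()}

# Series.map replaces each value with its precomputed result.
df["new_name"] = df["name"].map(lookup)
```

Here `get_name` runs twice instead of five times. The saving grows with the amount of repetition in the column and disappears if every name is unique.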


2 answers

Answer 1: score 1, accepted, 2020-05-04 08:44:59

Answer 2: score 0, 2020-05-04 08:43:10
