Using pandas.append in a for loop

Question

I am appending rows to a pandas DataFrame within a for loop, but at the end the DataFrame is always empty. I don't want to add the rows to an array and then call the DataFrame constructor, because my actual for loop handles lots of data. I also tried pd.concat without success. Could anyone point out what I am missing to make the append statement work? Here's a dummy example:

import pandas as pd
import numpy as np

data = pd.DataFrame([])

for i in np.arange(0, 4):
    if i % 2 == 0:
        data.append(pd.DataFrame({'A': i, 'B': i + 1}, index=[0]), ignore_index=True)
    else:
        data.append(pd.DataFrame({'A': i}, index=[0]), ignore_index=True)

print(data.head())

Empty DataFrame
Columns: []
Index: []
[Finished in 0.676s]

Answer 1

Every time you call append, pandas returns a copy of the original DataFrame plus your new row. This is called quadratic copy: it is an O(N^2) operation that will quickly become very slow (especially since you have lots of data).
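To make the O(N^2) claim concrete, here is a rough cost model rather than a measurement. It assumes each append copies every row already in the frame; rows_copied is a hypothetical helper introduced only for this illustration.

```python
def rows_copied(n_appends):
    """Count existing rows copied across n_appends single-row appends.

    Assumption: the i-th append (counting from 0) copies the i rows
    that are already in the frame, so the total is 0 + 1 + ... + (n - 1).
    """
    return sum(range(n_appends))


# Total grows as n * (n - 1) / 2, i.e. quadratically in the number of appends.
print(rows_copied(10_000))  # 49995000
```

Under this model, building a 10,000-row frame one append at a time copies roughly fifty million rows, while collecting values in a list first copies each row only once.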

In your case, I would recommend appending to lists and then calling the DataFrame constructor once at the end.

a_list = []
b_list = []
for data in my_data:
    a, b = process_data(data)
    a_list.append(a)
    b_list.append(b)
df = pd.DataFrame({'A': a_list, 'B': b_list})
del a_list, b_list
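When the rows do not all have the same columns, as in the question, one possible variation is to collect each row as a dict and build the DataFrame once. This is a minimal sketch using the question's dummy data, not the answer author's code; the name rows is chosen here for illustration.

```python
import pandas as pd

rows = []  # plain Python list; appending to it does not copy earlier rows
for i in range(4):
    if i % 2 == 0:
        rows.append({'A': i, 'B': i + 1})
    else:
        rows.append({'A': i})  # no 'B' key: pandas fills the gap with NaN

# Build the DataFrame a single time from the collected rows.
data = pd.DataFrame(rows)
print(data)
```

Keys missing from a row show up as NaN in the result, so there is no need to append None by hand for the odd rows.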

Timings

%%timeit
data = pd.DataFrame([])
for i in np.arange(0, 10000):
    if i % 2 == 0:
        data = data.append(pd.DataFrame({'A': i, 'B': i + 1}, index=[0]), ignore_index=True)
    else:
        data = data.append(pd.DataFrame({'A': i}, index=[0]), ignore_index=True)
1 loops, best of 3: 6.8 s per loop

%%timeit
a_list = []
b_list = []
for i in np.arange(0, 10000):
    if i % 2 == 0:
        a_list.append(i)
        b_list.append(i + 1)
    else:
        a_list.append(i)
        b_list.append(None)
data = pd.DataFrame({'A': a_list, 'B': b_list})
100 loops, best of 3: 8.54 ms per loop

Answer 2

You need to set the variable data equal to the appended DataFrame. Unlike the append method on a Python list, the pandas append does not happen in place.

import pandas as pd
import numpy as np

data = pd.DataFrame([])

for i in np.arange(0, 4):
    if i % 2 == 0:
        data = data.append(pd.DataFrame({'A': i, 'B': i + 1}, index=[0]), ignore_index=True)
    else:
        data = data.append(pd.DataFrame({'A': i}, index=[0]), ignore_index=True)

print(data.head())

   A    B
0  0  1.0
1  1  NaN
2  2  3.0
3  3  NaN

NOTE: This answer aims to answer the question as it was posed. It is not, however, the optimal strategy for combining large numbers of DataFrames. For a more efficient solution, have a look at Alexander's answer.
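DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the code above raises an AttributeError on current versions. A sketch of the same loop using pd.concat follows; the name frames is chosen here for illustration. Like append, pd.concat returns a new DataFrame instead of modifying anything in place, so its result has to be assigned, which may be why the concat attempt mentioned in the question also appeared to do nothing.

```python
import pandas as pd

frames = []  # collect the small per-iteration frames in a plain list
for i in range(4):
    if i % 2 == 0:
        frames.append(pd.DataFrame({'A': i, 'B': i + 1}, index=[0]))
    else:
        frames.append(pd.DataFrame({'A': i}, index=[0]))

# pd.concat returns a new DataFrame; assign the result to keep it.
data = pd.concat(frames, ignore_index=True)
print(data)
```

Concatenating once after the loop also avoids re-copying the accumulated rows on every iteration.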

Answer 3

You can build your DataFrame without a loop:

import numpy as np
import pandas as pd

n = 4
data = pd.DataFrame({'A': np.arange(n)})
data['B'] = np.nan
data.loc[data['A'] % 2 == 0, 'B'] = data['A'] + 1

For:

n = 10000

this is a bit faster:

%%timeit
data = pd.DataFrame({'A': np.arange(n)})
data['B'] = np.nan
data.loc[data['A'] % 2 == 0, 'B'] = data['A'] + 1

100 loops, best of 3: 3.3 ms per loop

vs.

%%timeit
a_list = []
b_list = []
for i in np.arange(n):
    if i % 2 == 0:
        a_list.append(i)
        b_list.append(i + 1)
    else:
        a_list.append(i)
        b_list.append(None)
data1 = pd.DataFrame({'A': a_list, 'B': b_list})

100 loops, best of 3: 12.4 ms per loop
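The same loop-free idea can also be written in one step with np.where, which picks between two values element by element. This is an alternative sketch, not part of the original answer, and it was not timed against the versions above.

```python
import numpy as np
import pandas as pd

n = 10000
data = pd.DataFrame({'A': np.arange(n)})

# Where A is even, take A + 1; everywhere else, use NaN.
data['B'] = np.where(data['A'] % 2 == 0, data['A'] + 1, np.nan)
print(data.head())
```

Because NaN is a float, column B comes out as float64 here, as it does with the .loc assignment.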

Answer 4

When you use data.append(pd.DataFrame([['1','2'],['3','4']]), ignore_index=True), the result must be assigned back to a DataFrame. The result will contain the collated data, e.g.

data = data.append(pd.DataFrame([['1','2'],['3','4']])) <= use this in the loop

4 answers

Answer 1: score 51, posted 2016-05-03 16:33:03
Answer 2: score 47, accepted, posted 2016-05-03 16:22:35
Answer 3: score 2, posted 2016-05-03 18:58:13
Answer 4: score 0, posted 2022-09-09 01:32:29
