Functions and side effects with Pandas DataFrames

Question

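I want to write a function that takes a Pandas DataFrame as input and returns only the rows whose average is greater than a specified threshold. The function works, but it has the side effect of changing the input, which I don't want.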

import numpy as np

def Remove_Low_Average(df, sample_names, average_threshold=30):
    data_frame = df
    data_frame['Mean'] = np.mean(data_frame[sample_names], axis=1)
    # keep rows whose mean across sample_names exceeds the threshold
    data_frame = data_frame[data_frame.Mean > average_threshold]
    return data_frame.reset_index(drop=True)

Example:

In [7]: junk_data = DataFrame(np.random.randn(5,5), columns=['a', 'b', 'c', 'd', 'e'])
In [8]: Remove_Low_Average(junk_data, ['a', 'b', 'c'], average_threshold=0)
In [9]: junk_data.columns
Out[9]: Index([u'a', u'b', u'c', u'd', u'e', u'Mean'], dtype='object')

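So junk_data now has 'Mean' among its columns, even though it was never assigned to inside the function. I realize I could do this in a simpler way, but it illustrates a problem I run into often, and I don't know why it happens. I assume this must be well known, but I don't know how to keep this side effect from happening.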

Edit: EdChum's link below answers this question.

Answer 1

@EdChum answered this in the comments:

See this page; basically, if you want to avoid modifying the original, make a deep copy by calling .copy()
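As a rough sketch of that suggestion: `data_frame = df` in the question only binds a second name to the same object, so copying first should keep the new column off the caller's frame. The function name `remove_low_average_copy` and the random sample data are illustrative assumptions, not part of the original post.

```python
import numpy as np
import pandas as pd


def remove_low_average_copy(df, sample_names, average_threshold=30):
    # .copy() is deep by default, so the function works on its own frame
    # and the column added below never reaches the caller's object.
    data_frame = df.copy()
    data_frame['Mean'] = data_frame[sample_names].mean(axis=1)
    data_frame = data_frame[data_frame['Mean'] > average_threshold]
    return data_frame.reset_index(drop=True)


junk_data = pd.DataFrame(np.random.randn(5, 5), columns=['a', 'b', 'c', 'd', 'e'])
filtered = remove_low_average_copy(junk_data, ['a', 'b', 'c'], average_threshold=0)

print(list(junk_data.columns))  # the input keeps its original five columns
print(list(filtered.columns))   # the result carries the extra 'Mean' column
```

The cost is one full copy of the input per call, which may matter for large frames.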

Answer 2

You don't need to copy the old DataFrame; just don't assign a new column :)

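def remove_low_average(df, sample_names, average_threshold=30):
    # compute the row means without storing them on df
    mean = df[sample_names].mean(axis=1)
    # .ix was removed in pandas 1.0; .loc accepts the same boolean mask
    return df.loc[mean > average_threshold]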

# then use it as:
df = remove_low_average(df, ['a', 'b'])
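If the mean column is actually wanted in the output, one option (an assumption on my part, not something the answer proposes) is `DataFrame.assign`, which returns a new frame rather than writing into the input. The helper name `keep_high_average` and the `scores` data below are made up for illustration.

```python
import pandas as pd


def keep_high_average(df, sample_names, average_threshold=30):
    # Hypothetical helper name; not from the original answer.
    mean = df[sample_names].mean(axis=1)
    # assign() returns a new DataFrame, so df itself gains no column.
    with_mean = df.assign(Mean=mean)
    return with_mean.loc[mean > average_threshold].reset_index(drop=True)


scores = pd.DataFrame({'a': [10, 40, 50], 'b': [20, 60, 30], 'c': [1, 2, 3]})
result = keep_high_average(scores, ['a', 'b'])

print(list(scores.columns))  # input unchanged: ['a', 'b', 'c']
print(result)                # the two rows whose mean of 'a' and 'b' exceeds 30
```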

2 answers

Answer 1
0 Accepted 2014-07-15 21:06:21

Answer 2
0 2014-07-15 23:04:30
