Pandas copy is changing the original dataframe, even with copy(deep=True)
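I have the following code that tries to create 2 separate tables from 1 dataframe. Different filters are applied to these tables. What I am finding is that once the first filter is applied, the original dataframe is "changed".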
df_orig = pd.read_excel('JRMaster.xlsm')
df_orig.columns = map(str.upper, df_orig.columns)
df_orig['SYSTEM'] = df_orig['SYSTEM'].str.upper()
df_orig['STATUS'] = df_orig['STATUS'].str.upper()
df = df_orig.copy(deep=True)
df_copy_all = df_orig.copy(deep=True)
df = df[(df['DATE PAID'].dt.month.between(10,10)) & (df['DATE PAID'].dt.year == 2020)]
df2 = df_copy_all[(df_copy_all['DATE SENT'].dt.month.between(10,10)) & (df['DATE SENT'].dt.year == 2020)]
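df and df2 should give 2 different results, but the output is the same. I have tried both df.copy() and df.copy(deep=True).
Using Pandas 1.0.5 and Python 3.6.
Some forums say this is a bug, but I wanted to check whether there is a workaround or a fix.
Another approach I thought of is to read the original Excel file into multiple dataframes, but that seems unsustainable and resource-heavy.
Edit:
Sample data is below: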
System DATE SENT STATUS DATE PAID
0 One 2020-10-01 OPEN NaT
1 One 2020-10-01 OPEN NaT
2 THREE 2020-10-01 SR 2020-10-07
3 One 2020-10-01 DUP NaT
4 One 2020-10-01 OPEN NaT
5 One 2020-10-01 OPEN NaT
6 THREE 2020-10-01 OPEN NaT
7 One 2020-10-01 DUP NaT
8 THREE 2020-10-01 AR 2020-07-31
9 THREE 2020-10-01 OPEN NaT
10 One 2020-10-01 AR 2020-08-21
11 One 2020-10-01 DUP NaT
12 One 2020-10-01 OPEN NaT
13 One 2020-10-01 DUP NaT
14 One 2020-10-01 DUP NaT
15 One 2020-10-01 DUP NaT
16 One 2020-10-01 DUP NaT
17 THREE 2020-10-01 OPEN NaT
18 One 2020-10-01 OPEN NaT
19 One 2020-10-01 OPEN NaT
It looks like deepcopy does not work with pandas.
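A quick way to test that suspicion is to modify a deep copy and check whether the original moves. The sketch below uses a small made-up frame (the AMOUNT column and the names df_deep, df_alias and filtered are illustrative, not from the question) and suggests that copy(deep=True) does isolate the data, whereas plain assignment with '=' only creates a second name for the same object.

```python
import pandas as pd

# Hypothetical two-row frame, only for illustration.
df_orig = pd.DataFrame({"STATUS": ["OPEN", "DUP"], "AMOUNT": [1, 2]})

df_deep = df_orig.copy(deep=True)  # independent data and index
df_alias = df_orig                 # '=' binds a second name to the same object

# Writing to the deep copy leaves the original untouched.
df_deep.loc[0, "AMOUNT"] = 100
print(df_orig.loc[0, "AMOUNT"])    # 1

# Writing through the alias changes the original, because it is the original.
df_alias.loc[1, "AMOUNT"] = 200
print(df_orig.loc[1, "AMOUNT"])    # 200

# Boolean filtering returns a new frame; the frame being filtered keeps all rows.
filtered = df_deep[df_deep["AMOUNT"] > 50]
print(len(filtered), len(df_deep)) # 1 2
```

If a check like this passes, the copy is not the culprit and the filter expressions themselves are worth a second look.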
The problem was actually a typo:
df2 = df_copy_all[(df_copy_all['DATE SENT'].dt.month.between(10,10)) & (df['DATE SENT'].dt.year == 2020)]
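should be
df2 = df_copy_all[(df_copy_all['DATE SENT'].dt.month.between(10,10)) & (df_copy_all['DATE SENT'].dt.year == 2020)]
The error was in the second condition: it should reference df_copy_all['DATE SENT'] (the frame being filtered), but I had df['DATE SENT'].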
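To illustrate why the typo made both outputs look the same, here is a minimal sketch on four rows taken from the sample data. It assumes the likely mechanism is index alignment: when two boolean Series with different indexes are combined with &, pandas aligns them by label and treats labels missing from one side as False, so a mask built partly from the already-filtered df can only keep rows that survived the first filter. The names buggy_mask, fixed_mask and df2_buggy are illustrative and do not appear in the original post.

```python
import pandas as pd

# Four rows from the sample data in the question (rows 0, 2, 8 and 10).
df_orig = pd.DataFrame(
    {
        "SYSTEM": ["ONE", "THREE", "THREE", "ONE"],
        "DATE SENT": pd.to_datetime(["2020-10-01"] * 4),
        "STATUS": ["OPEN", "SR", "AR", "AR"],
        "DATE PAID": pd.to_datetime([None, "2020-10-07", "2020-07-31", "2020-08-21"]),
    }
)

df = df_orig.copy(deep=True)
df_copy_all = df_orig.copy(deep=True)

# First filter: paid in October 2020. Only the row labelled 1 qualifies.
df = df[(df["DATE PAID"].dt.month.between(10, 10)) & (df["DATE PAID"].dt.year == 2020)]

# Typo version: the second condition is computed on the filtered df,
# so after alignment only labels still present in df can be True.
buggy_mask = (df_copy_all["DATE SENT"].dt.month.between(10, 10)) & (
    df["DATE SENT"].dt.year == 2020
)
df2_buggy = df_copy_all[buggy_mask]

# Corrected version: both conditions come from the frame being filtered.
fixed_mask = (df_copy_all["DATE SENT"].dt.month.between(10, 10)) & (
    df_copy_all["DATE SENT"].dt.year == 2020
)
df2 = df_copy_all[fixed_mask]

print(list(df.index))         # rows kept by the DATE PAID filter
print(list(df2_buggy.index))  # same rows as df, which is what the question observed
print(list(df2.index))        # every row sent in October 2020
print(len(df_orig))           # the original frame keeps all of its rows
```

Building each mask in a named variable from a single frame, as with fixed_mask here, makes this kind of cross-frame slip easier to spot than a long inline expression.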