Compare two dataframes based on column data in Python pandas

I have two dataframes, df1 and df2, and I would like to subtract df2 from df1, using a specific column, 'Code', to match rows between them.

import pandas as pd
import numpy as np
rng = pd.date_range('2021-01-01', periods=10, freq='D')
df1 = pd.DataFrame(index=rng, data={'Val1': range(10), 'Val2': np.array(range(10))*5, 'Code': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]})

df2 = pd.DataFrame(data={'Code': [1, 2, 3, 4], 'Val1': [10, 5, 15, 20], 'Val2': [4, 8, 10, 7]})

df1:

            Val1  Val2  Code
2021-01-01     0     0     1
2021-01-02     1     5     1
2021-01-03     2    10     1
2021-01-04     3    15     2
2021-01-05     4    20     2
2021-01-06     5    25     2
2021-01-07     6    30     3
2021-01-08     7    35     3
2021-01-09     8    40     3
2021-01-10     9    45     3

df2:

   Code  Val1  Val2
0     1    10     4
1     2     5     8
2     3    15    10
3     4    20     7

I am using the following code:

df = (df1.set_index(['Code']) - df2.set_index(['Code']))

and the result is:

      Val1  Val2
Code            
1    -10.0  -4.0
1     -9.0   1.0
1     -8.0   6.0
2     -2.0   7.0
2     -1.0  12.0
2      0.0  17.0
3     -9.0  20.0
3     -8.0  25.0
3     -7.0  30.0
3     -6.0  35.0
4      NaN   NaN

However, I only want the results for the rows that are in df1, not for keys missing from it (in this example, 4).

How do I do that, and then set the index back to the original one from df1?

I tried something like this, but it doesn't work:

df = (df1.set_index(['Code']) - df2.set_index(['Code'])).set_index(df1['Code'])
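One way to see why this attempt fails, as a minimal sketch that rebuilds the sample data so it runs on its own: the subtraction keeps the unmatched code 4, so the result has one more row than df1['Code'], and set_index rejects a key of the wrong length. The names diff and failed are illustrative only.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
rng = pd.date_range('2021-01-01', periods=10, freq='D')
df1 = pd.DataFrame(index=rng, data={'Val1': range(10), 'Val2': np.array(range(10)) * 5,
                                    'Code': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]})
df2 = pd.DataFrame(data={'Code': [1, 2, 3, 4], 'Val1': [10, 5, 15, 20], 'Val2': [4, 8, 10, 7]})

# Aligning on 'Code' keeps every key from both sides, so code 4 (only in df2) adds a row
diff = df1.set_index('Code') - df2.set_index('Code')
print(len(diff), len(df1))  # 11 10

try:
    # A 10-element key cannot be used as the index of an 11-row frame
    diff.set_index(df1['Code'])
    failed = False
except ValueError as exc:
    failed = True
    print('set_index failed:', exc)
```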

I would also like to keep the column headers.

Desired output:

            Val1  Val2  Code
Date                        
2021-01-01 -10.0  -4.0     1
2021-01-02  -9.0   1.0     1
2021-01-03  -8.0   6.0     1
2021-01-04  -2.0   7.0     2
2021-01-05  -1.0  12.0     2
2021-01-06   0.0  17.0     2
2021-01-07  -9.0  20.0     3
2021-01-08  -8.0  25.0     3
2021-01-09  -7.0  30.0     3
2021-01-10  -6.0  35.0     3

If you want results only for the rows that are in df1, and not for missing keys such as 4 in this example, just use the dropna() method:

df = (df1.set_index(['Code']) - df2.set_index(['Code'])).dropna()
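One caveat: dropna() removes every row that contains a NaN, not only the rows for keys missing from df1, so a genuine NaN in df1 would disappear as well. The sketch below compares it with filtering the index by df1['Code'] via Index.isin. The isin filter is a swapped-in alternative, not part of this answer, and df1_nan is a hypothetical variant of df1 with one missing value.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2021-01-01', periods=10, freq='D')
df1 = pd.DataFrame(index=rng, data={'Val1': range(10), 'Val2': np.array(range(10)) * 5,
                                    'Code': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]})
df2 = pd.DataFrame(data={'Code': [1, 2, 3, 4], 'Val1': [10, 5, 15, 20], 'Val2': [4, 8, 10, 7]})

diff = df1.set_index('Code') - df2.set_index('Code')

# On the sample data both approaches keep the same 10 rows
via_dropna = diff.dropna()
via_isin = diff[diff.index.isin(df1['Code'])]
print(via_dropna.equals(via_isin))  # True

# Hypothetical variant of df1: Val1 cast to float, then its first value set to NaN
df1_nan = df1.astype({'Val1': float})
df1_nan.iloc[0, 0] = np.nan
diff_nan = df1_nan.set_index('Code') - df2.set_index('Code')

# dropna() also drops the 2021-01-01 row; the isin() filter keeps it
print(len(diff_nan.dropna()))                               # 9
print(len(diff_nan[diff_nan.index.isin(df1_nan['Code'])]))  # 10
```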

Then:

df.insert(0,'Date',df1.index)

And finally:

df.reset_index(inplace=True)
df.set_index('Date',inplace=True)

Now if you print df, you will get your desired output:

            Code  Val1  Val2
Date                        
2021-01-01     1 -10.0  -4.0
2021-01-02     1  -9.0   1.0
2021-01-03     1  -8.0   6.0
2021-01-04     2  -2.0   7.0
2021-01-05     2  -1.0  12.0
2021-01-06     2   0.0  17.0
2021-01-07     3  -9.0  20.0
2021-01-08     3  -8.0  25.0
2021-01-09     3  -7.0  30.0
2021-01-10     3  -6.0  35.0
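The steps of this answer can be combined into one self-contained sketch. Note that df.insert(0, 'Date', df1.index) attaches the dates by position, so it assumes the subtraction returns rows in the same order as df1; that holds here because df1 is already sorted by Code, but it is worth checking on other data. The chained reset_index().set_index('Date') is equivalent to the two inplace calls above.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2021-01-01', periods=10, freq='D')
df1 = pd.DataFrame(index=rng, data={'Val1': range(10), 'Val2': np.array(range(10)) * 5,
                                    'Code': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]})
df2 = pd.DataFrame(data={'Code': [1, 2, 3, 4], 'Val1': [10, 5, 15, 20], 'Val2': [4, 8, 10, 7]})

# Subtract on the shared 'Code' key, then drop the row for code 4 (present only in df2)
df = (df1.set_index('Code') - df2.set_index('Code')).dropna()

# Attach the original dates by position (assumes rows are still in df1's order)
df.insert(0, 'Date', df1.index)

# Turn 'Code' back into a column and make 'Date' the index
df = df.reset_index().set_index('Date')
print(df)
```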

Note: if this is not your desired output, let me know.

You can use reindex to align df2 to df1["Code"]. Then take the underlying NumPy ndarray and subtract it in place from the corresponding columns of df1. This leaves both the index and the "Code" column untouched and performs the subtraction as expected.

subtract_values = df2.set_index("Code").reindex(df1["Code"]).to_numpy()
df1[["Val1", "Val2"]] -= subtract_values

print(df1)
            Val1  Val2  Code
2021-01-01   -10    -4     1
2021-01-02    -9     1     1
2021-01-03    -8     6     1
2021-01-04    -2     7     2
2021-01-05    -1    12     2
2021-01-06     0    17     2
2021-01-07    -9    20     3
2021-01-08    -8    25     3
2021-01-09    -7    30     3
2021-01-10    -6    35     3

If you don't want to change df1, copy the data to a new DataFrame with new_df = df1.copy() and proceed with new_df instead of df1.
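A self-contained sketch of the same reindex approach applied to a copy, so df1 keeps its original values. Selecting value_cols from df2 before reindexing is an added safeguard, not part of the answer above: it pins the column order so the NumPy array lines up with the target columns even if df2's columns were ordered differently. The name value_cols is illustrative.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2021-01-01', periods=10, freq='D')
df1 = pd.DataFrame(index=rng, data={'Val1': range(10), 'Val2': np.array(range(10)) * 5,
                                    'Code': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]})
df2 = pd.DataFrame(data={'Code': [1, 2, 3, 4], 'Val1': [10, 5, 15, 20], 'Val2': [4, 8, 10, 7]})

value_cols = ['Val1', 'Val2']

# Work on a copy so df1 is left untouched
new_df = df1.copy()

# One row of df2 values per row of new_df, looked up by 'Code';
# selecting value_cols first fixes the column order of the resulting array
subtract_values = df2.set_index('Code')[value_cols].reindex(new_df['Code']).to_numpy()

# Positional, element-wise subtraction; index and 'Code' column stay as they were
new_df[value_cols] -= subtract_values
print(new_df)
```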
