简体   繁体   中英

Subtracting values of attributes within one Pandas Dataframe from another dataframe

This problem contains 3 separate dataframes. df1 represents the 'Total' of products 1,2,3, containing 'value1', 'value2' df2 represents the 'Customer1' of products 1,2,3, containing 'value1', 'value2' df3 represents the 'Customer2' of products 1,2,3, containing 'value1', 'value2'

df2 & df3 are essentially subsets of df1.

I would like to create another dataframe that subtracts df2&df3 from df1 and label this df4. I want df4 to be 'remaining customers' within the 'Market' Column.

This is what I have done so far

import pandas as pd


d1 = {'Market': ['Total', 'Total','Total'], 'Product Code': [1, 2, 3], 
'Value1':[10, 20, 30], 'Value2':[5, 15, 25]}
df1 = pd.DataFrame(data=d1)
df1



d2 = {'Market': ['Customer1', 'Customer1','Customer1'], 'Product Code': [1, 
2, 3], 'Value1':[3, 14, 10], 'Value2':[2, 4, 6]}
df2 = pd.DataFrame(data=d2)
df2


d3 = {'Market': ['Customer2', 'Customer2','Customer2'], 'Product Code': [1, 
2, 3], 'Value1':[3, 3, 4], 'Value2':[2, 6, 10]}
df3 = pd.DataFrame(data=d3)
df3

This produces the following result..

Market  Product Code  Value1  Value2
0  Total             1      10       5
1  Total             2      20      15
2  Total             3      30      25
  Market  Product Code  Value1  Value2
0  Customer1             1       3       2
1  Customer1             2      14       4
2  Customer1             3      10       6
  Market  Product Code  Value1  Value2
0  Customer2             1       3       2
1  Customer2             2       3       6
2  Customer2             3       4      10

To create df4, I try the following code and get an error 'TypeError: unsupported operand type(s) for -: 'str' and 'str'' Can anyone help?

df4 = df1-(df2+df3)

print(df4)

Drop Market , set Product Code as the index, and perform index aligned arithmetic on the product codes. After that, just reset the index and insert Market into the result.

df1, df2, df3 = [
      df.drop('Market', 1).set_index('Product Code') for df in [df1, df2, df3]
]

df4 = (df1 - (df2 + df3)).reset_index()
df4.insert(0, 'Market', 'RemainingCustomers')

               Market  Product Code  Value1  Value2
0  RemainingCustomers             1       4       1
1  RemainingCustomers             2       3       5
2  RemainingCustomers             3      16       9

Not exactly what OP asked for but it seemed to me that this might be a better way to manage the data.

df = pd.concat([df1, df2, df3]).set_index(['Product Code', 'Market'])

formula = 'RemainingCustomers = Total - Customer1 - Customer2'
df = df.unstack().stack(0).eval(formula).unstack()
df

Market       Customer1        Customer2         Total        RemainingCustomers       
                Value1 Value2    Value1 Value2 Value1 Value2             Value1 Value2
Product Code                                                                          
1                    3      2         3      2     10      5                  4      1
2                   14      4         3      6     20     15                  3      5
3                   10      6         4     10     30     25                 16      9

And

df['RemainingCustomers']

              Value1  Value2
Product Code                
1                  4       1
2                  3       5
3                 16       9

If we insist on the requested output

df.stack(0).reset_index().query(
    'Market == "RemainingCustomers"').reindex(columns=df1.columns)

                Market  Product Code  Value1  Value2
2   RemainingCustomers             1       4       1
6   RemainingCustomers             2       3       5
10  RemainingCustomers             3      16       9

Or

df.stack(0).xs(
    'RemainingCustomers', level=1, drop_level=False
).reset_index().reindex(columns=df1.columns)

               Market  Product Code  Value1  Value2
0  RemainingCustomers             1       4       1
1  RemainingCustomers             2       3       5
2  RemainingCustomers             3      16       9

Here is one way:

cols = ['Value1', 'Value2']
df4 = df1[cols].subtract(df2[cols].add(df3[cols]))\
               .assign(**{'Market': 'RemainingCustomers', 'Product Code': [1, 2, 3]})\
               .sort_index(axis=1)

#                Market  Product Code  Value1  Value2
# 0  RemainingCustomers             1       4       1
# 1  RemainingCustomers             2       3       5
# 2  RemainingCustomers             3      16       9

Explanation

  • df1[cols].subtract(df2[cols].add(df3[cols])) performs calculation on specified columns only.
  • assign(**{'Market': 'RemainingCustomers', 'Product Code': [1, 2, 3]}) adds extra columns required for results dataframe.
  • sort_index(axis=1) reorders columns for desired output.

Maybe we can use select_dtypes

(df1.select_dtypes(exclude = 'object')
     -df2.select_dtypes(exclude = 'object')
       -df3.select_dtypes(exclude = 'object')).\
            drop('Product Code',1).\
              combine_first(df1).\
               assign(Market='remaining customers')
Out[133]: 
                Market  Product Code  Value1  Value2
0  remaining customers           1.0       4       1
1  remaining customers           2.0       3       5
2  remaining customers           3.0      16       9

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM