
Divide two pandas DataFrames and keep non-numeric columns

I have two pandas DataFrames that contain both numeric and non-numeric values. I want to divide one by the other, but keep the non-numeric columns. Here is an MWE:

import pandas as pd

a = pd.DataFrame(
    [
        ['group1', 1., 2.],
        ['group1', 3., 4.],
        ['group1', 5., 6.]
    ], 
    columns=['Group', 'A', 'B']
)

b = pd.DataFrame(
    [
        ['group1', 7., 8.],
        ['group1', 9., 10.],
        ['group1', 11., 12.]
    ],
    columns=['Group', 'A', 'B']
)

Trying to do:

b.div(a)

Results in:

TypeError: unsupported operand type(s) for /: 'str' and 'str'

So to get around this, I have done:

result = b.drop(["Group"], axis=1).div(a.drop(["Group"], axis=1))
print(result)
#     A    B
#0  7.0  4.0
#1  3.0  2.5
#2  2.2  2.0

This is correct, but I also want to keep the "Group" column.

One way to get my desired output would be to do:

desired_output = b[["Group"]].join(result)
print(desired_output)
#    Group    A    B
#0  group1  7.0  4.0
#1  group1  3.0  2.5
#2  group1  2.2  2.0

But my real DataFrames have many non-numeric columns. Is there a cleaner/faster/more efficient way to tell pandas to divide only the numeric columns?

You can use np.divide, passing a column mask to its where parameter:

import numpy as np

np.divide(b, a, where=a.dtypes.ne(object))

Assuming the non-numeric columns are the same across DataFrames, use combine_first / fillna to get them back:

np.divide(b, a, where=a.dtypes.ne(object)).combine_first(a)


    Group    A    B
0  group1  7.0  4.0
1  group1  3.0  2.5
2  group1  2.2  2.0
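
The same idea, dividing only where the columns are numeric, can also be sketched with plain column selection. This is a minimal sketch, assuming both frames share the same columns and row index; `num_cols` and `out` are names chosen here for illustration and are not part of the answer above.

```python
import pandas as pd

a = pd.DataFrame(
    [["group1", 1.0, 2.0], ["group1", 3.0, 4.0], ["group1", 5.0, 6.0]],
    columns=["Group", "A", "B"],
)
b = pd.DataFrame(
    [["group1", 7.0, 8.0], ["group1", 9.0, 10.0], ["group1", 11.0, 12.0]],
    columns=["Group", "A", "B"],
)

# Labels of the numeric columns (illustrative name, not from the answer).
num_cols = a.select_dtypes(include="number").columns

# Start from a copy of b so the non-numeric columns are carried over as-is,
# then overwrite only the numeric columns with the element-wise quotient.
out = b.copy()
out[num_cols] = b[num_cols].div(a[num_cols])
print(out)
```

Because the result starts as a copy of `b`, the column order is kept and no separate step is needed to restore the non-numeric columns.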

Similar to @cᴏʟᴅsᴘᴇᴇᴅ's answer, but you can stay within pandas by using .select_dtypes(). This performs index-aligned division on every column whose dtype is not object.

>>> b.select_dtypes(exclude='object').div(
...     a.select_dtypes(exclude='object')).combine_first(a)
...     
     A    B   Group
0  7.0  4.0  group1
1  3.0  2.5  group1
2  2.2  2.0  group1

To retain column ordering:

>>> desired_output = b.select_dtypes(exclude='object')\
...     .div(a.select_dtypes(exclude='object'))\
...     .combine_first(a)[a.columns]

>>> desired_output
    Group    A    B
0  group1  7.0  4.0
1  group1  3.0  2.5
2  group1  2.2  2.0
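
For frames with many non-numeric columns, the selection can be wrapped in a small helper. The sketch below is one possible way to do it, not the answer's own code: `divide_numeric` and the extra `Label` column are hypothetical, and it selects with `include="number"` so that non-object, non-numeric columns such as datetimes are also left out of the division. It assumes both frames share the same row index.

```python
import pandas as pd


def divide_numeric(num, den):
    """Divide the numeric columns of `num` by those of `den`.

    Non-numeric columns are taken from `num` unchanged.
    (Hypothetical helper, written for illustration.)
    """
    ratio = num.select_dtypes(include="number").div(
        den.select_dtypes(include="number")
    )
    # Join the untouched columns back on the row index, then restore
    # the original column order of `num`.
    return num.select_dtypes(exclude="number").join(ratio)[num.columns]


a = pd.DataFrame({
    "Group": ["group1", "group1", "group1"],
    "A": [1.0, 3.0, 5.0],
    "Label": ["x", "y", "z"],  # second non-numeric column, made up here
    "B": [2.0, 4.0, 6.0],
})
b = pd.DataFrame({
    "Group": ["group1", "group1", "group1"],
    "A": [7.0, 9.0, 11.0],
    "Label": ["x", "y", "z"],
    "B": [8.0, 10.0, 12.0],
})

desired_output = divide_numeric(b, a)
print(desired_output)
```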

Another option is to move the non-numeric column into the index with set_index():

b.set_index('Group').div(a.set_index('Group'),level=[0]).reset_index()
Out[579]: 
    Group    A    B
0  group1  7.0  4.0
1  group1  3.0  2.5
2  group1  2.2  2.0
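
With several non-numeric columns, the set_index idea can be extended by moving all of them into the index. This is a minimal sketch, assuming the key columns hold identical values row for row in both frames (rows that do not match would come out as NaN); the `Label` column and the name `keys` are made up for illustration. Passing `append=True` keeps the original row index as one level of the MultiIndex, so repeated group values do not make the index ambiguous.

```python
import pandas as pd

a = pd.DataFrame({
    "Group": ["group1", "group1", "group1"],
    "Label": ["x", "y", "z"],  # second non-numeric column, made up here
    "A": [1.0, 3.0, 5.0],
    "B": [2.0, 4.0, 6.0],
})
b = pd.DataFrame({
    "Group": ["group1", "group1", "group1"],
    "Label": ["x", "y", "z"],
    "A": [7.0, 9.0, 11.0],
    "B": [8.0, 10.0, 12.0],
})

# All non-numeric columns become extra index levels.
keys = list(b.select_dtypes(exclude="number").columns)

out = (
    b.set_index(keys, append=True)
    .div(a.set_index(keys, append=True))  # aligned on (row, Group, Label)
    .reset_index(level=keys)              # move the keys back to columns
)
print(out)
```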

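This works when there are more string-type columns. The dtype test must cover the columns' actual dtype (A and B are float64 here), so it uses pd.api.types.is_numeric_dtype rather than a comparison with one specific dtype:

pd.concat([b,a]).groupby(level=0).agg(lambda x : x.iloc[0]/x.iloc[1] if pd.api.types.is_numeric_dtype(x) else x.iloc[0])
Out[584]: 
    Group    A    B
0  group1  7.0  4.0
1  group1  3.0  2.5
2  group1  2.2  2.0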


 