I have two pandas DataFrames that contain numeric and non-numeric values. I want to divide one by the other, but keep the non-numeric columns. Here is an MWE:
import pandas as pd

a = pd.DataFrame(
[
['group1', 1., 2.],
['group1', 3., 4.],
['group1', 5., 6.]
],
columns=['Group', 'A', 'B']
)
b = pd.DataFrame(
[
['group1', 7., 8.],
['group1', 9., 10.],
['group1', 11., 12.]
],
columns=['Group', 'A', 'B']
)
Trying to do:
b.div(a)
Results in:
TypeError: unsupported operand type(s) for /: 'str' and 'str'
So to get around this, I have done:
result = b.drop(["Group"], axis=1).div(a.drop(["Group"], axis=1))
print(result)
# A B
#0 7.0 4.0
#1 3.0 2.5
#2 2.2 2.0
Which is correct, but I also want to keep the column "Group".
One way to get my desired output would be to do:
desired_output = b[["Group"]].join(result)
print(desired_output)
# Group A B
#0 group1 7.0 4.0
#1 group1 3.0 2.5
#2 group1 2.2 2.0
But my real DataFrames have many non-numeric columns. Is there a cleaner/faster/more efficient way to tell pandas to divide only the numeric columns?
You can use np.divide, passing a mask to the where parameter.
np.divide(b, a, where=a.dtypes.ne(object))
Assuming the non-numeric columns are the same across DataFrames, use combine_first / fillna to get them back:
np.divide(b, a, where=a.dtypes.ne(object)).combine_first(a)
Group A B
0 group1 7.0 4.0
1 group1 3.0 2.5
2 group1 2.2 2.0
Similar to @cᴏʟᴅsᴘᴇᴇᴅ's answer, but you can stay within Pandas with .select_dtypes(). This will attempt to do index-aligned division on any non-object dtypes.
>>> b.select_dtypes(exclude='object').div(
... a.select_dtypes(exclude='object')).combine_first(a)
...
A B Group
0 7.0 4.0 group1
1 3.0 2.5 group1
2 2.2 2.0 group1
To retain column ordering:
>>> desired_output = b.select_dtypes(exclude='object')\
... .div(a.select_dtypes(exclude='object'))\
... .combine_first(a)[a.columns]
>>> desired_output
Group A B
0 group1 7.0 4.0
1 group1 3.0 2.5
2 group1 2.2 2.0
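If there are many non-numeric columns, the same idea can be wrapped in a small helper. This is only a sketch: the name divide_numeric and the extra "Label" column are made up for illustration, and it assumes both frames share the same index and the same numeric column names. It selects with include="number" rather than exclude="object", so bool or datetime columns are left alone too, and it writes the quotient back into a copy so the original column order is kept without reindexing.

```python
import pandas as pd


def divide_numeric(numerator, denominator):
    """Divide only the numeric columns; keep every other column from `numerator`.

    Hypothetical helper: assumes both frames have the same index and that the
    numeric columns of `numerator` also exist in `denominator`.
    """
    numeric_cols = numerator.select_dtypes(include="number").columns
    result = numerator.copy()
    # Index-aligned division, restricted to the numeric columns.
    result[numeric_cols] = numerator[numeric_cols].div(denominator[numeric_cols])
    return result


# Same numbers as in the question, plus a second (made-up) string column.
a = pd.DataFrame({
    "Group": ["group1"] * 3,
    "Label": ["x", "y", "z"],
    "A": [1., 3., 5.],
    "B": [2., 4., 6.],
})
b = pd.DataFrame({
    "Group": ["group1"] * 3,
    "Label": ["x", "y", "z"],
    "A": [7., 9., 11.],
    "B": [8., 10., 12.],
})

out = divide_numeric(b, a)
print(out)
```

The non-numeric columns come from the numerator (b here); swap the arguments or copy from the other frame if you want them taken from a instead.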
Maybe use set_index(), so that Group moves into the index and only the numeric columns are divided:
b.set_index('Group').div(a.set_index('Group'),level=[0]).reset_index()
Out[579]:
Group A B
0 group1 7.0 4.0
1 group1 3.0 2.5
2 group1 2.2 2.0
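The set_index() idea can be extended to several non-numeric columns by picking them up with select_dtypes. A rough sketch, under the assumption that both frames carry the same key values in the same row order (here the key combinations are also unique, which keeps index alignment unambiguous); the "Label" column and the name key_cols are made up for the example:

```python
import pandas as pd

a = pd.DataFrame({
    "Group": ["group1"] * 3,
    "Label": ["x", "y", "z"],   # hypothetical second string column
    "A": [1., 3., 5.],
    "B": [2., 4., 6.],
})
b = pd.DataFrame({
    "Group": ["group1"] * 3,
    "Label": ["x", "y", "z"],
    "A": [7., 9., 11.],
    "B": [8., 10., 12.],
})

# Every column that is not numeric becomes part of the index.
key_cols = list(b.select_dtypes(exclude="number").columns)

# Divide what is left (numeric columns only), then restore the keys as columns.
out = b.set_index(key_cols).div(a.set_index(key_cols)).reset_index()
print(out)
```

Note that the original row index is replaced by a fresh 0..n-1 range after reset_index(), and the key columns end up first in the column order.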
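This works with any number of string-type columns. Test for a numeric dtype in general, since the columns here are float64 rather than int64:
pd.concat([b,a]).groupby(level=0).agg(lambda x : x.iloc[0]/x.iloc[1] if pd.api.types.is_numeric_dtype(x) else x.iloc[0])
Out[584]:
Group A B
0 group1 7.0 4.0
1 group1 3.0 2.5
2 group1 2.2 2.0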