I'm using pandas, and after some calculations and transformations I end up with two data frames that look more or less like this:
ID 'abc' 'def'
Total 4 5
Slow 0 0
Normal 1 2
Fast 3 3
ID 'abc' 'def'
Total 3 4
Slow 0 0
Normal 0 1
Fast 3 3
Now, given these two data frames, I want to generate a third data frame that shows what percentage of the first data frame the second one fulfills. I want the results to look like this:
ID 'abc' 'def'
Total 75.0% 80.0%
Slow None None
Normal 0.0% 50.0%
Fast 100.0% 100.0%
If there is a 0 in the first data frame, then the corresponding cell in the resulting data frame should be set to None or something else. The whole idea is that at the end I will write the results to an Excel file, so I want the cells that have None to be empty in Excel. Any ideas how to do this in Python using pandas?
You can simply divide df2 by df1 on the columns of interest:
df2.loc[:,"'abc'":] = df2.loc[:,"'abc'":].div(df1.loc[:,"'abc'":]).mul(100)
ID 'abc' 'dfe'
0 Total 75.0 80.0
1 Slow NaN NaN
2 Normal 0.0 50.0
3 Fast 100.0 100.0
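For anyone who wants to try this step in isolation, here is a minimal self-contained sketch. It assumes ID is a regular column and that the value columns are simply named abc and def (the question does not show how the frames were built, so both are assumptions). It also replaces zeros in the denominator with NaN first: pandas already gives NaN for 0/0, but a non-zero value divided by 0 would give inf, which is probably not what you want in the Excel file.

```python
import numpy as np
import pandas as pd

# Example frames modelled on the question (construction is an assumption)
df1 = pd.DataFrame({"ID": ["Total", "Slow", "Normal", "Fast"],
                    "abc": [4, 0, 1, 3],
                    "def": [5, 0, 2, 3]})
df2 = pd.DataFrame({"ID": ["Total", "Slow", "Normal", "Fast"],
                    "abc": [3, 0, 0, 3],
                    "def": [4, 0, 1, 3]})

value_cols = ["abc", "def"]

# Treat zero denominators as missing so those cells become NaN, never inf
denominator = df1[value_cols].replace(0, np.nan)

# Element-wise percentage of df1 that df2 fulfills
pct = df2[value_cols].div(denominator).mul(100)

# Put the ID column back in front of the computed percentages
result = pd.concat([df2[["ID"]], pct], axis=1)
print(result)
```

Building a new frame instead of assigning back into df2 keeps the original counts around in case you need them later.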
In order to format as specified, you can do:
df2.loc[:,"'abc'":] = df2.where(df2.loc[:,"'abc'":].isna(),
                                df2.round(2).astype(str).add('%'))
ID 'abc' 'dfe'
0 Total 75.0% 80.0%
1 Slow NaN NaN
2 Normal 0.0% 50.0%
3 Fast 100.0% 100.0%
Given that there are no decimal places other than .0 in this example, round(2) has no visible effect on the displayed floats. However, as soon as the division produces a float with more decimal places, it will be rounded to 2 decimal positions.
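If you would rather control the number of decimals explicitly, one option is to format each cell yourself. The following is only a sketch: to_percent is a hypothetical helper name, the sample values are made up to include one non-round percentage, and one decimal place is chosen to match the output the question asks for.

```python
import numpy as np
import pandas as pd

# Percentages as they might look after the division step (sample values)
pct = pd.DataFrame({"abc": [75.0, np.nan, 0.0, 100.0],
                    "def": [80.0, np.nan, 33.333333, 100.0]})

def to_percent(value):
    """Hypothetical helper: format a number as 'x.x%', keep missing cells empty."""
    if pd.isna(value):
        return None
    return f"{value:.1f}%"

# Apply the helper to every cell, column by column
formatted = pct.apply(lambda col: col.map(to_percent))
print(formatted)
```

Keep in mind that the cells are now strings, so Excel will treat them as text rather than as numbers.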
Pandas offers some options for specifying styling directly in the output Excel file. They are limited, but fortunately for you they do include a number-format option.
import pandas as pd
# Initialize example dataframes
df1 = pd.DataFrame(
data=[[4, 5], [0, 0], [1, 2], [3, 3], [3, 3]],
index=['Total', 'Slow', 'Normal', 'Fast', 'Fast'],
columns=['abc', 'def'],
)
df2 = pd.DataFrame(
data=[[3, 4], [0, 0], [0, 1], [3, 3], [3, 3]],
index=['Total', 'Slow', 'Normal', 'Fast', 'Fast'],
columns=['abc', 'def'],
)
result_df = df2 / df1
# Change rows index into data column (to avoid any chance of having non-unique row index values,
# since the pandas styler can only handle unique row index)
result_df = result_df.reset_index()
# Write excel output file with number format styling applied
# (on pandas 2.1 and later, Styler.applymap is deprecated in favor of Styler.map)
result_df.style.applymap(lambda _: 'number-format: 0.00%').to_excel('result.xlsx', engine='openpyxl', index=False)