I have a dataframe similar to:
NAME | DB1 | DB2 | DB3 | DB4 |
---|---|---|---|---|
WORKFLOW_1 | workflow1-1.jar | workflow1-2.jar | workflow1-1.jar | workflow1-3.jar |
WORKFLOW_2 | workflow2-1.jar | workflow2-1.jar | workflow2-1.jar | workflow2-1.jar |
WORKFLOW_3 | workflow3-2.jar | workflow3-1.jar | workflow3-1.jar | workflow3-1.jar |
WORKFLOW_4 | | workflow4-1.jar | | |
Here NAME is the key for this table across n databases. I'm gathering data from a specific column in each database and merging it side by side for further analysis.
My problem is that I need to highlight the rows that contain different filenames across the DBn columns.
I've tried the solution below:
def highlight(row):
    for key1, column1 in row.items():
        if key1 != 'NAME':
            for key2, column2 in row.items():
                if key2 != 'NAME':
                    if column1 != column2:
                        return ['background-color: red']
    return ['background-color: green']

pd = pd.style.apply(highlight)
I tried to style the entire row when at least one filename differs from the others, but it did not work. When I export to Excel, only the first row is red, and that is not even one of the rows where it should happen.
The simplest (and naïve) approach is to use Series.eq to test each row against the first value. Setting an appropriate subset is very important here, as we only want to compare against other similar values.
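def highlight_row(s: pd.Series) -> List[str]:
    bg_color = 'red'
    # Compare every value in the row with the first one (positional access)
    if s.eq(s.iloc[0]).all():
        bg_color = 'green'
    # One CSS string per cell in the row
    return [f'background-color:{bg_color}'] * len(s)
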
df.style.apply(
    func=highlight_row,
    subset=['DB1', 'DB2', 'DB3', 'DB4'],
    axis=1
)
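To see which rows the naïve check would colour before rendering anything, the same comparison can be run with a plain DataFrame.apply. This is only a sketch using the sample data from the setup section; the names db_cols and naive_match are made up for illustration.

```python
import pandas as pd

# Same sample data as in the setup section of this answer.
df = pd.DataFrame({
    'NAME': ['WORKFLOW_1', 'WORKFLOW_2', 'WORKFLOW_3', 'WORKFLOW_4'],
    'DB1': ['workflow1-1.jar', 'workflow2-1.jar', 'workflow3-2.jar', ''],
    'DB2': ['workflow1-2.jar', 'workflow2-1.jar', 'workflow3-1.jar',
            'workflow4-1.jar'],
    'DB3': ['workflow1-1.jar', 'workflow2-1.jar', 'workflow3-1.jar', ''],
    'DB4': ['workflow1-3.jar', 'workflow2-1.jar', 'workflow3-1.jar', '']
})

db_cols = ['DB1', 'DB2', 'DB3', 'DB4']  # illustrative name

# True where every DB value in the row equals the row's first DB value.
naive_match = df.set_index('NAME')[db_cols].apply(
    lambda s: s.eq(s.iloc[0]).all(), axis=1
)
print(naive_match)
```

With this data only WORKFLOW_2 comes out True. WORKFLOW_4 comes out False because the empty strings are compared as if they were filenames, which is the weakness of the naïve version.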
We can be a bit less naïve by excluding empty strings and null values (and any other invalid values) from each row using boolean indexing, and then doing the equality comparison on just the filtered values:
def highlight_row(s: pd.Series) -> List[str]:
    # Keep only non-null, non-empty values
    filtered_s = s[s.notnull() & ~s.eq('')]
    # Check for completely empty row (prevents index error from filtered_s.iloc[0])
    if filtered_s.empty:
        # No valid values in row
        css_str = ''
    elif filtered_s.eq(filtered_s.iloc[0]).all():
        # All values are the same
        css_str = 'background-color: green'
    else:
        # Row values differ
        css_str = 'background-color: red'
    return [css_str] * len(s)
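Because the styling function is an ordinary function of a Series, it can be sanity-checked by calling it directly on a few hand-built rows before it goes anywhere near the Styler. The sketch below repeats the filtered logic (using .iloc[0] for positional access); the three sample rows mixed, sparse and blank are made up for illustration.

```python
from typing import List

import pandas as pd


def highlight_row(s: pd.Series) -> List[str]:
    # Same filtered comparison as above: ignore nulls and empty strings.
    filtered_s = s[s.notnull() & ~s.eq('')]
    if filtered_s.empty:
        css_str = ''
    elif filtered_s.eq(filtered_s.iloc[0]).all():
        css_str = 'background-color: green'
    else:
        css_str = 'background-color: red'
    return [css_str] * len(s)


cols = ['DB1', 'DB2', 'DB3', 'DB4']
# Illustrative rows: differing filenames, one filename plus blanks, all blank.
mixed = pd.Series(['workflow1-1.jar', 'workflow1-2.jar',
                   'workflow1-1.jar', 'workflow1-3.jar'], index=cols)
sparse = pd.Series(['', 'workflow4-1.jar', '', ''], index=cols)
blank = pd.Series(['', None, '', ''], index=cols)

print(highlight_row(mixed))   # every cell red
print(highlight_row(sparse))  # every cell green
print(highlight_row(blank))   # every cell unstyled ('')
```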
We can also leverage an IndexSlice to select the columns for the subset more dynamically, instead of manually passing a list of column names:
df.style.apply(
    func=highlight_row,
    subset=pd.IndexSlice[:, 'DB1':],
    axis=1
)
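It can help to look at what the IndexSlice actually is before handing it to subset. The sketch below, with a cut-down frame assumed for illustration, shows that pd.IndexSlice[:, 'DB1':] is just a tuple of slices and that df.loc accepts it directly.

```python
import pandas as pd

# Cut-down illustrative frame with the same column layout as the question.
df = pd.DataFrame({
    'NAME': ['WORKFLOW_1', 'WORKFLOW_2'],
    'DB1': ['workflow1-1.jar', 'workflow2-1.jar'],
    'DB2': ['workflow1-2.jar', 'workflow2-1.jar'],
    'DB3': ['workflow1-1.jar', 'workflow2-1.jar'],
    'DB4': ['workflow1-3.jar', 'workflow2-1.jar']
})

# All rows, and every column from 'DB1' to the last column.
idx = pd.IndexSlice[:, 'DB1':]
print(idx)

# The same object works with .loc, so the selection can be previewed.
selected = df.loc[idx]
print(list(selected.columns))
```

Note that a label slice like 'DB1': depends on column order: it picks DB1 and everything to its right, so any unrelated column placed after DB1 would be included as well.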
Lastly, if the entire row (including NAME) should be highlighted, the columns to compare can be passed to the styling function as an argument instead of using subset:
def highlight_row(s: pd.Series, idx: pd.IndexSlice) -> List[str]:
    css_str = 'background-color: red'
    # Filter Columns (label-based slice)
    filtered_s = s.loc[idx]
    # Filter Values
    filtered_s = filtered_s[filtered_s.notnull() & ~filtered_s.eq('')]
    # Check for completely empty row
    if filtered_s.empty:
        css_str = ''  # Empty row Styles
    elif filtered_s.eq(filtered_s.iloc[0]).all():
        css_str = 'background-color: green'
    # Style every cell in the row, not just the compared columns
    return [css_str] * len(s)

df.style.apply(
    func=highlight_row,
    idx=pd.IndexSlice['DB1':],  # 1D IndexSlice!
    axis=1
)
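As a possible alternative to the row-wise function, the same full-row colouring can be built in one pass as a DataFrame of CSS strings, which is the shape Styler.apply expects when called with axis=None. This is a sketch of a different technique, not the method used above: the helper names row_styles and css_for, and the cols parameter, are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'NAME': ['WORKFLOW_1', 'WORKFLOW_2', 'WORKFLOW_3', 'WORKFLOW_4'],
    'DB1': ['workflow1-1.jar', 'workflow2-1.jar', 'workflow3-2.jar', ''],
    'DB2': ['workflow1-2.jar', 'workflow2-1.jar', 'workflow3-1.jar',
            'workflow4-1.jar'],
    'DB3': ['workflow1-1.jar', 'workflow2-1.jar', 'workflow3-1.jar', ''],
    'DB4': ['workflow1-3.jar', 'workflow2-1.jar', 'workflow3-1.jar', '']
})


def css_for(n_distinct: int) -> str:
    # Hypothetical helper: map a count of distinct filenames to a CSS string.
    if n_distinct == 0:
        return ''
    if n_distinct == 1:
        return 'background-color: green'
    return 'background-color: red'


def row_styles(data: pd.DataFrame, cols) -> pd.DataFrame:
    # Hypothetical helper: returns CSS strings with the same shape as `data`.
    db = data[cols]
    db = db.mask(db.isna() | db.eq(''))   # treat '' and nulls as missing
    css = db.nunique(axis=1).map(css_for)  # distinct non-missing values per row
    # Repeat the per-row CSS string under every column of the original frame.
    return pd.DataFrame({col: css for col in data.columns}, index=data.index)


db_cols = ['DB1', 'DB2', 'DB3', 'DB4']
styles = row_styles(df, db_cols)
print(styles['NAME'].tolist())

# Assumed usage with the Styler (extra keyword arguments are forwarded):
# df.style.apply(row_styles, cols=db_cols, axis=None)
```

The trade-off is that the comparison is expressed with nunique rather than an explicit equality test, so it is worth confirming on real data that both versions agree.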
Setup and Imports:
from typing import List

import pandas as pd  # version 1.4.2

df = pd.DataFrame({
    'NAME': ['WORKFLOW_1', 'WORKFLOW_2', 'WORKFLOW_3', 'WORKFLOW_4'],
    'DB1': ['workflow1-1.jar', 'workflow2-1.jar', 'workflow3-2.jar', ''],
    'DB2': ['workflow1-2.jar', 'workflow2-1.jar', 'workflow3-1.jar',
            'workflow4-1.jar'],
    'DB3': ['workflow1-1.jar', 'workflow2-1.jar', 'workflow3-1.jar', ''],
    'DB4': ['workflow1-3.jar', 'workflow2-1.jar', 'workflow3-1.jar', '']
})