Compare multiple columns within same row and highlight differences in pandas

I have a dataframe similar to:

NAME DB1 DB2 DB3 DB4
WORKFLOW_1 workflow1-1.jar workflow1-2.jar workflow1-1.jar workflow1-3.jar
WORKFLOW_2 workflow2-1.jar workflow2-1.jar workflow2-1.jar workflow2-1.jar
WORKFLOW_3 workflow3-2.jar workflow3-1.jar workflow3-1.jar workflow3-1.jar
WORKFLOW_4 workflow4-1.jar

Here NAME is the key for this table across n databases. I'm gathering data from a specific column in each database and merging the results side by side for further analysis.

My problem is that I need to highlight rows that contain different filenames across the DBn columns.

I've tried the solution below:

        def highlight(row):
            for key1, column1 in row.items():
                if key1 != 'NAME':
                    for key2, column2 in row.items():
                        if key2 != 'NAME':
                            if column1 != column2:
                                return ['background-color: red']
            return ['background-color: green']

        pd = pd.style.apply(highlight)

I tried to style the entire row when at least one filename differs from the others, but it did not work. When I export to Excel, only the first line is red, and that is not even one of the rows that should be highlighted.

The simplest (and naïve) approach is to use Series.eq to test every value in a row against that row's first value. Setting an appropriate subset is very important here, as we only want to compare the DB columns with each other and not with NAME.

def highlight_row(s: pd.Series) -> List[str]:
    bg_color = 'red'
    if s.eq(s.iloc[0]).all():  # .iloc makes the positional lookup explicit
        bg_color = 'green'
    return [f'background-color:{bg_color}'] * len(s)


df.style.apply(
    func=highlight_row,
    subset=['DB1', 'DB2', 'DB3', 'DB4'],
    axis=1
)

Styled table with the naïve styling (empty strings and NaN are included in the comparison)
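To see which rows the naïve test treats as "all equal" without rendering any styles, the same comparison can be run with DataFrame.apply. This is only a sketch that reuses the sample data from the setup section; the names db_cols and all_equal are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'NAME': ['WORKFLOW_1', 'WORKFLOW_2', 'WORKFLOW_3', 'WORKFLOW_4'],
    'DB1': ['workflow1-1.jar', 'workflow2-1.jar', 'workflow3-2.jar', ''],
    'DB2': ['workflow1-2.jar', 'workflow2-1.jar', 'workflow3-1.jar',
            'workflow4-1.jar'],
    'DB3': ['workflow1-1.jar', 'workflow2-1.jar', 'workflow3-1.jar', ''],
    'DB4': ['workflow1-3.jar', 'workflow2-1.jar', 'workflow3-1.jar', '']
})

# Hypothetical helper name: the columns that should be compared
db_cols = ['DB1', 'DB2', 'DB3', 'DB4']

# For each row, check whether every DB value equals the row's first DB value
all_equal = df[db_cols].apply(lambda s: s.eq(s.iloc[0]).all(), axis=1)

print(all_equal.tolist())  # [False, True, False, False]
```

WORKFLOW_4 comes out as False because the empty strings are compared against 'workflow4-1.jar', which is exactly the limitation of the naïve version.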


We can be a bit less naïve by excluding empty strings and null values (and any other invalid values) from each row using boolean indexing, then doing the equality comparison on just the filtered values:

def highlight_row(s: pd.Series) -> List[str]:
    filtered_s = s[s.notnull() & ~s.eq('')]
    # Check for completely empty row (prevents an IndexError from filtered_s.iloc[0])
    if filtered_s.empty:
        # No valid values in row
        css_str = ''
    elif filtered_s.eq(filtered_s.iloc[0]).all():
        # All values are the same
        css_str = 'background-color: green'
    else:
        # Row Values Differ
        css_str = 'background-color: red'
    return [css_str] * len(s)

We can also leverage an IndexSlice to more dynamically select the columns for the subset instead of manually passing a list of column names:

df.style.apply(
    func=highlight_row,
    subset=pd.IndexSlice[:, 'DB1':],
    axis=1
)

Styled table that considers only "valid" values in the equality comparison
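The filtering step can be checked in isolation on a single row. The sketch below uses a made-up row (including a NaN, which the sample data does not contain) to show what survives the boolean mask; the variable names row and valid are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical row: two empty strings, one NaN and a single real filename
row = pd.Series(
    ['', 'workflow4-1.jar', np.nan, ''],
    index=['DB1', 'DB2', 'DB3', 'DB4']
)

# Keep only values that are neither null nor the empty string
valid = row[row.notnull() & ~row.eq('')]

print(valid.tolist())                  # ['workflow4-1.jar']
print(valid.eq(valid.iloc[0]).all())   # True
```

With only one valid value left, the row counts as consistent, so it would be styled green rather than red.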


Lastly, if the entire row should be highlighted, it is possible to pass the idx/cols to the styling function instead of subsetting:

def highlight_row(s: pd.Series, idx: pd.IndexSlice) -> List[str]:
    css_str = 'background-color: red'
    # Filter Columns
    filtered_s = s[idx]
    # Filter Values
    filtered_s = filtered_s[filtered_s.notnull() & ~filtered_s.eq('')]
    # Check for completely empty row
    if filtered_s.empty:
        css_str = ''  # Empty row Styles
    elif filtered_s.eq(filtered_s.iloc[0]).all():
        css_str = 'background-color: green'
    return [css_str] * len(s)


df.style.apply(
    func=highlight_row,
    idx=pd.IndexSlice['DB1':],  # 1D IndexSlice!
    axis=1
)

Styled table with whole-row highlighting
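Since the question mentions exporting to Excel, here is a minimal sketch of how the styled result might be written out with Styler.to_excel. The helper name export_highlighted and its parameters are assumptions made for this example, and it assumes the openpyxl package is installed.

```python
from typing import List

import pandas as pd


def highlight_row(s: pd.Series) -> List[str]:
    # Ignore nulls and empty strings before comparing
    valid = s[s.notnull() & ~s.eq('')]
    if valid.empty:
        return [''] * len(s)
    color = 'green' if valid.eq(valid.iloc[0]).all() else 'red'
    return [f'background-color: {color}'] * len(s)


def export_highlighted(df: pd.DataFrame, path: str,
                       first_db_col: str = 'DB1') -> None:
    # Hypothetical helper: style every column from first_db_col onwards
    # and write the styled frame to an .xlsx file (needs openpyxl).
    styler = df.style.apply(
        highlight_row,
        subset=pd.IndexSlice[:, first_db_col:],
        axis=1
    )
    styler.to_excel(path, index=False, engine='openpyxl')
```

Calling export_highlighted(df, 'workflows.xlsx') would then write the sample frame with its DB columns colored; the file name is just a placeholder.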


Setup and Imports:

from typing import List

import pandas as pd  # version 1.4.2

df = pd.DataFrame({
    'NAME': ['WORKFLOW_1', 'WORKFLOW_2', 'WORKFLOW_3', 'WORKFLOW_4'],
    'DB1': ['workflow1-1.jar', 'workflow2-1.jar', 'workflow3-2.jar', ''],
    'DB2': ['workflow1-2.jar', 'workflow2-1.jar', 'workflow3-1.jar',
            'workflow4-1.jar'],
    'DB3': ['workflow1-1.jar', 'workflow2-1.jar', 'workflow3-1.jar', ''],
    'DB4': ['workflow1-3.jar', 'workflow2-1.jar', 'workflow3-1.jar', '']
})
