Highlight a row in a pandas df if that row also appears in another df

I have two dataframes, df1 and df2. I would like to highlight in yellow all the rows in df1 that are also present in df2.

[Images omitted: df1, df2, and the desired result]

So far I have only found solutions that insert another column and use a variable there to identify which rows to colour.

My question is whether it is possible to compare these two dataframes directly in the function presented below.

These are the two dataframes:

import pandas as pd

df1 = pd.DataFrame([['AA',3,'hgend',1], ['BB','frdf',7,2], ['C1',4,'asef',4], ['C2',4,'asef',4], ['C3',4,'asef',4]], columns=list("ABCD"))
df2 = pd.DataFrame([['C1',4,'asef',4], ['C2',4,'asef',4], ['C3',4,'asef',4]], columns=list("XYZQ"))

This is my code to colour the rows:

def highlight_rows(row):
    value = row.loc['A']
    if value == 'C1':
        color = 'yellow'
    else:
        color = ''
    return ['background-color: {}'.format(color) for r in row]

df1.style.apply(highlight_rows, axis=1)

As I said, if I do the comparison beforehand, insert another column and put a variable there, I can then search for this variable and highlight the row. My question is whether I can also do this directly in the function. To do this, I would have to be able to compare both dataframes inside the function. Is this possible at all? It would be enough to be able to compare a single row, e.g. with .isin.

Comparing to df2 inside the function would be inefficient, because with axis=1 the function is called once per row.

You could define a temporary column to identify matches using a merge (the indicator column in_1 is left_only or both, depending on whether or not the row is present in df2). The column is then hidden from the output by the styler:

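def highlight_rows(row):
    # 'both' means the row was found in df1 and in df2
    highlight = 'yellow' if row['in_1'] == "both" else ""
    return ['background-color: {}'.format(highlight)] * len(row)

# rename df2's columns to match df1 so the merge keys on all columns
(pd.merge(df1, df2.set_axis(df1.columns.tolist(), axis=1),
          how="left", indicator="in_1")
    .style
    # pandas >= 1.4; on older versions use .hide_columns(['in_1'])
    .hide(['in_1'], axis="columns")
    .apply(highlight_rows, axis=1))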
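
To see what the styler function is reading, it may help to look at the merged frame before any styling is applied. The following is a minimal sketch using the question's data; the names df2_renamed, merged, indicator and is_match are only illustrative and do not appear in the original answer.

```python
import pandas as pd

df1 = pd.DataFrame([['AA', 3, 'hgend', 1], ['BB', 'frdf', 7, 2],
                    ['C1', 4, 'asef', 4], ['C2', 4, 'asef', 4],
                    ['C3', 4, 'asef', 4]], columns=list("ABCD"))
df2 = pd.DataFrame([['C1', 4, 'asef', 4], ['C2', 4, 'asef', 4],
                    ['C3', 4, 'asef', 4]], columns=list("XYZQ"))

# Give df2 the same column labels as df1, so that the merge
# uses all four columns as join keys.
df2_renamed = df2.set_axis(df1.columns.tolist(), axis=1)

# how="left" keeps every row of df1; indicator="in_1" adds a
# categorical column recording where each row was found.
merged = pd.merge(df1, df2_renamed, how="left", indicator="in_1")

# Plain strings for inspection, plus a boolean mask of matching rows.
indicator = merged["in_1"].astype(str).tolist()
is_match = merged["in_1"].eq("both")

print(merged)
```

Rows of df1 that have no counterpart in df2 are marked left_only, and rows present in both frames are marked both, which is the value the styling function tests for.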

Alternatively, to actually do the comparison inside the function, define a set of tuples of df2 rows beforehand:

set_df2 = set(df2.apply(tuple, axis=1))

def highlight_rows(row):
    color = 'yellow' if tuple(row) in set_df2 else ""
    return [f'background-color: {color}'] * len(row)

df1.style.apply(highlight_rows, axis=1)
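
A possible variation on the same idea, not part of the original answer, is to compute the membership mask once and hand the styler a whole frame of CSS strings via axis=None, rather than evaluating the function row by row. This is only a sketch under that assumption; rows_df2, mask and highlight_matches are hypothetical names.

```python
import pandas as pd

df1 = pd.DataFrame([['AA', 3, 'hgend', 1], ['BB', 'frdf', 7, 2],
                    ['C1', 4, 'asef', 4], ['C2', 4, 'asef', 4],
                    ['C3', 4, 'asef', 4]], columns=list("ABCD"))
df2 = pd.DataFrame([['C1', 4, 'asef', 4], ['C2', 4, 'asef', 4],
                    ['C3', 4, 'asef', 4]], columns=list("XYZQ"))

# Rows of df2 as plain tuples, built once; column labels are ignored,
# so only the values in positional order are compared.
rows_df2 = set(df2.itertuples(index=False, name=None))

# One boolean per row of df1: True where the row also occurs in df2.
mask = pd.Series(
    [row in rows_df2 for row in df1.itertuples(index=False, name=None)],
    index=df1.index,
)

def highlight_matches(data):
    # With axis=None the styler expects a DataFrame of CSS strings
    # with the same index and columns as the data.
    styles = pd.DataFrame("", index=data.index, columns=data.columns)
    styles.loc[mask] = "background-color: yellow"
    return styles

css = highlight_matches(df1)
```

In a notebook, df1.style.apply(highlight_matches, axis=None) would then render the styled frame. The comparison still happens outside the styling function, but no helper column has to be added to df1.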
