
Highlight certain words in a pandas DataFrame rendered as HTML

I want to highlight certain words in a DataFrame. My code is given below. The problem is that it highlights only the first word from "selected_text" ("economy" in this case) and none of the other words, even when they are present in the text. How can I change it so that every listed word is highlighted if it appears in the text? Second, I currently get only one word in the "existing" column. Can I get more than one word when several are present in the "body_text" column, and highlight them as well?

import pandas as pd
from IPython.display import display, Markdown, Latex, HTML
df =pd.read_csv("/content/df_eup.csv")
df.head(1)


df['body_text'].isnull().sum()
df.dropna(subset=['body_text'], inplace=True)
list_exist = []
selected_words=["economy", "recession", "unemployment", "depression","inflation", "covid19","virus"," bank"]
for index, row in df.iterrows():
    word = selected_words[0]
    i = 0
    while (word not in row['body_text'] and i < 7 ):
        
        i +=1
        word = selected_words[i]
    if i<7:
        list_exist.append(selected_words[i])
    else:
        list_exist.append("not_exist")
df["existing"]=list_exist

def highlight_selected_text(row):
    text = row["body_text"]
    selected_text = ["economy", "recession", "unemployment", "depression","inflation", "covid19","virus","bank"]
    ext = row["existing"]

    color = {
        "economy": "red",
        "recession": "red",
        "unemployment": "red",
        "depression": "red",
        "inflation": "red",
        "covid19": "red",
        "virus" : "red",
        "bank": "red",
        "not_exist": "black"
        
    }[ext]

    highlighted = f'<span style="color: {color}; font-weight: bold">{ext}</span>'
    return text.replace(selected_text[0] or selected_text[1] or selected_text[2] or selected_text[3] or selected_text[4] or selected_text[5] or selected_text[6] or selected_text[7], highlighted)
df["highlighted"] = df.apply(highlight_selected_text, axis=1)


display(HTML(df.sample(30).to_html(escape=False)))

Sample output when more than one word is selected (for the second part of the question):

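The "or" chain passed to text.replace(...) evaluates to its first truthy operand, selected_text[0], so only "economy" is ever replaced. Instead, loop over the dictionary and replace each word in turn, looking up its colour inside the f-string: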

def highlight_selected_text(row):
    text = row["body_text"]
    # Map each word to the colour it should be highlighted in.
    color = {
        "economy": "red",
        "recession": "red",
        "unemployment": "red",
        "depression": "red",
        "inflation": "red",
        "covid19": "red",
        "virus": "red",
        "bank": "red",
        "not_exist": "black"
    }

    # Replace every listed word, not just the first one.
    for k, v in color.items():
        text = text.replace(k, f'<span style="color: {v}; font-weight: bold">{k}</span>')

    return text
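
For the second part of the question (listing every matching word in "existing"), one possible approach is a single regular expression built from the word list. This is only a sketch under some assumptions: the helper names find_existing and highlight and the sample sentence are made up for illustration, matching is case-insensitive, and word boundaries are used so that "bank" does not match inside "banking" (unlike str.replace, which matches substrings).

```python
import re

# Same word list as in the question, without the stray space in " bank".
SELECTED_WORDS = ["economy", "recession", "unemployment", "depression",
                  "inflation", "covid19", "virus", "bank"]

# One alternation pattern; \b restricts matches to whole words.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(w) for w in SELECTED_WORDS) + r")\b",
    flags=re.IGNORECASE,
)


def find_existing(text):
    """Return the distinct selected words found in text, comma-separated,
    in order of first appearance, or "not_exist" if none are found."""
    found = []
    for match in PATTERN.finditer(text):
        word = match.group(1).lower()
        if word not in found:
            found.append(word)
    return ", ".join(found) if found else "not_exist"


def highlight(text, color="red"):
    """Wrap every selected word in text in a coloured, bold <span>."""
    return PATTERN.sub(
        lambda m: f'<span style="color: {color}; font-weight: bold">{m.group(0)}</span>',
        text,
    )


# Hypothetical sample sentence standing in for one "body_text" value.
sample = "The economy faces inflation and a possible recession."
print(find_existing(sample))
print(highlight(sample))
```

With the DataFrame from the question, these helpers could be applied column-wise, for example df["existing"] = df["body_text"].apply(find_existing) and df["highlighted"] = df["body_text"].apply(highlight), followed by the same display(HTML(df.to_html(escape=False))) call.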
