I want to highlight certain words in a data frame. My code is given below. The problem is that it highlights only the first word from "selected_text" ("economy" in this case) and does not highlight the other words even when they are present in the text. How can I change it so that every listed word is highlighted if it appears in the text? Second, I currently get only one word in the "existing" column. Can I get more than one word there if several are present in the "body_text" column, and highlight them as well?
import pandas as pd
from IPython.display import display, Markdown, Latex, HTML
df =pd.read_csv("/content/df_eup.csv")
df.head(1)
df['body_text'].isnull().sum()
df.dropna(subset=['body_text'], inplace=True)
list_exist = []
selected_words=["economy", "recession", "unemployment", "depression","inflation", "covid19","virus"," bank"]
for index, row in df.iterrows():
    word = selected_words[0]
    i = 0
    while (word not in row['body_text'] and i < 7):
        i += 1
        word = selected_words[i]
    if i < 7:
        list_exist.append(selected_words[i])
    else:
        list_exist.append("not_exist")
df["existing"]=list_exist
def highlight_selected_text(row):
    text = row["body_text"]
    selected_text = ["economy", "recession", "unemployment", "depression", "inflation", "covid19", "virus", "bank"]
    ext = row["existing"]
    color = {
        "economy": "red",
        "recession": "red",
        "unemployment": "red",
        "depression": "red",
        "inflation": "red",
        "covid19": "red",
        "virus": "red",
        "bank": "red",
        "not_exist": "black"
    }[ext]
    highlighted = f'<span style="color: {color}; font-weight: bold">{ext}</span>'
    return text.replace(selected_text[0] or selected_text[1] or selected_text[2] or selected_text[3] or selected_text[4] or selected_text[5] or selected_text[6] or selected_text[7], highlighted)
df["highlighted"] = df.apply(highlight_selected_text, axis=1)
display(HTML(df.sample(30).to_html(escape=False)))
Sample output for selecting more than one word (for the second part of the question):
Try looping over the dict and building the f-string from each key and value, so that every word is replaced in turn:
def highlight_selected_text(row):
    text = row["body_text"]
    ext = row["existing"]
    color = {
        "economy": "red",
        "recession": "red",
        "unemployment": "red",
        "depression": "red",
        "inflation": "red",
        "covid19": "red",
        "virus": "red",
        "bank": "red",
        "not_exist": "black"
    }
    for k, v in color.items():
        text = text.replace(k, f'<span style="color: {v}; font-weight: bold">{k}</span>')
    return text
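For the second part of the question (several words in "existing"), a regular expression may be easier to work with. The original replace only ever looked for "economy" because in Python `a or b` returns the first truthy operand, so the whole `or` chain evaluates to `selected_text[0]`. The following is only a sketch under some assumptions: the helper names `find_existing` and `highlight_all` are made up for illustration, matching is assumed to be case-insensitive and on whole words, and the stray leading space in " bank" is assumed to be unintended.

```python
import re

# Words to look for (same list as in the question, without the stray space in " bank").
SELECTED_WORDS = ["economy", "recession", "unemployment", "depression",
                  "inflation", "covid19", "virus", "bank"]

# One alternation pattern; \b keeps "bank" from matching inside "banker".
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, SELECTED_WORDS)) + r")\b",
    flags=re.IGNORECASE,
)

def find_existing(text):
    """Return every selected word found in text (first-seen order), comma-separated."""
    found = []
    for match in PATTERN.finditer(text):
        word = match.group(1).lower()
        if word not in found:
            found.append(word)
    return ", ".join(found) if found else "not_exist"

def highlight_all(text, color="red"):
    """Wrap every occurrence of a selected word in a bold, colored span."""
    return PATTERN.sub(
        lambda m: f'<span style="color: {color}; font-weight: bold">{m.group(0)}</span>',
        text,
    )

sample = "The economy fell into recession; the Economy may recover."
existing = find_existing(sample)
highlighted = highlight_all(sample)
```

With the data frame from the question, these could be applied column-wise, for example `df["existing"] = df["body_text"].apply(find_existing)` and `df["highlighted"] = df["body_text"].apply(highlight_all)`, and then displayed with `to_html(escape=False)` as before.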