
Color words in text-based Python: an elegant solution

Python challenge of the day. I am reading an input file containing formatted text (spaces, new lines, punctuation). I would like to preserve the text as it is, while highlighting certain words based on some condition.

Then print the text to the console with the highlighted words in color.

Code here: first, the set of words that should be highlighted.

diff=set(g_word_counts.keys()).difference(set(t_word_index.keys()))

To compare the words in the text with this set I lower() it, which gives

from termcolor import colored  # assumption: colored() is termcolor's

colored_text=""
for t in generated_text.lower().split():
    if t in diff:
        colored_text+=colored(t, 'green')
    else:
        colored_text+=t
    colored_text+=" "

print(colored_text)

where the result is obviously all lower case, which is not exactly nice. Additionally, I would like to split not only on white space but also on any punctuation character, which I tried following "Splitting the sentences in python":

import re

def to_words(text):
    return re.findall(r'\w+', text)

but here again I end up lowercasing everything and reconstructing the text without its punctuation.

Is there an elegant, efficient way to keep the formatting unchanged, color the words, and print?

Bonus: is there a way to write to a text file with the words nicely highlighted? For now this gives the not-so-nice

 [32mdémoniais[0m en la foule ou la couronnement à [32ml’enchance[0m de la piété, 
 pour cette fois de ce qui a simule comme le capital et départ du [32mdépour[0m de la [32msubissement[0m

You can use colorama:

pip install colorama

This is how you import it into your project:

from colorama import Fore, Back

And this is how you use it:

print(f"{Fore.GREEN}Hello world.{Fore.RESET}")
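On the bonus question: the `[32m` and `[0m` fragments in the file output are ANSI escape sequences, which a terminal interprets as "green" and "reset" but a text file shows raw. Below is a minimal sketch, assuming plain text is what you want in the file. The names `ANSI_SGR` and `strip_ansi` are hypothetical helpers, not part of colorama or termcolor.

```python
import re

# Matches colour (SGR) escape sequences such as "\x1b[32m" and "\x1b[0m".
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")

GREEN = "\x1b[32m"  # green foreground, the "[32m" seen in the file output
RESET = "\x1b[0m"   # reset all attributes, the "[0m" seen in the file output


def strip_ansi(text):
    """Return text with colour escape sequences removed (hypothetical helper)."""
    return ANSI_SGR.sub("", text)


colored_line = f"{GREEN}démoniais{RESET} en la foule"
print(colored_line)                    # coloured on an ANSI-capable terminal
plain_line = strip_ansi(colored_line)  # safe to write to a .txt file
```

Keep the coloured string for the console and write only the stripped version to the file.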

I clean the text by adding spaces around the punctuation

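import re

def clean_input_text(text):
    # Put a space on each side of every punctuation mark so split() isolates it.
    w = re.sub(r"([?.!,;¿’])", r" \1 ", text)
    # Collapse runs of spaces into a single space.
    w = re.sub(r" +", " ", w)
    return w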

then work on the .split() of the text

# color new words
colored_text=""
for t in generated_text.split():
    if t in diff:
        colored_text+=colored(t, 'green')
    else:
        colored_text+=t
    colored_text+=" "

finally I tidy up the spacing around the punctuation

colored_text = colored_text.replace(" . ", ". ")
colored_text = colored_text.replace(" , ", ", ")
colored_text = colored_text.replace(" ! ", "! ")
colored_text = colored_text.replace(" ? ", "? ")
colored_text = colored_text.replace(" ’ ", "’")
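An alternative that avoids the split-and-repair round trip is to substitute matching words in place with `re.sub` and a callback, so that spaces, new lines, punctuation, and the original capitalisation are never touched. This is only a sketch under assumptions: `highlight` is a hypothetical function, the `diff` set here is a stand-in for the one in the question, and raw ANSI codes are used instead of termcolor or colorama to keep the example dependency-free.

```python
import re

GREEN = "\x1b[32m"  # ANSI green foreground
RESET = "\x1b[0m"   # ANSI reset


def highlight(text, words, start=GREEN, end=RESET):
    """Wrap each word whose lowercase form is in `words`; leave the rest as is."""
    def repl(match):
        word = match.group(0)
        # Compare in lowercase, but emit the word with its original casing.
        return f"{start}{word}{end}" if word.lower() in words else word

    # In Python 3, \w matches Unicode word characters, so accented letters count.
    return re.sub(r"\w+", repl, text)


sample = "Hello, World!\n  New line; hello again."
diff = {"hello"}  # stand-in for the question's `diff` set
print(highlight(sample, diff))
```

Passing other markers, for example `highlight(text, diff, "**", "**")`, gives a plain-text form that can be written to a file. Note that `\w+` treats a token such as l’enchance as two words, so the pattern would need adjusting if such tokens must match as a whole.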
