
Python3 add colour to specific outputted words from lists in a sentence

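My code below currently checks a text file to see whether a sentence contains a word from my lexicon file. If it finds one, it then searches the same line for a word from a secondary list. If both conditions are met on a line, that line is printed.

What I am trying to do is colour the lexicon words, for example, red, and the words found in the secondary list, called CategoryGA, blue, so that the printout makes it easy to see which list each matched word came from.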

import re
import collections
from collections import defaultdict
from collections import Counter
import sys

from Categories.GainingAccess import GA

Chatpath = "########/Chat1.txt"
Chatfile = Chatpath

lpath = 'Lexicons/######.txt'
lfile = lpath
CategoryGA = GA
Hits = []

"""
text_file = open(path, "r")

lines = text_file.read().split()

c = Counter(lines)

for i, j in c.most_common(50):
    print(i, j)

"""


# class LanguageModelling:

def readfile():
    Word_Hit = None
    with open(Chatfile) as file_read:
        content = file_read.readlines()
        for line_num, line in enumerate(content):
            if any(word in line for word in CategoryGA):
                Word_Hit = False
                for word in CategoryGA:
                    if line.find(word) != -1:
                        Word_Hit = True
                        Hits.append(word)
                        Cleanse = re.sub('<.*?>', '', line)

                        print('%s appeared on Line %d : %s' % (word, line_num, Cleanse))

        file_read.close()

    count = Counter(Hits)
    count.keys()
    for key, value in count.items():
        print(key, ':', value)


def readlex():
    with open(lfile) as l_read:
        l_content = l_read.readlines()
        for line in l_content:
            r = re.compile(r'^\d+\s+\d+\.\d+%\s*')
            l_Cleanse = r.sub('', line)
            print(l_Cleanse)

    l_read.close()


def LanguageDetect():
    with open(Chatfile) as c_read, open(lfile) as l_read:
        c_content = c_read.readlines()

        lex_content = l_read.readlines()
        for line in c_content:
            Cleanse = re.sub('<.*?>', '', line)
            if any(lex_word in line for lex_word in lex_content) \
                    and \
                    any(cat_word in line for cat_word in CategoryGA):
                lex_word = '\033[1;31m{}\033[1;m'.format(lex_word)

                cat_word = '\033[1;44m{}\033[1;m'.format(cat_word)
                print(Cleanse)
                # print(cat_word)

    c_read.close()
    l_read.close()

#readfile()
LanguageDetect()
# readlex()

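This is my full code, but the problem is in the LanguageDetect function. The way I am currently trying to do it, by assigning to the lex_word and cat_word variables, does not work, and frankly I am stumped as to what to try next.

Lexicon: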

31547   4.7072% i
25109   3.7466% u
20275   3.0253% you
10992   1.6401% me
9490    1.4160% do
7681    1.1461% like
6293    0.9390% want
6225    0.9288% my
5459    0.8145% have
5141    0.7671% your
5103    0.7614% lol
4857    0.7247% can

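Then, in the readlex function, I use:

r = re.compile(r'^\d+\s+\d+\.\d+%\s*')
l_Cleanse = r.sub('', line)

to remove everything that precedes the word/characters. I believe this may be the main reason why I cannot colour the lexicon words, but I am unsure how to fix it.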
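For illustration, here is a small sketch of what that substitution produces on a few of the sample lexicon lines. The helper name `parse_lexicon_line` is hypothetical, and the extra `.strip()` is an assumption: `readlines()` keeps the trailing newline, and a word that still carries "\n" (or the numeric prefix) is unlikely to be found inside a chat line.

```python
import re

# Same pattern as in readlex(): leading count, whitespace, percentage.
PREFIX = re.compile(r'^\d+\s+\d+\.\d+%\s*')


def parse_lexicon_line(line):
    """Return the bare word from a line such as '31547   4.7072% i\\n'."""
    # Drop the numeric prefix, then the trailing newline (assumed unwanted).
    return PREFIX.sub('', line).strip()


sample = [
    "31547   4.7072% i\n",
    "20275   3.0253% you\n",
    "7681    1.1461% like\n",
]
words = [parse_lexicon_line(entry) for entry in sample]
print(words)
```

Note that LanguageDetect reads the lexicon with `readlines()` directly, so the words it compares against the chat lines may still include the prefix and the newline.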

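I think your problem comes from the way you handle the line data, but maybe I did not understand your question clearly.

This should do the trick: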

lex_content = ['aaa', 'xxx']
CategoryGA = ['ccc', 'ddd']
line = 'abc aaa bbb ccc'

# Try every (lexicon word, category word) pair against the line.
for lex_word in lex_content:
    for cat_word in CategoryGA:
        if lex_word in line and cat_word in line:
            print(lex_word, cat_word)
            # Wrap each match in ANSI codes: 1;31 is bold red text,
            # 1;44 is bold text on a blue background.
            line = line.replace(lex_word, '\033[1;31m' + lex_word + '\033[1;m')
            line = line.replace(cat_word, '\033[1;44m' + cat_word + '\033[1;m')
            print(line)

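On an ANSI-capable terminal this prints the matched pair (aaa ccc), followed by the line with aaa in bold red and ccc on a blue background.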
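One caveat with `str.replace` is that it also colours substrings, so a short lexicon word such as "i" would be highlighted inside every longer word, and a word that matches in several pairs gets wrapped more than once. A possible variant, sketched below, colours whole words only in a single `re.sub` pass. The function name `highlight` and the colour constants are hypothetical, and the sketch assumes that both lists hold single, case-sensitive words with no prefix or newline attached.

```python
import re

RED = '\033[1;31m'      # bold red text, used here for lexicon words
BLUE_BG = '\033[1;44m'  # bold text on a blue background, for category words
RESET = '\033[0m'       # reset all attributes


def highlight(line, lex_words, cat_words):
    """Colour whole-word matches; return None unless both lists match."""
    lex, cat = set(lex_words), set(cat_words)
    tokens = set(re.findall(r'\w+', line))
    # Mirror the original condition: the line needs a hit from each list.
    if not (lex & tokens) or not (cat & tokens):
        return None

    def paint(match):
        word = match.group(0)
        if word in cat:
            return BLUE_BG + word + RESET
        if word in lex:
            return RED + word + RESET
        return word

    # One pass over the words, so inserted escape codes are never re-matched.
    return re.sub(r'\w+', paint, line)


coloured = highlight('abc aaa bbb ccc', ['aaa', 'xxx'], ['ccc', 'ddd'])
if coloured is not None:
    print(coloured)
```

In LanguageDetect, a function like this could be called on each cleansed chat line, printing only the lines for which it returns a string.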
