How to fix this code and make my own POS-tagger? (PYTHON)

My program needs to read a file containing sentences and produce output like this:

Input: Ixé Maria. Output: Ixé\PRON Maria\N-PR.
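
The target format is each word followed by a backslash and its tag. As a minimal sketch (the helper name `format_tagged` is hypothetical, not from the question), that string can be built from (word, tag) pairs. Note that a literal backslash must be written as `'\\'` in an ordinary Python string; the question's `f'\{np_tags}'` relies on an invalid escape sequence, which recent Python versions flag with a warning.

```python
def format_tagged(pairs):
    """Join (word, tag) pairs into 'word\\TAG' tokens separated by spaces."""
    # '\\' in the literal part of the f-string is one backslash character.
    return " ".join(f"{word}\\{tag}" for word, tag in pairs)


print(format_tagged([("Ixé", "PRON"), ("Maria", "N-PR")]))
# prints: Ixé\PRON Maria\N-PR
```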

So far I have written the following, but it produces an empty output file. (Any advice is welcome):

infile = open('corpus_test.txt', 'r', encoding='utf-8').read()
outfile = open('tag_test.txt', 'w', encoding='utf-8')

dicionario = {'mimbira': 'N',
             'anama-itá': 'N-PL',
             'Maria': 'N-PR',
             'sumuara-kunhã': 'N-FEM',
             'sumuara-kunhã-itá': 'N-FEM-PL',
             'sapukaia-apigaua': 'N-MASC',
             'sapukaia-apigaua-itá': 'N-MASC-PL',
             'nhaã': 'DEM',
             'nhaã-itá': 'DEM-PL',
             'ne': 'POS',
             'mukuĩ': 'NUM',
             'muíri': 'QUANT',
             'iepé': 'INDF',
             'pirasua': 'A1',
             'pusé': 'A2',
             'ixé': 'PRON1',
             'se': 'PRON2',
             '. ;': 'PUNCT'
             }

np_words = dicionario.keys()
np_tags = dicionario.values()

for line in infile.splitlines():
   list_of_words = line.split()
   if np_words in list_of_words:
       tag_word = list_of_words.index(np_words)+1
       word_tagged = list_of_words.insert(tag_word, f'\{np_tags}') 
       word_tagged = " ".join(word_tagged)
       print(word_tagged, file=outfile)

outfile.close()
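
To see why the output file stays empty, here is a minimal reproduction using the question's variable names (the sample sentence is an assumption). The `if` test never succeeds, so `print(..., file=outfile)` never runs; and even if it did, `list.insert` returns `None`, so the `" ".join(...)` line would fail.

```python
dicionario = {'Maria': 'N-PR', 'ixé': 'PRON1'}
np_words = dicionario.keys()

list_of_words = "ixé Maria .".split()

# This asks whether the dict_keys view itself is one element of the list,
# not whether any key appears in it, so it is False here.
print(np_words in list_of_words)  # False

# What was intended: check each word against the dictionary.
print([word in dicionario for word in list_of_words])  # [True, True, False]

# list.insert modifies the list in place and returns None, so the
# question's word_tagged would be None and " ".join(None) raises TypeError.
result = list_of_words.insert(1, "\\PRON1")
print(result)  # None
```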

Starting simply with NLP makes it easier to understand and appreciate more advanced systems.

This does what you are looking for, with the limitations noted in the comments:

# Use 'with' so that the file is closed automatically when the 'with' block ends.
with open('corpus_test.txt', 'r', encoding='utf-8') as f:
    # readlines() returns a list in which each item is one line,
    # e.g. infile[0] is the first line. (f.read().splitlines() would
    # also work; splitlines() is a str method, not a file method.)
    infile = f.readlines()

dicionario = {
    'Maria': 'N-PR',
    'ixé': 'PRON1',
}

# Make a list to hold the new lines
outlines = []

for line in infile:
    list_of_words = line.split()
    
    new_line = ''
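    # 'if np_words in list_of_words' asks whether the dict_keys object itself
    # is an element of the list, which is never true, so nothing was written.
    # Check each word individually instead.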
    for word in list_of_words:
        # TODO: dictionary lookups are case-sensitive, so 'ixé' and 'Ixé' are
        # different keys, and split() keeps punctuation attached ('Maria.').
        if word in dicionario:
            new_line += word + '\\' + dicionario[word] + ' '
        else:
            new_line += word + ' '

    # Append the completed line to the list, followed by a newline.
    outlines.append(new_line.strip() + '\n')

with open('tag_test.txt', 'w', encoding='utf-8') as f:
    f.writelines(outlines)
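
Case is not the only reason "Ixé Maria." comes out untagged: `line.split()` keeps punctuation attached, so the token is `Maria.` rather than `Maria`. For the same reason, the question's `'. ;'` key can never match, because `split()` never produces a token containing a space. The following is a minimal sketch that handles both, assuming a regex tokenizer that splits off punctuation and a lowercase fallback lookup. The helper names `tag_token`, `tag_line`, `tag_file` and the `PUNCTUATION` set are hypothetical. Tags come straight from the dictionary, so this prints `PRON1` where the question's example shows `PRON`.

```python
import re
import unicodedata

# Plain dictionary lookup, as in the answer's code, with one more entry.
dicionario = {
    'Maria': 'N-PR',
    'ixé': 'PRON1',
    'sumuara-kunhã': 'N-FEM',
}

# Assumed tokenization: hyphenated words stay whole, and each punctuation
# mark becomes its own token ("Maria." -> "Maria", ".").
TOKEN_RE = re.compile(r"[\w-]+|[^\w\s]")
PUNCTUATION = set(".;,:!?")


def tag_token(token, lexicon):
    """Return a tag for token, or None if it is unknown."""
    if token in lexicon:
        return lexicon[token]
    if token.lower() in lexicon:  # 'Ixé' falls back to the 'ixé' entry
        return lexicon[token.lower()]
    if token in PUNCTUATION:
        return 'PUNCT'
    return None


def tag_line(line, lexicon):
    """Tag every token in a line; unknown tokens are left untagged."""
    # NFC normalization so that 'é' typed as e + combining accent is one character.
    line = unicodedata.normalize('NFC', line)
    out = []
    for token in TOKEN_RE.findall(line):
        tag = tag_token(token, lexicon)
        out.append(f"{token}\\{tag}" if tag else token)
    return ' '.join(out)


def tag_file(in_path, out_path, lexicon):
    """Read in_path line by line and write the tagged lines to out_path."""
    with open(in_path, encoding='utf-8') as src, \
            open(out_path, 'w', encoding='utf-8') as dst:
        for line in src:
            dst.write(tag_line(line, lexicon) + '\n')


print(tag_line('Ixé Maria.', dicionario))
# prints: Ixé\PRON1 Maria\N-PR .\PUNCT
# tag_file('corpus_test.txt', 'tag_test.txt', dicionario)
```

Unknown words are left untagged, as in the answer's code; returning a default tag such as a hypothetical `UNK` instead would be a one-line change in `tag_token`.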
