
How to fix this code and make my own POS-tagger? (PYTHON)

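My program needs to read a file of sentences and produce output like this:

Input: Ixé Maria. Output: Ixé\PRON Maria\N-PR.

So far I have written the code below, but the output file comes out as an empty text file. (Any advice is welcome):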

infile = open('corpus_test.txt', 'r', encoding='utf-8').read()
outfile = open('tag_test.txt', 'w', encoding='utf-8')

dicionario = {'mimbira': 'N',
             'anama-itá': 'N-PL',
             'Maria': 'N-PR',
             'sumuara-kunhã': 'N-FEM',
             'sumuara-kunhã-itá': 'N-FEM-PL',
             'sapukaia-apigaua': 'N-MASC',
             'sapukaia-apigaua-itá': 'N-MASC-PL',
             'nhaã': 'DEM',
             'nhaã-itá': 'DEM-PL',
             'ne': 'POS',
             'mukuĩ': 'NUM',
             'muíri': 'QUANT',
             'iepé': 'INDF',
             'pirasua': 'A1',
             'pusé': 'A2',
             'ixé': 'PRON1',
             'se': 'PRON2',
             '. ;': 'PUNCT'
             }

np_words = dicionario.keys()
np_tags = dicionario.values()

for line in infile.splitlines():
   list_of_words = line.split()
   if np_words in list_of_words:
       tag_word = list_of_words.index(np_words)+1
       word_tagged = list_of_words.insert(tag_word, f'\{np_tags}') 
       word_tagged = " ".join(word_tagged)
       print(word_tagged, file=outfile)

outfile.close()

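Starting simple with NLP makes the more advanced systems easier to understand and appreciate.

This gets you most of the way to what you're looking for: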

# Use 'with' so that the file is automatically closed when the 'with' ends.
with open('corpus_test.txt', 'r', encoding='utf-8') as f:
    # readlines() returns a list where each item is one line of the file
    # (trailing newline included), e.g. infile[0] is line 1.
    # Calling read().splitlines() on the file contents would also work.
    infile = f.readlines()

dicionario = {
    'Maria': 'N-PR',
    'ixé': 'PRON1',
}

# Make a list to hold the new lines
outlines = []

for line in infile:
    list_of_words = line.split()
    
    new_line = ''
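    # 'if np_words in list_of_words' tests whether the whole keys object is
    # one element of the list, which is never true, so nothing was written.
    # Check each word against the dictionary instead.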
    for word in list_of_words:
        # todo: Dictionary lookups are case-sensitive, so ixé is different
        # from Ixé, and attached punctuation ('Maria.') also prevents a match.
        if word in dicionario:
            new_line += word + '\\' + dicionario[word] + ' '
        else:
            new_line += word + ' '

    # Append the completed new line to the list and add a newline character.
    outlines.append(new_line.strip() + '\n')

with open('tag_test.txt', 'w', encoding='utf-8') as f:
    f.writelines(outlines)
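
The todo about case sensitivity can be handled in a few lines. What follows is a minimal sketch, not the only way to do it. It assumes that lookups should ignore case and that trailing punctuation should stay attached after the tag, as in the sample output in the question. The names tag_word, tag_line and tags_lower are made up for this illustration, and the dictionary is only a subset of the one in the question. Note that the tag for 'ixé' comes out as PRON1 because that is what the dictionary holds, whereas the sample output in the question shows PRON.

```python
import string

# Subset of the question's dictionary, for illustration only.
dicionario = {
    'Maria': 'N-PR',
    'ixé': 'PRON1',
}

# Lowercase the keys once so that lookups can ignore case.
tags_lower = {word.lower(): tag for word, tag in dicionario.items()}


def tag_word(word, tags):
    """Tag one whitespace-delimited word, keeping trailing punctuation."""
    core = word.rstrip(string.punctuation)   # 'Maria.' -> 'Maria'
    trailing = word[len(core):]              # 'Maria.' -> '.'
    tag = tags.get(core.lower())
    if tag is None:
        return word                          # unknown words pass through
    return core + '\\' + tag + trailing


def tag_line(line, tags=tags_lower):
    """Tag every word in a line and rejoin the words with single spaces."""
    return ' '.join(tag_word(word, tags) for word in line.split())


print(tag_line('Ixé Maria.'))  # Ixé\PRON1 Maria\N-PR.
```

To use it with the file-handling code above, build each output line with tag_line(line) + '\n' instead of the inner word loop. Only trailing ASCII punctuation is stripped, so hyphenated entries such as 'anama-itá' are still looked up whole.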
