POS tagger in Python without NLTK
I am trying to make a POS tagger for determiners and prepositions of Sorani Kurdish. I am using the following code to put a tag after each preposition or determiner in my Kurdish text.
import os
SOR = open("SOR-1.txt", "r+", encoding = 'utf-8')
old_text = SOR.read()
punkt = [".", "!", ",", ":", ";"]
text = ""
for i in old_text:
    if i in punkt:
        text+=" "+i
    else:
        text += i
d = {"DET":["ئێمە" , "ئێوە" , "ئەم" , "ئەو" , "ئەوان" , "ئەوەی", "چەند" ], "PREP":["بۆ","بێ","بێجگە","بە","بەبێ","بەدەم","بەردەم","بەرلە","بەرەوی","بەرەوە","بەلای","بەپێی","تۆ","تێ","جگە","دوای","دەگەڵ","سەر","لێ","لە","لەبابەت","لەباتی","لەبارەی","لەبرێتی","لەبن","لەبەینی","لەبەر","لەدەم","لەرێ","لەرێگا","لەرەوی","لەسەر","لەلایەن","لەناو","لەنێو","لەو","لەپێناوی","لەژێر","لەگەڵ","ناو","نێوان","وەک","وەک","پاش","پێش","" ], "punkt":[".", ",", "!"]}
text = text.split()
for w in text:
    for pos in d:
        if w in d[pos]:
            SOR.write(w+"/"+pos+" ")
SOR.close()
What I want to do is add POS tags inside the text, after each word that appears in the defined dictionary, but the result is a separate list of words and POS tags at the end of the file.
Keep in mind that old_text is a single string. So when you loop through it as in
for i in old_text:
    if i in punkt:
you are looping through characters. I think you intend to loop through the lines of old_text instead. If that is the case, you could open the file using a with statement that specifies both read and write modes. Something like:
with open("SOR-1.txt", 'r+', encoding='utf-8') as f:
    old_text = f.readlines()
    for line in old_text:
        for punctuationMark in punkt:  # punkt is the punctuation list from the question
            # every line read from the file ends with a newline character '\n'
            if punctuationMark in line.strip('\n'):
                pass  # give more instructions
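To get the tags inline rather than appended, one option is to build the whole tagged text in memory and then overwrite the file. After read() or readlines() the file position sits at the end of the file, so any write() that follows is appended there, which matches the symptom in the question. The following is only a minimal sketch under stated assumptions, not the code from the question or the answer: LEXICON is an abbreviated copy of the question's dictionary, and TAG_OF, tag_text and tag_file are hypothetical names introduced for illustration.

```python
import re

# Abbreviated lexicon for illustration; the full word lists are in the question.
LEXICON = {
    "DET": ["ئەم", "ئەو"],
    "PREP": ["بۆ", "لە", "لەسەر"],
    "punkt": [".", ",", "!"],
}

# Reverse lookup (word -> tag) so each token needs a single dictionary access.
TAG_OF = {word: tag for tag, words in LEXICON.items() for word in words}


def tag_text(text):
    """Return text with '/TAG' appended to every token listed in TAG_OF."""
    # Put a space before punctuation, as the character loop in the question does.
    spaced = re.sub(r"([.!,:;])", r" \1", text)

    def add_tag(match):
        token = match.group(0)
        tag = TAG_OF.get(token)
        return f"{token}/{tag}" if tag else token

    # Substituting on runs of non-whitespace keeps spaces and line breaks as they are.
    return re.sub(r"\S+", add_tag, spaced)


def tag_file(path):
    """Overwrite the file at path with its tagged contents."""
    with open(path, "r+", encoding="utf-8") as f:
        tagged = tag_text(f.read())
        f.seek(0)      # go back to the start; otherwise the write is appended
        f.write(tagged)
        f.truncate()   # drop any leftover bytes from the old contents
```

Calling tag_file("SOR-1.txt") would then rewrite the file in place. Writing to a second file instead is the safer choice while experimenting, because the original text is kept.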