Binary Search Tree Frequency Counter
I need to read a text file, strip the unnecessary punctuation, lowercase the words, and use a binary search tree to build a word tree made up of the words in the file.
We are asked to count the frequency of recurring words, and also to report a total word count and a total unique word count.
So far I've got the punctuation handled, the file read done, the lowercasing done, and the binary search tree basically done; I just need to figure out how to implement the "frequency" counter in the code.
My code is as follows:
class BSearchTree :
    class _Node :
        def __init__(self, word, left = None, right = None) :
            self._word = word
            self._freq = 0
            self._left = left
            self._right = right

    def __init__(self) :
        self._root = None
        self._wordc = 0
        self._each = 0

    def isEmpty(self) :
        return self._root == None

    def search(self, word) :
        probe = self._root
        while (probe != None) :
            if word == probe._word :
                return probe
            if word < probe._word :
                probe = probe._left
            else :
                probe = probe._right
        return None

    def insert(self, word) :
        if self.isEmpty() :
            self._root = self._Node(word)
            self._root._freq += 1  # <- is this correct?
            return
        parent = None  # to keep track of parent
                       # we need above information to adjust
                       # link of parent of new node later
        probe = self._root
        while (probe != None) :
            if word < probe._word :    # go to left tree
                parent = probe         # before we go to child, save parent
                probe = probe._left
            elif word > probe._word :  # go to right tree
                parent = probe         # before we go to child, save parent
                probe = probe._right
        if (word < parent._word) :     # new value will be new left child
            parent._left = self._Node(word)
        else :                         # new value will be new right child
            parent._right = self._Node(word)
Because the formatting is killing me, here is the latter part of it:
class NotPresent(Exception) :
    pass

def main():
    t = BSearchTree()
    file = open("sample.txt")
    line = file.readline()
    file.close()
    #for word in line:
    #    t.insert(word)
    # Line above crashes program because there are too many
    # words to add. Lines on bottom test the BST class
    t.insert('all')
    t.insert('high')
    t.insert('fly')
    t.insert('can')
    t.insert('boars')
    #t.insert('all')  <- how do I handle duplicates by making
    t.inOrder()       #  extras add to the node's frequency?
Thank you for helping/trying to help!
First, it's better to initialize a Node's _freq to 1 in the Node constructor than to set it in BST's insert().
(One more thing: by Python coding convention (PEP 8), spaces around the = sign of a default argument value are not recommended.)
def __init__(self, word, left=None, right=None) :
    self._word = word
    self._freq = 1
    self._left = left
    self._right = right
Then just add the last three lines (the else branch) to the while loop in insert():
probe = self._root
while (probe != None) :
    if word < probe._word :    # go to left tree
        parent = probe         # before we go to child, save parent
        probe = probe._left
    elif word > probe._word :  # go to right tree
        parent = probe         # before we go to child, save parent
        probe = probe._right
    else:
        probe._freq += 1
        return
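
The else branch also explains the "crash" on the full file: in the original loop a duplicate word matches neither the < nor the > test, so probe never moves and the loop never ends. Separately, the commented-out "for word in line" iterates over the characters of a single line, not over words. Below is a minimal, self-contained sketch of the whole idea, not your exact class. The names WordTree, words_in, total and unique are my own, and I am assuming that "strip punctuation" means trimming it from both ends of each whitespace-separated token.

```python
import string


class WordTree:
    """Binary search tree keyed on words; each node stores a frequency."""

    class _Node:
        def __init__(self, word):
            self.word = word
            self.freq = 1      # a node is created the first time its word is seen
            self.left = None
            self.right = None

    def __init__(self):
        self._root = None
        self.total = 0         # every word inserted, duplicates included
        self.unique = 0        # number of nodes in the tree

    def insert(self, word):
        self.total += 1
        if self._root is None:
            self._root = self._Node(word)
            self.unique += 1
            return
        probe = self._root
        while True:
            if word == probe.word:
                probe.freq += 1            # duplicate: bump the count, add no node
                return
            if word < probe.word:
                if probe.left is None:
                    probe.left = self._Node(word)
                    self.unique += 1
                    return
                probe = probe.left
            else:
                if probe.right is None:
                    probe.right = self._Node(word)
                    self.unique += 1
                    return
                probe = probe.right

    def in_order(self):
        """Yield (word, freq) pairs in alphabetical order, without recursion."""
        stack, probe = [], self._root
        while stack or probe is not None:
            while probe is not None:
                stack.append(probe)
                probe = probe.left
            probe = stack.pop()
            yield probe.word, probe.freq
            probe = probe.right


def words_in(text):
    """Lowercase the text and trim punctuation from both ends of each token."""
    for token in text.lower().split():
        word = token.strip(string.punctuation)
        if word:
            yield word


tree = WordTree()
for w in words_in("All boars can fly high; all boars, really!"):
    tree.insert(w)

print(list(tree.in_order()))
print(tree.total, tree.unique)
```

For a real file you could feed words_in() the result of reading sample.txt inside a with open(...) block. Keeping the two counters on the tree means the total and unique word counts come for free as words are inserted, with no second pass over the nodes.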