Python modeling Tree

I have rethought my entire problem and simplified it, so I can now explain it properly.

I need to express search terms, along with a metric, in a tree.

As an example, here is some input:

phrases = (
    ("removals lisbon", 1),
    ("moving to india", 3),
    ("moving to indonesia", 1),
    ("removals dublin", 3),
    ("moving to malta", 45),
    ("move to brazil", 2),
    ("moving chicago", 1),
    ("moving to california", 29),
    ("moving to brussels", 4),
    ("moving to bangladesh", 1),
    ("removals california", 2),
    ("moving from spain", 4),
    ("moving to russia", 3),
    ("move to los angeles", 2),
    ("move to germany", 1),
    ("moving to poalnd", 1),
    ("removals stockholm", 1),
    ("removal to poland", 1),
    ("moves uk", 7),
    ("moving hamburger", 1),
    ("move to malta", 8),
    ("move to london", 1),
    ("moving from cyprus", 1),
    ("move to japan", 5)
)

Start with the most common word, which here is "to". Taking this word as the root, we build a tree of all the phrases containing "to": we find every phrase that contains "to", then (ignoring "to" itself) we add children for all the other words that make up those phrases, along with their associated scores. We then pick the most popular word among those children and repeat, which gives another set of children that contains neither "to" nor the word just chosen. We keep going down until we run out of depth, then climb back up and go down another branch.
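The first two steps described above (pick the most common word, then count the remaining words in the phrases that contain it) can be sketched as follows. This is only an illustration under assumptions: `sample` is a small subset of the input, and `word_counts`, `root_word`, `matching` and `child_counts` are names made up for this sketch.

```python
from collections import Counter

# A small subset of the input above, for illustration only.
sample = (
    ("moving to malta", 45),
    ("move to malta", 8),
    ("moving to california", 29),
    ("removals california", 2),
    ("moving from spain", 4),
    ("move to japan", 5),
)


def word_counts(phrases, ignore=()):
    """Count word occurrences across phrases, skipping ignored words."""
    counter = Counter()
    for phrase, _clicks in phrases:
        for word in phrase.split():
            if word not in ignore:
                counter[word] += 1
    return counter


# Step 1: the most common word becomes the root.
root_word, _ = word_counts(sample).most_common(1)[0]

# Step 2: keep only the phrases containing the root,
# then count every other word in them.
matching = [(p, c) for p, c in sample if root_word in p.split()]
child_counts = word_counts(matching, ignore=(root_word,))
```

Repeating step 2 on each child, with that child's word added to `ignore`, is what produces the deeper levels.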

This will give a structure looking something like:

to
├── move
│   └── city
├── moves
│   └── city
├── moving
│   └── city
└── removals
    └── city

And so on.

Once I have this tree, I can display it and it's all fine.
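For reference, a display step of that kind might look like the sketch below. It is an assumption, not the display code actually used: `render` and `render_tree` are hypothetical helpers, and `Node` is a cut-down stand-in with only the attributes the drawing needs.

```python
# -*- coding: utf-8 -*-


class Node(object):
    """Minimal stand-in for a tree node: a word, a score, and children."""

    def __init__(self, word, clicks):
        self.word = word
        self.clicks = clicks
        self.children = []


def render(node, prefix=""):
    """Return the lines that draw node's descendants beneath it."""
    lines = []
    for i, child in enumerate(node.children):
        last = i == len(node.children) - 1
        lines.append(prefix + ("└── " if last else "├── ") + child.word)
        # Children of the last entry no longer need the vertical bar.
        lines.extend(render(child, prefix + ("    " if last else "│   ")))
    return lines


def render_tree(root):
    """Return the whole tree as one printable string."""
    return "\n".join([root.word] + render(root))


# Hand-built example with made-up scores.
root = Node("to", 0)
for word in ("move", "removals"):
    branch = Node(word, 0)
    branch.children.append(Node("city", 0))
    root.children.append(branch)

text = render_tree(root)
```

Printing `text` draws the root on the first line with each branch indented beneath it, in the same style as the diagram above.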

I started work on this with the following modified code:

from collections import Counter


def count(phrases, ignore=()):
    counter = Counter()
    for phrase, _ in phrases:
        for word in phrase.split(" "):
            if word not in ignore:
                counter[word] += 1
    return counter


def filter_word(word, phrases):
    for phrase, count in phrases:
        if word in phrase.split(" "):
            yield phrase, count


class Node(object):
    def __init__(self, word, clicks):
        self.word = word
        self.clicks = clicks

        self.children = []
        self.unprocessed_children = []


def build_tree(pair, phrases, depth, lower_score):
    word, clicks = pair
    root = current = Node(*pair)
    visited = [word]

    phrases = filter_word(word, phrases)
    for phrase, clicks in phrases:
        for word in phrase.split(" "):
            if word in visited:
                continue
            visited.append(word)
            root.unprocessed_children.append(Node(word, clicks))


def identify_root(phrases, depth, lower_score, ignore=()):
    words = count(phrases, ignore=ignore).most_common()
    print(words[0:10])
    trees = []
    for root in words:
        trees.append(build_tree(root, phrases, depth, lower_score))
        return
    return trees

But I am lost in build_tree as to how to actually go down and create more children up to depth.

If I understood your meaning, this recursive code should do it.

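def build_node(word, clicks, phrases, depth, ignore):
    # Build one node, then recurse into the phrases that contain its word.
    node = Node(word, clicks)
    node.children = build_children(list(filter_word(word, phrases)),
                                   depth - 1,
                                   ignore + (word,))
    return node


def build_children(phrases, depth, ignore):
    # Stop once the requested depth is used up.
    if depth <= 0:
        return []
    # Note: the number paired with each word here is its occurrence count
    # from count(), not the click metric attached to the phrases.
    words = count(phrases, ignore=ignore).most_common()
    return [build_node(word, clicks, phrases, depth, ignore)
            for word, clicks in words]


def identify_root(phrases, depth, ignore=()):
    # ignore must be a tuple, because build_node extends it with + (word,).
    return build_children(phrases, depth, ignore)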
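As a quick check, here is a self-contained run of the same recursion on a few of the phrases from the question. `Node`, `count` and `filter_word` are trimmed copies of the question's code, and `sample`, `trees` and `top` are names introduced only for this sketch.

```python
from collections import Counter


class Node(object):
    def __init__(self, word, clicks):
        self.word = word
        self.clicks = clicks
        self.children = []


def count(phrases, ignore=()):
    counter = Counter()
    for phrase, _ in phrases:
        for word in phrase.split(" "):
            if word not in ignore:
                counter[word] += 1
    return counter


def filter_word(word, phrases):
    for phrase, clicks in phrases:
        if word in phrase.split(" "):
            yield phrase, clicks


def build_node(word, clicks, phrases, depth, ignore):
    node = Node(word, clicks)
    node.children = build_children(
        list(filter_word(word, phrases)), depth - 1, ignore + (word,))
    return node


def build_children(phrases, depth, ignore):
    if depth <= 0:
        return []
    words = count(phrases, ignore=ignore).most_common()
    return [build_node(w, n, phrases, depth, ignore) for w, n in words]


# Four phrases taken from the question's input.
sample = (
    ("moving to malta", 45),
    ("move to malta", 8),
    ("moving to california", 29),
    ("move to japan", 5),
)

# depth=2 yields the top-level words plus one level of children.
trees = build_children(sample, 2, ())
top = trees[0]
```

The result is a list with one tree per distinct word, ordered by frequency, so `top` is the tree rooted at "to". Each node's `clicks` holds the word's occurrence count within the phrases considered at that level.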
