Get a decision tree in a dictionary

I'm looking for a way in Python to build a dictionary dynamically from a desired structure.

I have the following data:

{'weather': ['windy', 'calm'], 'season': ['summer', 'winter', 'spring', 'autumn'],  'lateness': ['ontime', 'delayed']} 

I specify the structure I want it in, as an ordered list of keys:

['weather', 'season', 'lateness']

and I want to end up with data in this format:

{'calm': {'autumn': {'delayed': 0, 'ontime': 0},
          'spring': {'delayed': 0, 'ontime': 0},
          'summer': {'delayed': 0, 'ontime': 0},
          'winter': {'delayed': 0, 'ontime': 0}},
 'windy': {'autumn': {'delayed': 0, 'ontime': 0},
           'spring': {'delayed': 0, 'ontime': 0},
           'summer': {'delayed': 0, 'ontime': 0},
           'winter': {'delayed': 0, 'ontime': 0}}}

This is the manual approach I came up with to achieve this (category_cases is the data dictionary shown above):

dtree = {}
for cat1 in category_cases['weather']:
    dtree.setdefault(cat1, {})
    for cat2 in category_cases['season']:
        dtree[cat1].setdefault(cat2, {})
        for cat3 in category_cases['lateness']:
            dtree[cat1][cat2].setdefault(cat3, 0)

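Can you think of a way to vary the structure I wrote and still get the desired result? Keep in mind that the structure may not be the same every time.

Also, if you can think of another way besides a dictionary that would let me access the results, that works for me too.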

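If you're not averse to using external packages, pandas.DataFrame might be a viable candidate, since it looks like you'll be working with a table (here d is the data dictionary from the question):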

import pandas as pd
df = pd.DataFrame(
       index=pd.MultiIndex.from_product([d['weather'], d['season']]),
       columns=d['lateness'], data=0
     )

Result:

              ontime  delayed
windy summer       0        0
      winter       0        0
      spring       0        0
      autumn       0        0
calm  summer       0        0
      winter       0        0
      spring       0        0
      autumn       0        0

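You can then easily make changes using indexing:

df.loc[('windy', 'summer'), 'ontime'] = 1
# Use a single .loc call; chained indexing may assign to a temporary copy
df.loc[('calm', 'autumn'), 'delayed'] = 2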

# Result:
              ontime  delayed
windy summer       1        0
      winter       0        0
      spring       0        0
      autumn       0        0
calm  summer       0        0
      winter       0        0
      spring       0        0
      autumn       0        2

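If you always use the last key for the columns, you can build the table dynamically, assuming your keys are in the desired insertion order: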

df = pd.DataFrame(
       index=pd.MultiIndex.from_product(list(d.values())[:-1]), 
       columns=list(d.values())[-1], data=0
     )

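Since you're interested in pandas, and given your structure, I'd also recommend reading up on MultiIndex and Advanced Indexing, just to get an idea of how to work with your data. Here are some examples:

# Selects all the 'delayed' data in 'calm'
# (the output below assumes ('calm', 'summer') 'delayed' was also set to 5)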
df.loc['calm', 'delayed']

# summer    5
# winter    0
# spring    0
# autumn    2
# Name: delayed, dtype: int64

# Apply a sum:
df.loc['calm', 'delayed'].sum()

# 7

# Gets the mean of all 'summer' (notice the `slice(None)` is required to return all of the 'calm' and 'windy' group)
df.loc[(slice(None), 'summer'), :].mean()

# ontime     0.5
# delayed    2.5
# dtype: float64

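It's really very handy and versatile, but you will definitely want to read up on it before you get in too deep, as the framework may take some time to get used to.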


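Otherwise, if you still prefer a dict, there's nothing wrong with that. Here is a recursive function that generates one from the given keys (assuming your keys are in the desired insertion order):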

def gen_dict(d, level=0):
    if level >= len(d):
        return 0
    key = tuple(d.keys())[level]
    return {val: gen_dict(d, level+1) for val in d.get(key)}

gen_dict(d)

Result:

{'calm': {'autumn': {'delayed': 0, 'ontime': 0},
          'spring': {'delayed': 0, 'ontime': 0},
          'summer': {'delayed': 0, 'ontime': 0},
          'winter': {'delayed': 0, 'ontime': 0}},
 'windy': {'autumn': {'delayed': 0, 'ontime': 0},
           'spring': {'delayed': 0, 'ontime': 0},
           'summer': {'delayed': 0, 'ontime': 0},
           'winter': {'delayed': 0, 'ontime': 0}}}
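
Since the depth of the tree changes with the structure, reading or updating a leaf also needs to work for any depth. Below is a minimal sketch of one way to do that with functools.reduce; the helper names increment and get_leaf are made up for illustration, and gen_dict is repeated only so the snippet runs on its own.

```python
from functools import reduce

# Same example data as in the question.
d = {
    "weather": ["windy", "calm"],
    "season": ["summer", "winter", "spring", "autumn"],
    "lateness": ["ontime", "delayed"],
}

def gen_dict(d, level=0):
    # Same recursive builder as above, repeated so this snippet is self-contained.
    if level >= len(d):
        return 0
    key = tuple(d.keys())[level]
    return {val: gen_dict(d, level + 1) for val in d[key]}

def get_leaf(tree, path):
    # Hypothetical helper: follow one key per level down to the counter.
    return reduce(lambda node, key: node[key], path, tree)

def increment(tree, path, amount=1):
    # Hypothetical helper: walk to the parent of the leaf, then bump the leaf.
    *parents, leaf = path
    node = reduce(lambda node, key: node[key], parents, tree)
    node[leaf] += amount

tree = gen_dict(d)
increment(tree, ("windy", "summer", "ontime"))
increment(tree, ("calm", "autumn", "delayed"), amount=2)

print(get_leaf(tree, ("windy", "summer", "ontime")))  # 1
print(get_leaf(tree, ("calm", "autumn", "delayed")))  # 2
print(get_leaf(tree, ("calm", "summer", "ontime")))   # 0
```

The path tuple must list one value per level, in the same order the tree was built with.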

I think this might be useful for you.

def get_output(category, order, i=0):
    output = {}
    for key in order[i:i+1]:
        for value in category[key]:
            output[value] = get_output(category, order, i+1)
    if output == {}:
        return 0
    return output

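You can use itertools.product to get the Cartesian product of the dictionary's values (assuming you want the same key order). For each combination, we iterate over every key except the last, using setdefault to insert or descend into the nested dictionaries, and then set the innermost key to 0.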

from itertools import product
from pprint import pprint

d = {
    "weather": ["windy", "calm"],
    "season": ["summer", "winter", "spring", "autumn"],
    "lateness": ["ontime", "delayed"],
}

result = {}
for comb in product(*d.values()):
    current = result
    for key in comb[:-1]:
        current = current.setdefault(key, {})
    current[comb[-1]] = 0

pprint(result)

Output:

{'calm': {'autumn': {'delayed': 0, 'ontime': 0},
          'spring': {'delayed': 0, 'ontime': 0},
          'summer': {'delayed': 0, 'ontime': 0},
          'winter': {'delayed': 0, 'ontime': 0}},
 'windy': {'autumn': {'delayed': 0, 'ontime': 0},
           'spring': {'delayed': 0, 'ontime': 0},
           'summer': {'delayed': 0, 'ontime': 0},
           'winter': {'delayed': 0, 'ontime': 0}}}

If we want a custom order (which may be useful for older Python versions that do not guarantee key order), we can pass the lists to product() in that order:

order = ['weather', 'season', 'lateness']

result = {}
for comb in product(*map(d.get, order)):
    current = result
    for key in comb[:-1]:
        current = current.setdefault(key, {})
    current[comb[-1]] = 0
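
The question also asks about alternatives to a nested dictionary. One option, sketched here as an assumption rather than something the answer above proposes, is to skip the nesting and key a flat dict by the tuples that product() already yields. The names counts and calm_delayed are illustrative only.

```python
from itertools import product

d = {
    "weather": ["windy", "calm"],
    "season": ["summer", "winter", "spring", "autumn"],
    "lateness": ["ontime", "delayed"],
}
order = ["weather", "season", "lateness"]

# One entry per combination, keyed by a tuple in the requested order.
counts = {comb: 0 for comb in product(*(d[name] for name in order))}

counts[("windy", "summer", "ontime")] += 1
counts[("calm", "autumn", "delayed")] += 2

# Aggregate over a level by filtering on tuple positions.
calm_delayed = sum(
    value
    for (weather, season, lateness), value in counts.items()
    if weather == "calm" and lateness == "delayed"
)

print(len(counts))    # 16
print(calm_delayed)   # 2
```

The trade-off is that a flat dict makes single-cell access and filtering simple, but you lose the ability to grab a whole subtree such as result['calm'] in one lookup.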

Here is a recursive solution, slightly different from the one provided by r.ook in the excellent accepted answer:

category_cases = {'weather': ['windy', 'calm'],
                  'season': ['summer', 'winter', 'spring', 'autumn'],
                  'lateness': ['ontime', 'delayed']}
order = ['weather', 'season', 'lateness']

def gen_tree(category_cases, order):
    if len(order) == 0:
        return 0
    return {x:gen_tree(category_cases, order[1:]) for x in category_cases[order[0]]}

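It takes the order as an explicit list and does not rely on the dictionary preserving key order, so it should be more backward compatible.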
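
Because the order is passed in as a list, changing the structure only means changing that list. A short usage sketch, with gen_tree restated in a slightly different but equivalent form so the snippet runs on its own (the variable names by_weather and by_season are illustrative):

```python
category_cases = {
    "weather": ["windy", "calm"],
    "season": ["summer", "winter", "spring", "autumn"],
    "lateness": ["ontime", "delayed"],
}

def gen_tree(category_cases, order):
    # Base case: no levels left, so this position is a leaf counter.
    if not order:
        return 0
    first, *rest = order
    return {x: gen_tree(category_cases, rest) for x in category_cases[first]}

# Full three-level tree, as in the question.
by_weather = gen_tree(category_cases, ["weather", "season", "lateness"])

# A different structure: only two levels, starting from the season.
by_season = gen_tree(category_cases, ["season", "lateness"])

print(sorted(by_weather))                          # ['calm', 'windy']
print(sorted(by_season))                           # ['autumn', 'spring', 'summer', 'winter']
print(by_season["summer"] == {"ontime": 0, "delayed": 0})  # True
```

Each recursive call builds fresh dictionaries, so sibling branches do not share inner objects.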

Yes, you can achieve this with the following code:

import copy

structure = ['weather', 'season', 'lateness']
data = {'weather': ['windy', 'calm'], 'season': ['summer', 'winter', 'spring', 'autumn'],
        'lateness': ['ontime', 'delayed'], }

d_tree = dict()
n = len(structure)  # length of the structure list
prev_val = 0  # the innermost value
while n > 0:
    n -= 1
    keys = data.get(structure[n]) or list()  # get the list of values from data
    # Idea here is to start with inner most dict and keep moving outer
    d_tree.clear()
    for key in keys:
        # deepcopy so that sibling branches do not share the same inner dicts
        d_tree[key] = copy.deepcopy(prev_val)
    prev_val = copy.deepcopy(d_tree)  # Copy the d_tree to put as value to outer dict
print(d_tree)

Hope this helps!!
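
With a bottom-up build like the one above, how each level is copied matters once the tree is more than two levels deep. The following minimal sketch, using a reduced two-season example and the made-up helper make_seasons, shows the difference between copy.copy and copy.deepcopy when the same inner level is reused for several outer keys.

```python
import copy

def make_seasons():
    # Hypothetical helper: a small inner level with independent leaf dicts.
    return {
        "summer": {"ontime": 0, "delayed": 0},
        "winter": {"ontime": 0, "delayed": 0},
    }

# Shallow copies: each weather key gets a new outer dict,
# but the season dicts inside are the very same objects.
seasons = make_seasons()
shallow = {w: copy.copy(seasons) for w in ("windy", "calm")}
shallow["windy"]["summer"]["ontime"] = 1
print(shallow["calm"]["summer"]["ontime"])  # 1, the update leaked into 'calm'

# Deep copies: every branch owns its own inner dicts.
seasons = make_seasons()
deep = {w: copy.deepcopy(seasons) for w in ("windy", "calm")}
deep["windy"]["summer"]["ontime"] = 1
print(deep["calm"]["summer"]["ontime"])     # 0
```

If the leaves are going to be updated independently, as counters in a decision tree usually are, the deep copy is the safer choice.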
