Dictionary-based keyword categorization

Question

I'm pretty new to programming and have been enthralled by its power so far. In this vein, I have a dataset in which one of the variables is a commodity name: "apple", "pear", "cauliflower", "clog", "sneaker", etc. I want to group the commodities into something a little more high-order: "fruits", "vegetables", "shoes", etc. My sense from some searching is that this is a dictionary-based chunking problem, but I'm not sure how to implement a solution. I could get lists of vegetables, fruits, and types of shoes pretty easily, but are there existing packages that could help with this kind of problem specifically? I'm most comfortable with Python and R, so anything that can be used with those languages would be most helpful.

Apologies if this question isn't written in a specific-enough way. I'm new to Stack Overflow and am still getting the hang of it.

Clarification: I'm trying to create a new dataset with these new higher-order labels.

Answer 1

Here's how I would do it:

higher_order_conversion = {
    # Keys must be tuples, not lists: tuples are immutable and therefore hashable.
    ('apple', 'pear', 'kiwi'): 'fruit',
    ('X', 'Y', 'Z'): 'letter',
    ('loafers', 'sneakers', 'high heels'): 'shoes',
}

# The numbers are ID numbers, or whatever extra information is packaged with your data.
data_set = [
    [125, 'apple'],
    [126, 'Y'],
    [127, 'loafers'],
    [103, 'kiwi'],
]

print('before', data_set)

# Replace each low-level name with its higher-order category, in place.
for data in data_set:
    for lower_order_list, category in higher_order_conversion.items():
        if data[1] in lower_order_list:
            data[1] = category
            break  # stop once a category has been assigned

print('after', data_set)

Output:

before [[125, 'apple'], [126, 'Y'], [127, 'loafers'], [103, 'kiwi']]
after [[125, 'fruit'], [126, 'letter'], [127, 'shoes'], [103, 'fruit']]
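
As a variation on the loop above, here is a minimal sketch that inverts the grouped dictionary once into a flat item-to-category lookup and builds a new dataset instead of overwriting the original names, which is what the clarification in the question asks for. The names `item_to_category` and `categorize`, the `'unknown'` default, and the extra `'clog'` row are illustrative assumptions, not part of the original answer.

```python
# Grouped mapping in the same shape as the answer: tuple of items -> category.
higher_order_conversion = {
    ('apple', 'pear', 'kiwi'): 'fruit',
    ('X', 'Y', 'Z'): 'letter',
    ('loafers', 'sneakers', 'high heels'): 'shoes',
}

# Invert it once into a flat item -> category dictionary,
# so each lookup is a single dictionary access.
item_to_category = {
    item: category
    for items, category in higher_order_conversion.items()
    for item in items
}


def categorize(rows, lookup, default='unknown'):
    """Return new rows with the category appended; the input rows are not modified."""
    return [[row_id, name, lookup.get(name, default)] for row_id, name in rows]


# 'clog' is a hypothetical extra row, used to show the fallback label.
data_set = [[125, 'apple'], [126, 'Y'], [127, 'loafers'], [103, 'kiwi'], [104, 'clog']]
labelled = categorize(data_set, item_to_category)
print(labelled)
```

Items missing from the dictionary get the fallback label rather than being silently left as they were, which makes gaps in the word lists easy to spot.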

Hopefully this gives you some ideas.

Answer 1 (3, accepted, 2013-06-01 01:02:45)
