I'm pretty new to programming and have been pretty enthralled by its power so far. In this vein, I have a dataset in which one of the variables is a commodity name: "apple", "pear", "cauliflower", "clog", "sneaker", etc. I want to group the commodities into higher-order categories: "fruits", "vegetables", "shoes", etc. My sense from doing some searching is that this would be a dictionary-based chunking problem, but I'm not sure how to implement a solution. I could get lists of vegetables, fruits, and types of shoes pretty easily, but are there existing packages that could help with this kind of problem specifically? I'm most comfortable with Python and R, so anything that can be used with those languages would be most helpful.
Apologies if this question isn't written in a specific-enough way. I'm new to Stack Overflow and am still getting the hang of it.
Clarification: I'm trying to create a new dataset with these new higher-order labels.
Here's how I would do it:
higher_order_conversion = {
    ('apple', 'pear', 'kiwi'): 'fruit',  # the keys must be tuples, not lists
    ('X', 'Y', 'Z'): 'letter',           # (because tuples are immutable and therefore hashable)
    ('loafers', 'sneakers', 'high heels'): 'shoes'
}
data_set = [[125, 'apple'],  # these numbers are id numbers, or whatever extra information you might have packaged with your data
            [126, 'Y'],
            [127, 'loafers'],
            [103, 'kiwi']
            ]
print('before', data_set)
for data in data_set:
    for lower_order_list in higher_order_conversion.keys():
        if data[1] in lower_order_list:
            data[1] = higher_order_conversion[lower_order_list]
            break  # the label has been replaced, so stop checking the other groups
print('after', data_set)
Output:
before [[125, 'apple'], [126, 'Y'], [127, 'loafers'], [103, 'kiwi']]
after [[125, 'fruit'], [126, 'letter'], [127, 'shoes'], [103, 'fruit']]
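If your lists get long, a variation you could try is to flip the mapping so that each individual item is its own key. The sketch below is only one way to do that, using the same sample data; the names `categories`, `item_to_category`, `relabel`, and the `'unknown'` fallback label are made up for illustration, and the extra `'clog'` row is there just to show what happens to an item that is in none of the lists. It builds a new list instead of changing the original, since the goal is a new dataset.

```python
# Hypothetical category lists; swap in your own lists of fruits, shoes, etc.
categories = {
    'fruit': ['apple', 'pear', 'kiwi'],
    'letter': ['X', 'Y', 'Z'],
    'shoes': ['loafers', 'sneakers', 'high heels'],
}

# Flip the mapping: one entry per item, so each lookup is a single dict access.
item_to_category = {
    item: category
    for category, items in categories.items()
    for item in items
}


def relabel(rows, lookup, default='unknown'):
    """Return new rows with the name in position 1 replaced by its category.

    Names missing from `lookup` get the `default` label (an assumption made
    here so that unmatched items are easy to spot).
    """
    return [[row_id, lookup.get(name, default)] for row_id, name in rows]


data_set = [[125, 'apple'], [126, 'Y'], [127, 'loafers'], [103, 'kiwi'], [104, 'clog']]
new_data_set = relabel(data_set, item_to_category)

print('original', data_set)    # left unchanged
print('relabeled', new_data_set)
```

With this layout, adding a new commodity only means appending it to one of the lists in `categories`, and anything labeled `'unknown'` tells you which names still need a category.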
Hopefully this gives you some ideas.