I have a hierarchical Excel sheet that looks something like this:
Df
lev1 lev2 lev3 lev4 lev5 description
RD21 Nan Nan Nan Nan Oil
Nan RD32 Nan Nan Nan Oil/Canola
Nan Nan RD33 Nan Nan Oil/Canola/Wheat
Nan Nan RD34 Nan Nan Oil/Canola/Flour
Nan Nan Nan RD55 Nan Oil/Canola/Flour/Thick
ED54 Nan Nan Nan Nan Rice
Nan ED66 Nan Nan Nan Rice/White
Nan Nan ED88 Nan Nan Rice/White/Jasmine
Nan Nan ED89 Nan Nan Rice/White/Basmati
Nan ED68 Nan Nan Nan Rice/Brown
I would like to get the codes for all levels of the hierarchy based on a search in the "description" column. Example 1: if I search for "Brown" in the description, it should give me something like this:
ED54: Rice
ED68: Rice/Brown
Example 2: if I search for "Thick" in the description column, it should give me something like this:
RD21: Oil
RD32: Oil/Canola
RD34: Oil/Canola/Flour
RD55: Oil/Canola/Flour/Thick
Searching for a word is easy with Df["description"].str.contains(word), and I can also use a regular expression to find a specific pattern if required. But how do I get the codes associated with the word's hierarchy?
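To make the question reproducible, here is a minimal sketch that rebuilds the sample as a DataFrame and runs the search step described above. It assumes the "Nan" cells are real missing values (written as None here); regex=False makes str.contains treat the word as a literal string rather than a pattern.

```python
import pandas as pd

# Sample data from the question; None marks the empty ("Nan") cells
rows = [
    ("RD21", None, None, None, None, "Oil"),
    (None, "RD32", None, None, None, "Oil/Canola"),
    (None, None, "RD33", None, None, "Oil/Canola/Wheat"),
    (None, None, "RD34", None, None, "Oil/Canola/Flour"),
    (None, None, None, "RD55", None, "Oil/Canola/Flour/Thick"),
    ("ED54", None, None, None, None, "Rice"),
    (None, "ED66", None, None, None, "Rice/White"),
    (None, None, "ED88", None, None, "Rice/White/Jasmine"),
    (None, None, "ED89", None, None, "Rice/White/Basmati"),
    (None, "ED68", None, None, None, "Rice/Brown"),
]
Df = pd.DataFrame(rows, columns=["lev1", "lev2", "lev3", "lev4", "lev5", "description"])

# Boolean mask: True where the description contains the word
mask = Df["description"].str.contains("Brown", regex=False)
matches = Df.loc[mask, "description"].tolist()
print(matches)  # ['Rice/Brown']
```

Note that str.contains returns a boolean mask, not the matching text, so the matching descriptions are pulled out with .loc.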
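First, pair each row's code with its split description path. Because the lev1~lev5 columns come first, a description with n parts has its code in the n-th column, so x.iloc[n-1] picks it: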
vv = Df.apply(
    lambda x: (
        x.iloc[len(x.description.split('/'))-1],
        x.description.split('/')
    ), axis=1
).values
vv
looks like:
array([('RD21', ['Oil']), ('RD32', ['Oil', 'Canola']),
('RD33', ['Oil', 'Canola', 'Wheat']),
('RD34', ['Oil', 'Canola', 'Flour']),
('RD55', ['Oil', 'Canola', 'Flour', 'Thick']), ('ED54', ['Rice']),
('ED66', ['Rice', 'White']),
('ED88', ['Rice', 'White', 'Jasmine']),
('ED89', ['Rice', 'White', 'Basmati']),
('ED68', ['Rice', 'Brown'])], dtype=object)
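As an alternative to indexing by the path depth, and assuming the empty cells are real missing values (not the string "Nan"), each row's code is simply its only non-missing level value, so it can be read off with dropna. This minimal sketch uses the first three rows of the sample:

```python
import pandas as pd

Df = pd.DataFrame(
    {
        "lev1": ["RD21", None, None],
        "lev2": [None, "RD32", None],
        "lev3": [None, None, "RD33"],
        "lev4": [None, None, None],
        "lev5": [None, None, None],
        "description": ["Oil", "Oil/Canola", "Oil/Canola/Wheat"],
    }
)

levels = ["lev1", "lev2", "lev3", "lev4", "lev5"]
# For each row, drop the missing levels and keep the single remaining code
codes = Df[levels].apply(lambda r: r.dropna().iloc[0], axis=1)
paths = Df["description"].str.split("/")
vv = list(zip(codes, paths))
print(vv)
```

This version does not depend on the description depth matching the level column, but it does assume each row carries exactly one code.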
Then build a nested dictionary from vv (this assumes every parent row appears before its children):
d = {}
for i in vv:
    v = i[0]  # RD33
    k = i[1]  # ['Oil', 'Canola', 'Wheat']
    # walk down to the parent node, then store the code under the last key
    f_d = d
    for j in k[:-1]:
        f_d = f_d[j]
    f_d[k[-1]] = {'_value': v}
d
looks like:
{'Oil': {'_value': 'RD21',
'Canola': {'_value': 'RD32',
'Wheat': {'_value': 'RD33'},
'Flour': {'_value': 'RD34', 'Thick': {'_value': 'RD55'}}}},
'Rice': {'_value': 'ED54',
'White': {'_value': 'ED66',
'Jasmine': {'_value': 'ED88'},
'Basmati': {'_value': 'ED89'}},
'Brown': {'_value': 'ED68'}}}
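The loop above relies on row order: if a child appears before its parent, f_d[j] raises KeyError. A minimal sketch that tolerates any order uses dict.setdefault to create intermediate nodes on demand (build_tree is a hypothetical helper name; the pairs are copied from vv and reordered):

```python
def build_tree(pairs):
    """Build a nested dict from (code, path) pairs; '_value' holds each node's code."""
    tree = {}
    for code, path in pairs:
        node = tree
        # Create any missing ancestors on the way down
        for name in path:
            node = node.setdefault(name, {})
        node["_value"] = code
    return tree


pairs = [
    ("RD55", ["Oil", "Canola", "Flour", "Thick"]),  # child listed before its parents
    ("RD21", ["Oil"]),
    ("RD32", ["Oil", "Canola"]),
    ("RD34", ["Oil", "Canola", "Flour"]),
]
tree = build_tree(pairs)
print(tree["Oil"]["Canola"]["Flour"]["Thick"]["_value"])  # RD55
```

One caveat of this layout (shared with the original): a category literally named "_value" would collide with the code key.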
Then suppose you search for the word with Df["description"].str.contains(word) (or a regular expression) and take the matching description, for example:
'Oil/Canola/Flour/Thick'
For each prefix of that path, walk down the dictionary and read the code stored at that level:
desc_split = 'Oil/Canola/Flour/Thick'.split('/')
res = []
for i in range(len(desc_split)):
    all_keys = desc_split[:i+1]
    d2 = d
    for k in all_keys:
        d2 = d2[k]
    res.append(f"{d2['_value']}: {'/'.join(all_keys)}")
res
looks like:
['RD21: Oil',
'RD32: Oil/Canola',
'RD34: Oil/Canola/Flour',
'RD55: Oil/Canola/Flour/Thick']
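Because every ancestor path (e.g. 'Oil/Canola') is itself some row's description, a flat description-to-code dict is enough to answer the lookup without a nested tree; this is a swapped-in simplification, not the approach above. The sketch below assumes a hypothetical code column that already holds each row's code (built, for instance, from lev1~lev5 as in the first step), uses a hypothetical lookup helper, and returns one list per matching row:

```python
import pandas as pd

# Hypothetical frame with a ready-made "code" column
Df = pd.DataFrame(
    {
        "code": ["ED54", "ED66", "ED88", "ED89", "ED68"],
        "description": ["Rice", "Rice/White", "Rice/White/Jasmine",
                        "Rice/White/Basmati", "Rice/Brown"],
    }
)


def lookup(df, word):
    """Return 'CODE: path' lines for every ancestor of each row whose description contains word."""
    code_of = dict(zip(df["description"], df["code"]))
    hits = df.loc[df["description"].str.contains(word, regex=False), "description"]
    results = []
    for desc in hits:
        parts = desc.split("/")
        lines = []
        # Each prefix of the path is an ancestor's description
        for i in range(1, len(parts) + 1):
            prefix = "/".join(parts[:i])
            lines.append(f"{code_of[prefix]}: {prefix}")
        results.append(lines)
    return results


for lines in lookup(Df, "Brown"):
    print(lines)  # ['ED54: Rice', 'ED68: Rice/Brown']
```

A search such as "White" matches three rows here, so it returns three chains. Since str.contains matches substrings, a regular expression with word boundaries may be preferable if category names can overlap.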