
Searching exact match of a list of strings inside a list of lists in Python

I have a list of lists:

result = [['GELATIN', '76.0 mg', '40 %', 'Gelatin to 100.000 g Table 7 Capsule Quantity per unit flavouring dose Quantity per unit dose Components Nominal mass of capsule 76.0 mg In the cap (40 %) 30.4 mg flavouring agent corresponds to 1 '], 
          ['GELATIN', '45.6 mg', '14.5 %', 'Gelatin including water of a certain percentage'], 
          ['INK', '76.0 mg', '40 %', 'ink is used as diluent far as this is necessary for the markets. Table 4 Atenolol granules Components mg/capsule Granules Active ingredients Atenolol 50.00']]

and a list of strings:

agent = ['Flavouring Agent', 'Anti-Tacking Agent', 'Preservative', 'Colouring Agent', 'Ph Adjusting Agent', 'Plasticizer', 'Diluent']

For each sub-list in result, I want to check whether any element of the agent list appears anywhere in that sub-list. If one does, it should be added at the beginning of the sub-list as a new element.

Expected output:

new_result = [['Flavouring Agent', 'GELATIN', '76.0 mg', '40 %', 'Gelatin to 100.000 g Table 7 Capsule Quantity per unit flavouring dose Quantity per unit dose Components Nominal mass of capsule 76.0 mg In the cap (40 %) 30.4 mg flavouring agent corresponds to 1 '], 
              ['GELATIN', '45.6 mg', '14.5 %', 'Gelatin including water of a certain percentage'], 
              ['Diluent', 'INK', '76.0 mg', '40 %', 'ink is used as diluent far as this is necessary for the markets. Table 4 Atenolol granules Components mg/capsule Granules Active ingredients Atenolol 50.00']]

This is because 'Flavouring Agent' is present (ignoring case) in the last element of the first sub-list, and 'Diluent' is present in the last element of the last sub-list.

My attempt so far:

newl=[]                
for jj in agent:        
    for e in result:
        for ll in e:

            if jj in ll:
                #print(jj,ll)
                newl.append([jj,ll])
                break

I believe your problem is confusion over the nesting levels and the order of the loops. Assuming you want to preserve the order of the original list (and not omit elements), the outer loop should iterate over result. For each sub-list, you then check whether any word from agent is present in one of its strings. A "flag" variable ensures that only one agent is added per sub-list:

res = []
for sub in result:
    new_sub = sub
    agent_found = False
    for ag in agent:
        if agent_found:
            break
        for item in sub:
            # Case-insensitive substring test, so 'Diluent' matches 'diluent'
            if ag.lower() in item.lower():
                new_sub = [ag] + new_sub
                agent_found = True
                break
    # Sub-lists without a matching agent are appended unchanged
    res.append(new_sub)

Gives:

[['Flavouring Agent', 'GELATIN', '76.0 mg', '40 %', 'Gelatin to 100.000 g Table 7 Capsule Quantity per unit flavouring dose Quantity per unit dose Components Nominal mass of capsule 76.0 mg In the cap (40 %) 30.4 mg flavouring agent corresponds to 1 '], 
 ['GELATIN', '45.6 mg', '14.5 %', 'Gelatin including water of a certain percentage'], 
 ['Diluent', 'INK', '76.0 mg', '40 %', 'ink is used as diluent far as this is necessary for the markets. Table 4 Atenolol granules Components mg/capsule Granules Active ingredients Atenolol 50.00']]
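
The flag variable can also be avoided. As a minimal sketch (the helper name `prepend_agent` and the shortened sample rows are made up for illustration, not taken from the question), `next()` with a default returns the first agent that occurs in any string of a sub-list, or `None` when nothing matches:

```python
def prepend_agent(rows, agents):
    """Return new rows, each prefixed with the first agent found in it (if any)."""
    out = []
    for row in rows:
        # First agent (in `agents` order) that is a case-insensitive
        # substring of any string in the row; None when there is no match.
        match = next(
            (ag for ag in agents
             if any(ag.lower() in item.lower() for item in row)),
            None,
        )
        out.append([match] + row if match is not None else list(row))
    return out


# Shortened, made-up rows for illustration only
rows = [
    ['GELATIN', '76.0 mg', '30.4 mg flavouring agent corresponds to 1 '],
    ['GELATIN', '45.6 mg', 'Gelatin including water'],
    ['INK', '76.0 mg', 'ink is used as diluent'],
]
agents = ['Flavouring Agent', 'Preservative', 'Diluent']
labelled = prepend_agent(rows, agents)
print(labelled)
```

The input lists are left untouched, because every row in the output is a new list.
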
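
Alternatively, collect the matching agents for each sub-list and prepend them:

new_result = []

for l in result:
    temp_results = []
    for ag in agent:
        # Case-insensitive substring test against every string in the sub-list
        if any(ag.lower() in item.lower() for item in l):
            temp_results.append(ag)
    # Prepend the matched agents (none, if nothing matched) to the sub-list
    new_result.append(temp_results + l)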
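
Note that a substring test such as `ag.lower() in item.lower()` also matches inside longer words, so 'Diluent' would be found in 'diluents'. If whole-word matches are wanted, one possible sketch uses `re.search` with `\b` word-boundary anchors. This assumes word boundaries are an acceptable definition of an "exact" match, and the helper name `find_agent` is hypothetical:

```python
import re


def find_agent(row, agents):
    """Return the first agent that occurs as a whole phrase in row, else None."""
    for ag in agents:
        # \b anchors keep 'Diluent' from matching inside 'diluents';
        # re.escape guards against regex metacharacters in agent names.
        pattern = re.compile(r'\b' + re.escape(ag) + r'\b', re.IGNORECASE)
        if any(pattern.search(item) for item in row):
            return ag
    return None


print(find_agent(['INK', 'ink is used as diluent far as necessary'], ['Diluent']))  # Diluent
print(find_agent(['INK', 'mixed with diluents'], ['Diluent']))  # None
```

The returned agent can then be prepended to the sub-list in the same way as in the loops above.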
