Sort the values of the first list using a second list of a different length in Python

I have data with a column containing some words. I extracted some of the words using a list of words, for example ingredients_list=['water','milk', 'yeast', 'banana', 'sugar', 'ananas']. This list gives the correct order of the words, and the words in each row should be sorted in this order. After extracting the words I created a Series of the extracted words, but some rows in this Series contain two words or no words. For example (the actual length of the Series is 25000):

index ingredients
0 sugar
1 yeast
2
3 ananas milk
4 sugar water
5 milk

What I want is to order the rows that contain two words, such as those at index 3 and 4, by the order of ingredients_list. For example:

index ingredients
0 sugar
1 yeast
2
3 milk ananas
4 water sugar
5 milk

First, I replaced the empty rows with "Unknown". Then I tried some code:

import re

ingredients_list=['water','milk', 'yeast', 'banana', 'sugar', 'ananas']
# One alternation pattern that matches any ingredient as a whole word
pat = '|'.join(r"\b{}\b".format(x) for x in ingredients_list)
ing_l = df['ingredients'].str.findall(pat, flags=re.I).str.join(' ')
ing_l = ing_l.replace("", "Unknown")

Then, to sort them according to ingredients_list:

def sort_list(list1, list2):
    zipped_pairs = zip(list2, list1)
    z = [x for _, x in sorted(zipped_pairs)] 
    return z

words = sort_list(ing_l, ingredients_list)

OR

d = {v:i for i, v in enumerate(ing_l)}
r = sorted(ingredients_list, key=lambda v: d[v])

But what I got was a list of length 6, the same length as ingredients_list. Then I tried:

ing_l= pd.DataFrame(ing_l)
ing_l['sort'] = [word for x in ingredients_list for word in ing_l if word == x]

But I got an error: ValueError: Length of values (0) does not match length of index (25000). Do you have any solution to this problem? Thank you a lot.

You can split each string into words, sort the words with a dictionary that maps each ingredient to its position in ingredients_list, and join them again:

ingredients_list=['water','milk', 'yeast', 'banana', 'sugar', 'ananas']

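# Map each ingredient to its position in the reference list
order = {k:v for v,k in enumerate(ingredients_list)}

# Split each row into words, sort them by that position, and join them back;
# rows that are not lists (e.g. NaN) pass through unchanged
df['sorted_ingredients'] = (
    df['ingredients']
    .str.split()
    .apply(lambda x: ' '.join(sorted(x, key=order.get)) if isinstance(x, list) else x)
)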

output:

   index   ingredients sorted_ingredients
0       0        sugar              sugar
1       1        yeast              yeast
2       2          NaN                NaN
3       3  ananas milk        milk ananas
4       4  sugar water        water sugar
5       5         milk               milk
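
One caveat: order.get returns None for any word that is not a key of the dictionary (for example "Sugar" with a capital letter, which the re.I flag in the question would still extract), and sorting a mix of None and integers raises a TypeError. Below is a minimal self-contained sketch of a more tolerant variant. The sample DataFrame, the helper name sort_row, the choice to put unrecognized words last, and the "Unknown" placeholder for empty rows are assumptions made for illustration, not part of the original question or answer.

```python
import re
import pandas as pd

ingredients_list = ['water', 'milk', 'yeast', 'banana', 'sugar', 'ananas']

# Position of each ingredient in the reference list
order = {word: rank for rank, word in enumerate(ingredients_list)}

# Whole-word pattern matching any ingredient
pat = '|'.join(r"\b{}\b".format(re.escape(w)) for w in ingredients_list)

def sort_row(words):
    """Hypothetical helper: sort the extracted words by reference order."""
    # Missing rows (NaN) and rows with no matches become a placeholder
    if not isinstance(words, list) or not words:
        return 'Unknown'
    # Compare case-insensitively; unrecognized words get the last rank
    return ' '.join(sorted(words, key=lambda w: order.get(w.lower(), len(order))))

# Small sample standing in for the real 25000-row column (assumed data)
df = pd.DataFrame(
    {'ingredients': ['sugar', 'yeast', None, 'ananas milk', 'Sugar water', 'salt']}
)

df['sorted_ingredients'] = (
    df['ingredients']
    .str.findall(pat, flags=re.I)
    .apply(sort_row)
)

print(df)
```

Because Python's sort is stable, words that share the same rank keep their original relative order within a row.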

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 