How to sort a DataFrame by the value of a specific list item in Python?

For example, I have a DataFrame created like this:

import pandas as pd

lst3 = [
        ['it store', ['asus', 'acer', 'hp', 'dell'], [50000, 30000, 20000, 10000]],
        ['mz store', ['acer', 'dell'], [60000, 75000]],
        ['bm shop', ['hp', 'acer', 'asus'], [45000, 15000, 30000]]
       ]

# dtype=float is omitted: the columns hold strings and lists, which cannot be cast to float
df3 = pd.DataFrame(lst3, columns=['store_name', 'item', 'price'])
print(df3)

And the result is:

  store_name                    item                         price
0   it store  [asus, acer, hp, dell]  [50000, 30000, 20000, 10000]
1   mz store            [acer, dell]                [60000, 75000]
2    bm shop        [hp, acer, asus]         [45000, 15000, 30000]

The 'item' and 'price' columns both hold lists.

For example, I want to sort the dataframe by the price of the item 'acer', lowest first. The expected result is:

  store_name                    item                         price
2    bm shop        [hp, acer, asus]         [45000, 15000, 30000]
0   it store  [asus, acer, hp, dell]  [50000, 30000, 20000, 10000]
1   mz store            [acer, dell]                [60000, 75000]

[Edit: addition] And if the dataframe is sorted by the price of the item 'hp', lowest first, the expected result is:

  store_name                    item                         price
0   it store  [asus, acer, hp, dell]  [50000, 30000, 20000, 10000]
2    bm shop        [hp, acer, asus]         [45000, 15000, 30000]

Could you help me write a Python script that produces the results above?

One solution is to convert the DataFrame to records using the to_records() method.

Sort the records using Python's built-in sorted() function.

Then convert them back to a DataFrame using from_records().

To sort your current DataFrame by the minimum price in each list, you can do the following:

sorted_records = sorted(df3.to_records(), key=lambda x: min(x[3]))
df3 = pd.DataFrame.from_records(sorted_records)

Keep track of the position of the column you are sorting by once the DataFrame is converted to records: to_records() puts the index first by default, so price ends up at position 3.

See also: pd.DataFrame.to_records() and pd.DataFrame.from_records()
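The snippet above sorts by the lowest price in each list rather than by the price of one particular item. A minimal sketch of the same sort-the-records idea for a single item follows. It swaps in to_dict('records') so that columns can be addressed by name, and the names target and item_price are only illustrative. It also assumes every store sells the chosen item.

```python
import pandas as pd

lst3 = [
    ['it store', ['asus', 'acer', 'hp', 'dell'], [50000, 30000, 20000, 10000]],
    ['mz store', ['acer', 'dell'], [60000, 75000]],
    ['bm shop', ['hp', 'acer', 'asus'], [45000, 15000, 30000]],
]
df3 = pd.DataFrame(lst3, columns=['store_name', 'item', 'price'])

target = 'acer'  # illustrative choice of item to sort by


def item_price(record):
    # record is a dict like {'store_name': ..., 'item': [...], 'price': [...]}
    # Assumption: target is present in every record's item list.
    return record['price'][record['item'].index(target)]


# Sort plain Python records, then rebuild the DataFrame (the old index is not kept).
sorted_records = sorted(df3.to_dict('records'), key=item_price)
sorted_df = pd.DataFrame(sorted_records)
print(sorted_df)
```

Because the records are dicts, there is no need to remember positional offsets such as the index column.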

It seems that DataFrame does not offer an easy way to sort by a user-defined key, so you can convert it to a list and sort that however you wish, like so:

def sort_by_product(df3, product):

    def get_product_price(current_store):
        # current_store is [store_name, item_list, price_list];
        # the price sits at the same position as the product in the item list
        return current_store[2][current_store[1].index(product)]

    sorted_list = sorted(df3.values.tolist(), key=get_product_price)
    # dtype=float is omitted because the columns hold strings and lists
    return pd.DataFrame(sorted_list, columns=['store_name', 'item', 'price'])

Usage example:

sort_by_product(df3, "acer")

Which outputs:

  store_name                    item                         price
0    bm shop        [hp, acer, asus]         [45000, 15000, 30000]
1   it store  [asus, acer, hp, dell]  [50000, 30000, 20000, 10000]
2   mz store            [acer, dell]                [60000, 75000]

Hope that helped

This will work only if every list in the item column contains the string 'acer'.

import pandas as pd

lst3 = [
        ['it store', ['asus', 'acer', 'hp', 'dell'], [50000, 30000, 20000, 10000]],
        ['mz store', ['acer', 'dell'], [60000, 75000]],
        ['bm shop', ['hp', 'acer', 'asus'], [45000, 15000, 30000]]
       ]

df3 = pd.DataFrame(lst3, columns =['store_name', 'item', 'price']) 

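# Position of 'acer' within each store's item list
df3['new'] = df3['item'].apply(lambda x: x.index('acer'))

# Replace that position with the price found at the same position
def f(x):
    return x['price'][x['new']]

df3['new'] = df3.apply(f, axis=1)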

df3.sort_values(by=['new'], inplace=True)

df3.drop(['new'], axis=1, inplace=True)
df3.reset_index(drop=True, inplace=True)

df3

The output is as follows:

    store_name                   item                         price
0      bm shop        [hp, acer, asus]         [45000, 15000, 30000]
1     it store  [asus, acer, hp, dell]  [50000, 30000, 20000, 10000]
2     mz store            [acer, dell]                [60000, 75000]

I hope this does the work!
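For an item that some stores do not sell (the 'hp' case in the question), one option is to filter those stores out before looking up the price. This is only a sketch under that assumption, and sort_by_product_if_sold is a made-up helper name, not part of pandas.

```python
import pandas as pd

lst3 = [
    ['it store', ['asus', 'acer', 'hp', 'dell'], [50000, 30000, 20000, 10000]],
    ['mz store', ['acer', 'dell'], [60000, 75000]],
    ['bm shop', ['hp', 'acer', 'asus'], [45000, 15000, 30000]],
]
df3 = pd.DataFrame(lst3, columns=['store_name', 'item', 'price'])


def sort_by_product_if_sold(df, product):
    # Keep only the rows whose item list contains the product.
    sold = df[df['item'].apply(lambda items: product in items)]
    if sold.empty:
        return sold
    # Price found at the same position as the product in the item list.
    prices = sold.apply(
        lambda row: row['price'][row['item'].index(product)], axis=1
    )
    # Reorder the remaining rows by that price, keeping the original index labels.
    return sold.loc[prices.sort_values().index]


result = sort_by_product_if_sold(df3, 'hp')
print(result)
```

With the sample data, 'mz store' is dropped for 'hp' because it does not list that item, which matches the expected result in the question.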

You can replace 'acer' with whatever computer brand you want.

from more_itertools import roundrobin as rb
lst3 = [
        ['it store', ['asus', 'acer', 'hp', 'dell'], [50000, 30000, 20000, 10000]],
        ['mz store', ['acer', 'dell'], [60000, 75000]],
        ['bm shop', ['hp', 'acer', 'asus'], [45000, 15000, 30000]]
       ]

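d2 = {}
# Interleave items and prices (item, price, item, price, ...) for each shop
for k, v in {e[0]: list(rb(e[1], e[2])) for e in lst3}.items():
    try:
        # The price sits right after the item name
        d2[k] = v[v.index('acer') + 1]
    except ValueError:
        # This shop does not sell the item
        continue

ord_lst3 = []
# Order the shops by the item's price, not by shop name
for shop in sorted(d2, key=d2.get):
    ord_lst3 += list(filter(lambda e: e[0] == shop, lst3))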

print(ord_lst3)

# [['bm shop', ['hp', 'acer', 'asus'], [45000, 15000, 30000]], 
# ['it store', ['asus', 'acer', 'hp', 'dell'], [50000, 30000, 20000, 10000]], 
# ['mz store', ['acer', 'dell'], [60000, 75000]]]

Summary:
item and price are related: the position of acer in the item list matches the position of its price in the price list, so we need a way to pair them.
Get the index of acer in the item column, get the corresponding price from the price column, sort from smallest to largest, and use the resulting row order to reindex the dataframe:

from operator import itemgetter

#use enumerate to get the numbers attached
#we could also zip the index instead
sorter = sorted([(num,price[item.index('acer')]) 
                 for num, (item,price) 
                 in enumerate(zip(df3.item,df3.price))]
                ,key=itemgetter(1))

#extract only the first item from each tuple in the sorter list
new_index = [first for first,last in sorter]

#reindex dataframe to get our sorted form
df3.reindex(new_index)

  store_name                    item                         price
2    bm shop        [hp, acer, asus]         [45000, 15000, 30000]
0   it store  [asus, acer, hp, dell]  [50000, 30000, 20000, 10000]
1   mz store            [acer, dell]                [60000, 75000]
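The pairing of item and price can also be made explicit by zipping the two lists of each row into a dict. The following is a rough sketch of that idea. The names price_by_item, target and key are illustrative only, and it assumes no item appears twice within one store's list.

```python
import pandas as pd

lst3 = [
    ['it store', ['asus', 'acer', 'hp', 'dell'], [50000, 30000, 20000, 10000]],
    ['mz store', ['acer', 'dell'], [60000, 75000]],
    ['bm shop', ['hp', 'acer', 'asus'], [45000, 15000, 30000]],
]
df3 = pd.DataFrame(lst3, columns=['store_name', 'item', 'price'])

# One {item: price} mapping per row, built by pairing the two lists.
price_by_item = [
    dict(zip(items, prices))
    for items, prices in zip(df3['item'], df3['price'])
]

target = 'acer'  # illustrative choice of item to sort by

# Price of the target per row; rows without the target become NaN.
key = pd.Series(
    [mapping.get(target) for mapping in price_by_item],
    index=df3.index,
    dtype='float64',
)

# Drop rows without the target, sort by price, and reuse the index order.
result = df3.loc[key.dropna().sort_values().index]
print(result)
```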

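IIUC, you can use Series.str.index and DataFrame.lookup. Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so this first variant needs an older pandas version: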

indexes = df3['item'].str.index('acer')

df = pd.DataFrame(df3['price'].tolist())

(df3.assign(acer_value = df.lookup(df.index , indexes))
    .sort_values('acer_value')
    .drop(columns='acer_value'))


  store_name                    item                         price  
2    bm shop        [hp, acer, asus]         [45000, 15000, 30000]  
0   it store  [asus, acer, hp, dell]  [50000, 30000, 20000, 10000]  
1   mz store            [acer, dell]                [60000, 75000] 

Or:

order = (df3.assign(indexes = df3['item'].str.index('acer'))
            .apply(lambda x: x['price'][x['indexes']], axis=1)
            .sort_values().index)
df3.loc[order] 
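On pandas versions without DataFrame.lookup, the same row-by-row selection can be approximated with NumPy integer indexing. This is a sketch rather than a drop-in replacement. It assumes every store sells 'acer', and it uses apply with list.index to find the positions instead of Series.str.index.

```python
import numpy as np
import pandas as pd

lst3 = [
    ['it store', ['asus', 'acer', 'hp', 'dell'], [50000, 30000, 20000, 10000]],
    ['mz store', ['acer', 'dell'], [60000, 75000]],
    ['bm shop', ['hp', 'acer', 'asus'], [45000, 15000, 30000]],
]
df3 = pd.DataFrame(lst3, columns=['store_name', 'item', 'price'])

# Position of 'acer' in each item list (assumes it is always present).
positions = df3['item'].apply(lambda items: items.index('acer')).to_numpy()

# Spread the price lists into a rectangular array; shorter rows are padded with NaN.
prices = pd.DataFrame(df3['price'].tolist(), index=df3.index).to_numpy()

# Pick one value per row: row i, column positions[i].
acer_value = prices[np.arange(len(df3)), positions]

result = (df3.assign(acer_value=acer_value)
             .sort_values('acer_value')
             .drop(columns='acer_value'))
print(result)
```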
