
How to apply .sort() with a key=lambda function to every row of a dataframe on a single column?

I have a dataframe with a column containing a list of dates:

import pandas as pd

data = [
    [
        1,
        [
            "2017-12-06",
            "2017-12-05",
            "2017-12-06",
            "2018-01-03",
            "2018-01-04",
            "2017-11-24",
        ],
    ],
    [
        2,
        [
            "2019-03-10",
            "2018-12-03",
            "2018-12-04",
            "2018-11-08",
            "2018-11-30",
            "2019-03-22",
            "2018-11-24",
            "2019-03-06",
            "2017-11-16",
        ],
    ],
]
df = pd.DataFrame(data, columns=["id", "dates"])
df

id  dates
1   [2017-12-06, 2017-12-05, 2017-12-06, 2018-01-03, 2018-01-04, 2017-11-24]
2   [2019-03-10, 2018-12-03, 2018-12-04, 2018-11-08, 2018-11-30, 2019-03-22, 2018-11-24, 2019-03-06, 2017-11-16]
print(df.dtypes)
id        int64
dates    object
dtype: object

I would like to sort the list of dates in each row of the dates column. I have tried a number of methods with no success (including .apply(list.sort) in place). The only method I have found that works is .sort(key=...), as below:

from datetime import datetime

dates = [
    "2019-03-10",
    "2018-12-03",
    "2018-12-04",
    "2018-11-08",
    "2018-11-30",
    "2019-03-22",
    "2018-11-24",
    "2019-03-06",
    "2017-11-16",
]

dates.sort(key=lambda date: datetime.strptime(date, "%Y-%m-%d"))

This works on a single list, but I want to apply it to the entire dates column of the dataframe df. Can anyone advise the best way to do this? Or is there perhaps an even better way to sort this column?

What I see here is that you want the list in every row to be sorted (not the column itself).

The code below applies a certain function (something like my_sort() ) to each row of column "dates":

df['dates'].apply(my_sort)

You just need to implement my_sort to be applied to the list in each row. Something like:

def my_sort(dates):
    dates.sort(key=lambda date: datetime.strptime(date, "%Y-%m-%d"))
    return dates

list.sort() sorts the list and returns None so you need to return the list itself after calling sort .
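The difference between sorting in place and returning a new list can be checked on a plain list. This is a minimal sketch: the helper name parse and the sample dates are chosen for illustration only and are not part of the original answer.

```python
from datetime import datetime

dates = ["2017-12-06", "2017-12-05", "2018-01-03", "2017-11-24"]


def parse(date):
    # Hypothetical helper: turn a "YYYY-MM-DD" string into a datetime.
    return datetime.strptime(date, "%Y-%m-%d")


# list.sort() reorders the list it is called on and returns None.
in_place = list(dates)
result = in_place.sort(key=parse)

# sorted() leaves its input untouched and returns a new sorted list.
new_list = sorted(dates, key=parse)

print(result)    # None
print(in_place)  # the sorted copy
print(dates)     # still in the original order
```

This is why my_sort has to end with return dates: without it, .apply() would collect the None returned by list.sort().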

Edit :

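According to the comment from @jch, it's better practice to copy the list first and then call the sort method on the copy. This way, any unexpected behavior or error produced by sort (if any happens) won't affect the original list in your dataframe. To achieve that, you can change my_sort to something like the version below. Because this version no longer modifies the original lists, the result has to be assigned back with df['dates'] = df['dates'].apply(my_sort):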

from copy import deepcopy

def my_sort(dates):
    dates_copy = deepcopy(dates)
    dates_copy.sort(key=lambda date: datetime.strptime(date, "%Y-%m-%d"))
    return dates_copy

You can learn more about copy and deepcopy of objects in the documentation of Python's copy module.
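As an alternative to copying and then sorting, the built-in sorted() returns a new list directly. The sketch below assumes a shortened version of the question's data, and the function name sorted_dates is hypothetical; it is one possible way to write the non-mutating variant, not the answer's own code.

```python
from datetime import datetime

import pandas as pd

# Shortened sample data in the same shape as the question's dataframe.
df = pd.DataFrame(
    {
        "id": [1, 2],
        "dates": [
            ["2017-12-06", "2017-12-05", "2017-11-24"],
            ["2019-03-10", "2018-12-03", "2017-11-16"],
        ],
    }
)


def sorted_dates(dates):
    # sorted() builds a new list, so the list stored in the dataframe is not modified.
    return sorted(dates, key=lambda d: datetime.strptime(d, "%Y-%m-%d"))


# Keep a reference to the original first-row list to show it stays unsorted.
original_first_row = df.loc[0, "dates"]

# The function returns new lists, so the result must be assigned back.
df["dates"] = df["dates"].apply(sorted_dates)

print(df)
print(original_first_row)
```

Since the strings here are zero-padded YYYY-MM-DD dates, a plain sorted(dates) without a key should give the same order; the strptime key matters for formats that do not sort correctly as text.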

You can use .apply() to apply a given function (in this case 'sort') to every row of a dataframe column.

This should work:

df['dates'].apply(lambda row: row.sort(key=lambda date: datetime.strptime(date, "%Y-%m-%d")))

print(df)

returns:

   id                                              dates
0   1  ['2017-11-24', '2017-12-05', '2017-12-06', '2017-12-06', '2018-01-03', '2018-01-04']
1   2  ['2017-11-16', '2018-11-08', '2018-11-24', '2018-11-30', '2018-12-03', '2018-12-04', '2019-03-06', '2019-03-10', '2019-03-22']

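Note that in this case the code df['dates'] = df['dates'].apply(...) will NOT work: list.sort() sorts each list in place and returns None, so the assignment would replace the column with None values. The call above works only because it modifies the lists already stored in the dataframe. For functions that return a new value instead of modifying their input (sorted(), for example), you have to use the df['dates'] = df['dates'].apply(...) formulation.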
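The effect can be seen by inspecting what .apply() returns when the function sorts in place. This is a small sketch with made-up sample rows; the variable name returned is for illustration only.

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame(
    {
        "id": [1, 2],
        "dates": [
            ["2018-01-03", "2017-11-24", "2017-12-05"],
            ["2019-03-10", "2017-11-16", "2018-12-03"],
        ],
    }
)

# apply() collects the return value for each row; list.sort() returns None every time.
returned = df["dates"].apply(
    lambda row: row.sort(key=lambda date: datetime.strptime(date, "%Y-%m-%d"))
)

print(returned)  # a Series holding only None values
print(df)        # the lists inside df were sorted as a side effect
```

Assigning returned back to df['dates'] would therefore overwrite the sorted lists with None.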
