

How to apply .sort() with a key=lambda function to every row of a dataframe on a single column?

I have a dataframe with a column in which each row holds a list of dates:

import pandas as pd

data = [
    [
        1,
        [
            "2017-12-06",
            "2017-12-05",
            "2017-12-06",
            "2018-01-03",
            "2018-01-04",
            "2017-11-24",
        ],
    ],
    [
        2,
        [
            "2019-03-10",
            "2018-12-03",
            "2018-12-04",
            "2018-11-08",
            "2018-11-30",
            "2019-03-22",
            "2018-11-24",
            "2019-03-06",
            "2017-11-16",
        ],
    ],
]
df = pd.DataFrame(data, columns=["id", "dates"])
df

id  dates
1   [2017-12-06, 2017-12-05, 2017-12-06, 2018-01-03, 2018-01-04, 2017-11-24]
2   [2019-03-10, 2018-12-03, 2018-12-04, 2018-11-08, 2018-11-30, 2019-03-22, 2018-11-24, 2019-03-06, 2017-11-16]
print(df.dtypes)
id        int64
dates    object
dtype: object

I would like to sort the lists in the date column ( dates ). I have tried a number of methods with no success (including .apply(list.sort) in place). The only method I have found that works is using .sort(key = ....) like below:

from datetime import datetime

dates = [
    "2019-03-10",
    "2018-12-03",
    "2018-12-04",
    "2018-11-08",
    "2018-11-30",
    "2019-03-22",
    "2018-11-24",
    "2019-03-06",
    "2017-11-16",
]

dates.sort(key=lambda date: datetime.strptime(date, "%Y-%m-%d"))

but I can only get it to work on a single list, and I want to apply it to the entire column in the dataframe df . Can anyone advise the best way to do this? Or perhaps there is an even better way to sort this column?

As I understand it, you want the list in every row to be sorted (not the column itself).

The code below applies a function (here called my_sort() ) to each row of the "dates" column:

df['dates'].apply(my_sort)

You just need to implement my_sort so that it sorts the list in each row. Something like:

def my_sort(dates):
    dates.sort(key=lambda date: datetime.strptime(date, "%Y-%m-%d"))
    return dates

list.sort() sorts the list in place and returns None , so you need to return the list itself after calling sort .
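To see why the return statement matters, here is a minimal sketch on a made-up list. It contrasts list.sort() with the built-in sorted() , which returns a new list and leaves its input alone. The helper name parse is only for illustration.

```python
from datetime import datetime


def parse(date):
    # Illustrative helper: convert a "YYYY-MM-DD" string to a datetime.
    return datetime.strptime(date, "%Y-%m-%d")


dates = ["2018-01-03", "2017-11-24", "2017-12-05"]

# list.sort() reorders the existing list and returns None.
result = dates.sort(key=parse)
print(result)
print(dates)

# sorted() builds and returns a new list instead.
fresh = sorted(["2018-01-03", "2017-11-24"], key=parse)
print(fresh)
```

This is why a function passed to .apply() has to return the list explicitly if it sorts with list.sort() .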

Edit:

According to the comment from @jch , it is better practice to copy the list first and then call the sort method. This way, any unexpected behavior or error produced by sort (if it happens) won't affect the original list in your dataframe. To achieve that, you can change my_sort to something like:

from copy import deepcopy

def my_sort(dates):
    dates_copy = deepcopy(dates)
    dates_copy.sort(key=lambda date: datetime.strptime(date, "%Y-%m-%d"))
    return dates_copy

You can learn more about copy and deepcopy in the documentation for Python's copy module.
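As a possible alternative to copying by hand, the built-in sorted() already returns a new list, so the original is left untouched. The sketch below assumes a small made-up dataframe; the names sorted_dates and dates_sorted are hypothetical and chosen only for this example.

```python
from datetime import datetime

import pandas as pd


def sorted_dates(dates):
    # sorted() builds a new list, so the list passed in is not modified.
    return sorted(dates, key=lambda d: datetime.strptime(d, "%Y-%m-%d"))


df = pd.DataFrame(
    {
        "id": [1, 2],
        "dates": [
            ["2017-12-06", "2017-11-24", "2018-01-03"],
            ["2019-03-10", "2017-11-16", "2018-12-03"],
        ],
    }
)

# Store the sorted lists in a new column; the "dates" column keeps its order.
df["dates_sorted"] = df["dates"].apply(sorted_dates)
print(df)
```

Because the list elements are immutable strings, a new list from sorted() is enough here and a deep copy is not strictly needed.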

You can use .apply() to apply a given function (in this case sort ) to every row of a dataframe column.

This should work:

df['dates'].apply(lambda row: row.sort(key=lambda date: datetime.strptime(date, "%Y-%m-%d")))

print(df)

returns:

   id                                              dates
0   1  ['2017-11-24', '2017-12-05', '2017-12-06', '2017-12-06', '2018-01-03', '2018-01-04']
1   2  ['2017-11-16', '2018-11-08', '2018-11-24', '2018-11-30', '2018-12-03', '2018-12-04', '2019-03-06', '2019-03-10', '2019-03-22']

Note that in this case the code df['dates'] = df['dates'].apply(...) will NOT work: list.sort() sorts the list in place and returns None , so the assignment would replace every list in the column with None . The call above works only because it modifies the lists already stored in the dataframe rather than creating new ones. For functions that return a new value instead of modifying their argument, you have to assign the result back, as in df['dates'] = df['dates'].apply(...) .
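The difference between sorting in place and assigning a result back can be checked on a small frame. The following is a sketch with made-up data; the helper parse and the variable names are assumptions for illustration only.

```python
from datetime import datetime

import pandas as pd


def parse(date):
    # Illustrative helper: convert a "YYYY-MM-DD" string to a datetime.
    return datetime.strptime(date, "%Y-%m-%d")


df = pd.DataFrame(
    {"id": [1], "dates": [["2018-01-03", "2017-11-24", "2017-12-05"]]}
)

# list.sort() returns None, so the Series built by apply() holds only None.
returned = df["dates"].apply(lambda row: row.sort(key=parse))
all_none = bool(returned.isna().all())

# The lists stored in the dataframe were sorted as a side effect.
after_in_place = list(df.loc[0, "dates"])

# sorted() returns a new list, so assigning the result back is safe.
df["dates"] = df["dates"].apply(lambda row: sorted(row, key=parse))
after_assign = df.loc[0, "dates"]

print(all_none, after_in_place, after_assign)
```

Assigning returned back to df["dates"] would overwrite the column with None values, which is the failure described above.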
