
How to create a ranking column based on date value range in another column?

import pandas as pd

data = [
    ["Item_1", "2020-06-01"],
    ["Item_1", "2020-06-02"],
    ["Item_1", "2020-05-27"],
    ["Item_2", "2018-04-15"],
    ["Item_2", "2018-04-18"],
    ["Item_2", "2018-04-22"],
    ["Item_2", "2018-04-28"],
]

df = pd.DataFrame(data, columns=["Item_ID", "Dates"])
print(df)

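I have a dataset with `Item_ID` and `Dates` columns. I want to assign an ordered "rank" in a new column: within each item, if the next date is more than 3 days after the previous date, the rank/order value increases by one; otherwise it stays the same.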

So the desired output looks like this:

    Item_ID    Dates    Date Order
    Item_1  2020-05-27      1
    Item_1  2020-06-01      2
    Item_1  2020-06-02      2 
    Item_2  2018-04-15      1
    Item_2  2018-04-18      1
    Item_2  2018-04-22      2 
    Item_2  2018-04-28      3
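The rule itself can be stated without pandas. A minimal sketch in plain Python, assuming the input is the list of dates for a single item; `date_order` and `gap_days` are hypothetical names chosen for illustration:

```python
from datetime import date


def date_order(dates, gap_days=3):
    """Pair each date (sorted ascending) with a 1-based order.

    Hypothetical helper: the order starts at 1 and increases by one
    whenever a date is more than `gap_days` days after the previous one.
    """
    result = []
    order = 0
    prev = None
    for cur in sorted(dates):
        # The first date opens block 1; a gap larger than gap_days opens a new block
        if prev is None or (cur - prev).days > gap_days:
            order += 1
        result.append((cur, order))
        prev = cur
    return result


item_2 = [date(2018, 4, 15), date(2018, 4, 18), date(2018, 4, 22), date(2018, 4, 28)]
for d, order in date_order(item_2):
    print(d, order)
```

A gap of exactly 3 days (2018-04-15 to 2018-04-18) does not start a new block, which is why the strict "greater than" comparison matters.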

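We can use `groupby` + `apply` to compute the difference between consecutive dates within each group, then use `cumsum` to "count" how many of those differences are greater than (`gt`) 3 days: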

# Convert to datetime (if not already)
df['Dates'] = pd.to_datetime(df['Dates'])
# Sort in correct order
df = df.sort_values(['Item_ID', 'Dates'], ignore_index=True)
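# Calculate ranking per group; group_keys=False keeps the result aligned
# with df's index (pandas >= 2.0 otherwise prefixes the group keys)
df['Date Order'] = (
    df.groupby('Item_ID', group_keys=False)['Dates'].apply(
        lambda s: s.diff().gt(pd.Timedelta(days=3)).cumsum() + 1
    )
)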
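An alternative that avoids depending on how `apply` builds its result index is `transform`, which always returns a result indexed like the original rows. A self-contained sketch using the question's data; this is a swapped-in variant, not part of the original answer:

```python
import pandas as pd

data = [
    ["Item_1", "2020-06-01"],
    ["Item_1", "2020-06-02"],
    ["Item_1", "2020-05-27"],
    ["Item_2", "2018-04-15"],
    ["Item_2", "2018-04-18"],
    ["Item_2", "2018-04-22"],
    ["Item_2", "2018-04-28"],
]
df = pd.DataFrame(data, columns=["Item_ID", "Dates"])
df["Dates"] = pd.to_datetime(df["Dates"])
df = df.sort_values(["Item_ID", "Dates"], ignore_index=True)

# transform returns a Series indexed like df, regardless of group_keys
df["Date Order"] = df.groupby("Item_ID")["Dates"].transform(
    lambda s: s.diff().gt(pd.Timedelta(days=3)).cumsum() + 1
)
print(df)
```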

You can also group twice, using `groupby` + `diff` and then `groupby` + `cumsum`:

# Convert to datetime (if not already)
df['Dates'] = pd.to_datetime(df['Dates'])
# Sort in correct order
df = df.sort_values(['Item_ID', 'Dates'], ignore_index=True)

# Reuse same Grouper
g = df.groupby('Item_ID') 
# Calculate Difference per group and compare (whole Series)
df['Date Order'] = g['Dates'].diff().gt(pd.Timedelta(days=3))
# Calculate cumsum per group
df['Date Order'] = g['Date Order'].cumsum() + 1
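The boolean flags do not have to be written into `df` before the second grouping: `Series.groupby` also accepts another Series aligned on the same index. A sketch under that approach, where `new_block` is an illustrative variable name:

```python
import pandas as pd

df = pd.DataFrame({
    "Item_ID": ["Item_1"] * 3 + ["Item_2"] * 4,
    "Dates": pd.to_datetime([
        "2020-05-27", "2020-06-01", "2020-06-02",
        "2018-04-15", "2018-04-18", "2018-04-22", "2018-04-28",
    ]),
})
df = df.sort_values(["Item_ID", "Dates"], ignore_index=True)

# True where the gap to the previous date of the same item exceeds 3 days
new_block = df.groupby("Item_ID")["Dates"].diff().gt(pd.Timedelta(days=3))
# Group the boolean Series by the Item_ID column directly; no helper column in df
df["Date Order"] = new_block.groupby(df["Item_ID"]).cumsum() + 1
print(df)
```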

Both produce this `df`:

  Item_ID      Dates  Date Order
0  Item_1 2020-05-27           1
1  Item_1 2020-06-01           2
2  Item_1 2020-06-02           2
3  Item_2 2018-04-15           1
4  Item_2 2018-04-18           1
5  Item_2 2018-04-22           2
6  Item_2 2018-04-28           3

Here is a breakdown of the per-group steps, as a DataFrame, for the `Item_1` group:

import pandas as pd

# The Item_1 group, indexed by Item_ID for display
s = pd.Series(pd.to_datetime(['2020-05-27', '2020-06-01', '2020-06-02']),
              name='Dates',
              index=pd.Index(['Item_1'] * 3, name='Item_ID'))
steps_per_group = pd.DataFrame({
    'diff': s.diff(),
    'gt': s.diff().gt(pd.Timedelta(days=3)),
    'cumsum': s.diff().gt(pd.Timedelta(days=3)).cumsum(),
    'cumsum 1 start': s.diff().gt(pd.Timedelta(days=3)).cumsum() + 1
})
print(steps_per_group)

          diff     gt  cumsum  cumsum 1 start
Item_ID                                      
Item_1     NaT  False       0               1
Item_1  5 days   True       1               2
Item_1  1 days  False       1               2
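The same breakdown for the `Item_2` group shows two consecutive increments, since both the 4-day and the 6-day gaps exceed 3 days, while the 3-day gap does not. A sketch that rebuilds that group from the question's dates (using a default integer index rather than the `Item_ID` index above):

```python
import pandas as pd

# The Item_2 group from the question
s = pd.Series(
    pd.to_datetime(["2018-04-15", "2018-04-18", "2018-04-22", "2018-04-28"]),
    name="Dates",
)
gap = s.diff()                               # NaT, 3 days, 4 days, 6 days
is_new_block = gap.gt(pd.Timedelta(days=3))  # NaT compares as False
steps_item_2 = pd.DataFrame({
    "diff": gap,
    "gt": is_new_block,
    "cumsum": is_new_block.cumsum(),
    "cumsum 1 start": is_new_block.cumsum() + 1,
})
print(steps_item_2)
```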

Starting from your DataFrame:

>>> import pandas as pd

>>> data = [
...     ["Item_1", "2020-05-27"],
...     ["Item_1", "2020-06-01"],
...     ["Item_1", "2020-06-02"],
...     ["Item_2", "2018-04-15"],
...     ["Item_2", "2018-04-18"],
...     ["Item_2", "2018-04-22"],
...     ["Item_2", "2018-04-28"],
... ]
>>> df = pd.DataFrame(data, columns=["Item_ID", "Dates"])
>>> df['Dates'] = pd.to_datetime(df['Dates'], format="%Y-%m-%d")
>>> df
    Item_ID     Dates
0   Item_1  2020-05-27
1   Item_1  2020-06-01
2   Item_1  2020-06-02
3   Item_2  2018-04-15
4   Item_2  2018-04-18
5   Item_2  2018-04-22
6   Item_2  2018-04-28

We can compute the per-`Item_ID` date `diff` in whole days and flag the gaps larger than `window_size` like this:

>>> window_size = 3
>>> df['diff'] = df.groupby('Item_ID')["Dates"].diff().dt.days.gt(window_size)
>>> df
    Item_ID     Dates   diff
0   Item_1  2020-05-27  False
1   Item_1  2020-06-01  True
2   Item_1  2020-06-02  False
3   Item_2  2018-04-15  False
4   Item_2  2018-04-18  False
5   Item_2  2018-04-22  True
6   Item_2  2018-04-28  True
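`.dt.days` keeps only the whole-day component of each `Timedelta`, so `dt.days.gt(3)` matches a comparison against `pd.Timedelta(days=3)` only when the dates carry no time of day, as in the question. A small sketch with made-up timestamps (not from the question) that include a time shows where the two checks diverge:

```python
import pandas as pd

# Hypothetical timestamps with a time-of-day component
dates = pd.Series(pd.to_datetime(["2020-01-01 00:00", "2020-01-04 12:00"]))
gap = dates.diff()  # NaT, then 3 days 12:00:00

by_whole_days = gap.dt.days.gt(3)            # .dt.days gives 3, so not > 3
by_timedelta = gap.gt(pd.Timedelta(days=3))  # compares the full duration

print(by_whole_days.tolist())
print(by_timedelta.tolist())
```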

Then, grouping by `Item_ID` again and applying `cumsum`, we get the expected result:

>>> df['Date Order'] = df.groupby('Item_ID')["diff"].cumsum()+1
>>> df
    Item_ID     Dates   diff    Date Order
0   Item_1  2020-05-27  False   1
1   Item_1  2020-06-01  True    2
2   Item_1  2020-06-02  False   2
3   Item_2  2018-04-15  False   1
4   Item_2  2018-04-18  False   1
5   Item_2  2018-04-22  True    2
6   Item_2  2018-04-28  True    3
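Both answers rely on rows being sorted by `Item_ID` and `Dates`; the question's original data is not. If the original row order should be kept, one option is to compute the ranks on a sorted copy and let index alignment put them back. A sketch, where `ordered` and `new_block` are illustrative names:

```python
import pandas as pd

# The question's original, unsorted rows
data = [
    ["Item_1", "2020-06-01"],
    ["Item_1", "2020-06-02"],
    ["Item_1", "2020-05-27"],
    ["Item_2", "2018-04-15"],
    ["Item_2", "2018-04-18"],
    ["Item_2", "2018-04-22"],
    ["Item_2", "2018-04-28"],
]
df = pd.DataFrame(data, columns=["Item_ID", "Dates"])
df["Dates"] = pd.to_datetime(df["Dates"], format="%Y-%m-%d")
window_size = 3

# Sort a copy but keep the original index labels (no ignore_index)
ordered = df.sort_values(["Item_ID", "Dates"])
new_block = ordered.groupby("Item_ID")["Dates"].diff().dt.days.gt(window_size)
# Assignment aligns on index labels, so each original row gets its own rank
df["Date Order"] = new_block.groupby(ordered["Item_ID"]).cumsum() + 1
print(df)
```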


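Statement: technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.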

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM