
Count the number of days for each unique column value and add ratings


Input DataFrame (sample)

Date    Location    Value
01-01-2020  Loc1    27.2
02-01-2020  Loc1    41.9
03-01-2020  Loc1    29.8
04-01-2020  Loc1    7.8
05-01-2020  Loc1    44
06-01-2020  Loc1    0.4
07-01-2020  Loc1    0.8
08-01-2020  Loc1    4.1
09-01-2020  Loc1    4
10-01-2020  Loc1    6.2
11-01-2020  Loc1    54.5
12-01-2020  Loc1    24.8
13-01-2020  Loc1    0
.
.
.
.
01-01-2020  Loc2    6
02-01-2020  Loc2    40.2
03-01-2020  Loc2    2.6
04-01-2020  Loc2    10.2
05-01-2020  Loc2    12
06-01-2020  Loc2    3.2
07-01-2020  Loc2    0
08-01-2020  Loc2    2.4
09-01-2020  Loc2    0
10-01-2020  Loc2    1.2
11-01-2020  Loc2    19.2
12-01-2020  Loc2    21.8
13-01-2020  Loc2    13.6
....

I want to add another column, "Rating", populated using the following logic:

Rating  Condition
1       Less than 150 days of data
2       150 to 200 days
3       200 to 250 days
4       250 to 300 days
5       All 365 days
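The table leaves the boundaries open (does exactly 150 days get rating 1 or 2?) and says nothing about 300 to 364 days. Here is a minimal sketch of the mapping. It assumes half-open intervals, so 150 <= n < 200 is rating 2 and so on, and it treats 300 or more days as rating 5. The helper name `rating_for_days` is hypothetical:

```python
from bisect import bisect_right

# Assumed lower edges of ratings 2..5; each interval is half-open [edge, next_edge)
EDGES = [150, 200, 250, 300]

def rating_for_days(n_days: int) -> int:
    """Map a number of days of data to a 1-5 rating (hypothetical helper)."""
    # bisect_right counts how many edges are <= n_days, e.g. 150 -> 1 edge -> rating 2
    return bisect_right(EDGES, n_days) + 1

print(rating_for_days(180))  # 2
print(rating_for_days(365))  # 5
```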

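Suppose Loc1 has 180 days of data, so every Loc1 row gets a rating of 2. Similarly, Loc2 has data for all 365 days, so its rating is 5. The output data would therefore look like this: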

Date    Location    Value   Rating
01-01-2020  Loc1    27.2    2
02-01-2020  Loc1    41.9    2
03-01-2020  Loc1    29.8    2
04-01-2020  Loc1    7.8     2
05-01-2020  Loc1    44      2
06-01-2020  Loc1    0.4     2
07-01-2020  Loc1    0.8     2
08-01-2020  Loc1    4.1     2
09-01-2020  Loc1    4       2
10-01-2020  Loc1    6.2     2
11-01-2020  Loc1    54.5    2
12-01-2020  Loc1    24.8    2
13-01-2020  Loc1    0       2
.
.
.
.
01-01-2020  Loc2    6       5
02-01-2020  Loc2    40.2    5
03-01-2020  Loc2    2.6     5
04-01-2020  Loc2    10.2    5
05-01-2020  Loc2    12      5
06-01-2020  Loc2    3.2     5
07-01-2020  Loc2    0       5
08-01-2020  Loc2    2.4     5
09-01-2020  Loc2    0       5
10-01-2020  Loc2    1.2     5
11-01-2020  Loc2    19.2    5
12-01-2020  Loc2    21.8    5
13-01-2020  Loc2    13.6    5
.
.
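The Loc1/Loc2 example can be reproduced on synthetic data. In this sketch the values are made-up placeholders: Loc1 gets 180 consecutive daily rows and Loc2 a full 365. It counts rows per location with `groupby(...).transform('count')` and then bins the counts with `pd.cut`. The bin edges are an assumption: half-open intervals, with 300 or more days mapped to 5.

```python
import numpy as np
import pandas as pd

# Synthetic data: Loc1 has 180 daily rows, Loc2 has 365 (values are placeholders)
df = pd.concat([
    pd.DataFrame({'Date': pd.date_range('2020-01-01', periods=180, freq='D'), 'Location': 'Loc1'}),
    pd.DataFrame({'Date': pd.date_range('2020-01-01', periods=365, freq='D'), 'Location': 'Loc2'}),
], ignore_index=True)
df['Value'] = 0.0

# Days of data per location, broadcast back to every row of that location
days = df.groupby('Location')['Date'].transform('count')

# Assumed half-open bins: [0,150)->1, [150,200)->2, [200,250)->3, [250,300)->4, [300,inf)->5
df['Rating'] = pd.cut(days, bins=[0, 150, 200, 250, 300, np.inf],
                      right=False, labels=[1, 2, 3, 4, 5]).astype(int)

print(df.groupby('Location')['Rating'].first())
```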

Note: the Date column is a datetime object.

I want to do this for the whole DataFrame. How can I achieve this?

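You can compute the difference between the maximum and minimum datetime of each group, convert the resulting timedeltas to days, and then bin them with cut: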

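import pandas as pd
# Parse the dd-mm-yyyy strings (a no-op if Date is already datetime64)
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
# Span between each location's first and last date, in whole days, broadcast to every row
df['Days'] = df.groupby('Location')['Date'].transform(lambda x: x.max() - x.min()).dt.days
# Half-open bins [0,150), [150,200), [200,250), [250,300), [300,367) -> labels 0..4, +1 -> ratings 1..5
df['Rating'] = pd.cut(df['Days'], bins=[0, 150, 200, 250, 300, 367], right=False, labels=False).add(1)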
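The span (max - min) and the row count are not the same measure. Even with no missing days, the span is one less than the count: 1 January to 3 January spans 2 days but contains 3 rows. With gaps in the data, the two can diverge much further. Here is a small sketch with made-up data that compares them for a location with a gap:

```python
import pandas as pd

# Made-up data: Loc1 has 3 consecutive days; Loc2 has 2 rows 10 days apart
df = pd.DataFrame({
    'Date': pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03',
                            '2020-01-01', '2020-01-11']),
    'Location': ['Loc1', 'Loc1', 'Loc1', 'Loc2', 'Loc2'],
})

g = df.groupby('Location')['Date']
# Span in days between the first and last observation of each location
df['Span'] = g.transform(lambda x: x.max() - x.min()).dt.days
# Number of rows (days of data) for each location
df['Count'] = g.transform('count')

print(df.groupby('Location')[['Span', 'Count']].first())
```

Loc1 comes out with a span of 2 and a count of 3, while Loc2 has a span of 10 but only 2 rows. Which measure is appropriate depends on whether "days of data" means calendar coverage or actual observations.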

You can do this by using groupby together with transform, and then applying the conditions with np.where:

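import numpy as np
df['Rating'] = df.groupby('Location')['Date'].transform('count')  # rows (days of data) per location
# Nested np.where: the first matching threshold wins; 300 or more days falls through to 5
df['Rating'] = np.where(df['Rating'] < 150, 1, np.where(df['Rating'] < 200, 2, np.where(df['Rating'] < 250, 3, np.where(df['Rating'] < 300, 4, 5))))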
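`np.select` is a flatter alternative to nesting `np.where`. It takes a list of conditions and returns the choice for the first condition that is true. This sketch uses the thresholds from the `np.where` answer (150, 200, 250, 300) and assumes every count of 300 or more maps to 5. The counts are made up:

```python
import numpy as np
import pandas as pd

# Made-up per-row day counts, as groupby('Location')['Date'].transform('count') would produce
days = pd.Series([120, 150, 180, 230, 299, 300, 365])

conditions = [days < 150, days < 200, days < 250, days < 300]
choices = [1, 2, 3, 4]

# The first condition that holds wins; rows matching none get the default
rating = np.select(conditions, choices, default=5)

print(rating.tolist())  # [1, 2, 2, 3, 4, 5, 5]
```

Because `default` is an integer, the result stays an integer array. By contrast, a string fallback such as `''` in `np.where` would turn every rating into a string.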


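Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.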

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM