Add number of months in a dataframe in R
I have a dataframe like this:
customer= c('1530','1530','1530','1531','1531','1532')
month = c('2021-10-01','2021-11-01','2021-12-01','2021-11-01','2021-12-01','2021-12-01')
month_number = c(1,2,3,1,2,1)
df <- data.frame('customer_id'=customer, entry_month=month)
df
|   | customer_id | entry_month |
| - | ----------- | ----------- |
| 1 | 1530 | 2021-10-01 |
| 2 | 1530 | 2021-11-01 |
| 3 | 1530 | 2021-12-01 |
| 4 | 1531 | 2021-11-01 |
| 5 | 1531 | 2021-12-01 |
| 6 | 1532 | 2021-12-01 |
I need to create a column that indicates the number of months since the customer joined. This is my desired output:
new_df <- data.frame(customer_id = customer, entry_month = month, month_number = month_number)
new_df
|   | customer_id | entry_month | month_number |
| - | ----------- | ----------- | ------------ |
| 1 | 1530 | 2021-10-01 | 1 |
| 2 | 1530 | 2021-11-01 | 2 |
| 3 | 1530 | 2021-12-01 | 3 |
| 4 | 1531 | 2021-11-01 | 1 |
| 5 | 1531 | 2021-12-01 | 2 |
| 6 | 1532 | 2021-12-01 | 1 |
You can convert entry_month to Date format and then simply use first:
library(dplyr)
df %>%
  group_by(customer_id) %>%
  mutate(
    entry_month = as.Date(entry_month),
    # days since the customer's first entry, converted to ~30-day months
    nmonth = round(as.numeric(entry_month - first(entry_month)) / 30) + 1
  )
# A tibble: 6 x 3
# Groups:   customer_id [3]
  customer_id entry_month nmonth
  <chr>       <date>       <dbl>
1 1530        2021-10-01       1
2 1530        2021-11-01       2
3 1530        2021-12-01       3
4 1531        2021-11-01       1
5 1531        2021-12-01       2
6 1532        2021-12-01       1
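Note that this approach works if entry_month is always the first day of the month and the rows of each customer are in date order, since first() takes the first row. Otherwise, you will have to specify what exactly a month means. For example, if the first entry is on 2021-10-20 and the second is on 2021-11-10, what is the desired nmonth for 2021-11-10?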
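One possible answer to that question, sketched here under the assumption that a "month" means a calendar month, is to count calendar months instead of dividing days by 30. The helper month_index below is hypothetical (it is not part of dplyr); it turns a date into year * 12 + month so that differences are whole calendar months. Under this convention 2021-10-20 followed by 2021-11-10 would give nmonth = 2.

```r
library(dplyr)

df <- data.frame(
  customer_id = c("1530", "1530", "1530", "1531", "1531", "1532"),
  entry_month = c("2021-10-01", "2021-11-01", "2021-12-01",
                  "2021-11-01", "2021-12-01", "2021-12-01")
)

# Hypothetical helper: calendar month as a single integer (year * 12 + month)
month_index <- function(d) {
  as.integer(format(d, "%Y")) * 12L + as.integer(format(d, "%m"))
}

res <- df %>%
  mutate(entry_month = as.Date(entry_month)) %>%
  group_by(customer_id) %>%
  # calendar months elapsed since the customer's earliest entry, starting at 1
  mutate(nmonth = month_index(entry_month) - min(month_index(entry_month)) + 1L) %>%
  ungroup()

res
```

Because it uses min() rather than first(), this sketch does not depend on the rows being sorted, and it does not drift over long spans the way a fixed 30-day month can.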
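This takes the year-month part of each date and counts the distinct values within each customer.
I expanded the example to include a repeated month (the extended data is given after the output).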
library(dplyr)
df %>%
  group_by(customer_id) %>%
  arrange(entry_month, .by_group = TRUE) %>%  # sort dates within each customer
  mutate(month_number = cumsum(
    !duplicated(strftime(entry_month, "%Y-%m")))) %>%  # +1 at each new year-month
  ungroup()
# A tibble: 7 × 3
  customer_id entry_month month_number
  <chr>       <chr>              <int>
1 1530        2021-10-01             1
2 1530        2021-10-12             1
3 1530        2021-11-01             2
4 1530        2021-12-01             3
5 1531        2021-11-01             1
6 1531        2021-12-01             2
7 1532        2021-12-01             1
df <- structure(list(customer_id = c("1530", "1530", "1530", "1530",
"1531", "1531", "1532"), entry_month = c("2021-10-01", "2021-10-12",
"2021-11-01", "2021-12-01", "2021-11-01", "2021-12-01", "2021-12-01"
)), row.names = c(NA, -7L), class = "data.frame")
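As a possible variation on the same idea, dplyr's dense_rank() can rank the year-month strings directly, which avoids the explicit sort. This is only a sketch on the extended data above; like the cumsum version, it counts the distinct months that have entries, so a month with no rows for a customer does not advance the counter.

```r
library(dplyr)

df <- data.frame(
  customer_id = c("1530", "1530", "1530", "1530", "1531", "1531", "1532"),
  entry_month = c("2021-10-01", "2021-10-12", "2021-11-01", "2021-12-01",
                  "2021-11-01", "2021-12-01", "2021-12-01")
)

res <- df %>%
  group_by(customer_id) %>%
  # "YYYY-MM" strings sort chronologically, so their dense rank is the month number
  mutate(month_number = dense_rank(substr(entry_month, 1, 7))) %>%
  ungroup()

res
```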
Alternatively, you can use the data.table package:
library(data.table)
dt <- setDT(df)
dt[, entry_month := as.IDate(entry_month)]  # Transform the column to "IDate"
# Row counter within each customer; assumes rows are sorted by date
# and each customer has exactly one row per consecutive month
dt[, month_number := seq_len(.N), by = customer_id]
dt
Output:
   customer_id entry_month month_number
1:        1530  2021-10-01            1
2:        1530  2021-11-01            2
3:        1530  2021-12-01            3
4:        1531  2021-11-01            1
5:        1531  2021-12-01            2
6:        1532  2021-12-01            1
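The sequence above is a plain row counter, so it only matches the month number when each customer has one sorted row per consecutive month. If that cannot be guaranteed, a rough data.table sketch based on calendar months might look like the following. It uses data.table's year() and month() helpers; the temporary column idx is an illustrative name, not something from the answer.

```r
library(data.table)

dt <- data.table(
  customer_id = c("1530", "1530", "1530", "1531", "1531", "1532"),
  entry_month = as.IDate(c("2021-10-01", "2021-11-01", "2021-12-01",
                           "2021-11-01", "2021-12-01", "2021-12-01"))
)

# Calendar month as a single integer (year * 12 + month)
dt[, idx := year(entry_month) * 12L + month(entry_month)]
# Months elapsed since each customer's earliest entry, starting at 1
dt[, month_number := idx - min(idx) + 1L, by = customer_id]
dt[, idx := NULL]  # drop the temporary column
dt
```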