How to count the number of days, weeks, months in a date column of a dataframe in R?
Add number of months in a dataframe in R
I have a dataframe like this:
customer= c('1530','1530','1530','1531','1531','1532')
month = c('2021-10-01','2021-11-01','2021-12-01','2021-11-01','2021-12-01','2021-12-01')
month_number = c(1,2,3,1,2,1)
df <- data.frame('customer_id'=customer, entry_month=month)
df
|   | customer_id | entry_month |
| - | ----------- | ----------- |
| 1 | 1530 | 2021-10-01 |
| 2 | 1530 | 2021-11-01 |
| 3 | 1530 | 2021-12-01 |
| 4 | 1531 | 2021-11-01 |
| 5 | 1531 | 2021-12-01 |
| 6 | 1532 | 2021-12-01 |
I need to create a column that indicates the number of months since the customer joined. This is my desired output:
new_df <- data.frame('customer_id'=customer, 'entry_month'=month, 'month_number'=month_number)
new_df
|   | customer_id | entry_month | month_number |
| - | ----------- | ----------- | ------------ |
| 1 | 1530 | 2021-10-01 | 1 |
| 2 | 1530 | 2021-11-01 | 2 |
| 3 | 1530 | 2021-12-01 | 3 |
| 4 | 1531 | 2021-11-01 | 1 |
| 5 | 1531 | 2021-12-01 | 2 |
| 6 | 1532 | 2021-12-01 | 1 |
You can convert entry_month to Date format and then simply use first():
library(dplyr)
df %>%
  group_by(customer_id) %>%
  mutate(
    entry_month = as.Date(entry_month),
    nmonth = round(as.numeric(entry_month - first(entry_month)) / 30) + 1
  )
# A tibble: 6 x 3
# Groups: customer_id [3]
customer_id entry_month nmonth
<chr> <date> <dbl>
1 1530 2021-10-01 1
2 1530 2021-11-01 2
3 1530 2021-12-01 3
4 1531 2021-11-01 1
5 1531 2021-12-01 2
6 1532 2021-12-01 1
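Note that this works if entry_month is always the first day of a month. Dividing by 30 is only an approximation of a month, so the rounded result can drift over spans of several years. If the dates are not always the first of the month, you will have to specify exactly what a month means. For example, if the first entry is on 2021-10-20 and the second on 2021-11-10, what is the expected nmonth for 2021-11-10?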
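One possible way to pin down "a month" is to count calendar months between a customer's first entry and each later entry, ignoring the day of the month. The sketch below does this in base R. The helper name `months_since_first` is hypothetical, and the definition itself is an assumption: under it, 2021-11-10 is month 2 relative to a first entry on 2021-10-20.

```r
# Hypothetical helper: calendar months elapsed since the earliest date, 1-based.
# Assumes `x` can be converted with as.Date() and contains no NA values.
months_since_first <- function(x) {
  d <- as.POSIXlt(as.Date(x))
  # $year is years since 1900 and $mon is 0-11; combine into a running month index
  idx <- d$year * 12L + d$mon
  as.integer(idx - min(idx) + 1L)
}

df <- data.frame(
  customer_id = c("1530", "1530", "1530", "1531", "1531", "1532"),
  entry_month = c("2021-10-01", "2021-11-01", "2021-12-01",
                  "2021-11-01", "2021-12-01", "2021-12-01")
)

# ave() applies the helper to the row indices of each customer_id group
df$month_number <- ave(
  seq_len(nrow(df)), df$customer_id,
  FUN = function(i) months_since_first(df$entry_month[i])
)
df
```

Because the count is based on year and month fields rather than on a day difference, it should not drift for customers with long histories, and a skipped calendar month leaves a gap in the numbering.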
This takes the year-month part of each date and counts the distinct values.
I extended the example to include a repeated month.
library(dplyr)
df %>%
  group_by(customer_id) %>%
  arrange(entry_month, .by_group = TRUE) %>%
  mutate(month_number = cumsum(
    !duplicated(strftime(entry_month, "%Y-%m")))) %>%
  ungroup()
# A tibble: 7 × 3
customer_id entry_month month_number
<chr> <chr> <int>
1 1530 2021-10-01 1
2 1530 2021-10-12 1
3 1530 2021-11-01 2
4 1530 2021-12-01 3
5 1531 2021-11-01 1
6 1531 2021-12-01 2
7 1532 2021-12-01 1
Data:
df <- structure(list(customer_id = c("1530", "1530", "1530", "1530",
"1531", "1531", "1532"), entry_month = c("2021-10-01", "2021-10-12",
"2021-11-01", "2021-12-01", "2021-11-01", "2021-12-01", "2021-12-01"
)), row.names = c(NA, -7L), class = "data.frame")
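For readers who prefer to avoid the dplyr dependency, the same idea can be sketched in base R with ave(). This is an illustrative equivalent, not part of the original answer; the intermediate name `ym` is an assumption introduced here.

```r
df <- data.frame(
  customer_id = c("1530", "1530", "1530", "1530", "1531", "1531", "1532"),
  entry_month = c("2021-10-01", "2021-10-12", "2021-11-01", "2021-12-01",
                  "2021-11-01", "2021-12-01", "2021-12-01")
)

# Sort by customer and date so the running count follows time order
df <- df[order(df$customer_id, as.Date(df$entry_month)), ]

# Year-month key such as "2021-10"
ym <- format(as.Date(df$entry_month), "%Y-%m")

# Within each customer, add 1 each time a new year-month appears.
# ave() is given row indices so the result stays integer.
df$month_number <- ave(
  seq_along(ym), df$customer_id,
  FUN = function(i) cumsum(!duplicated(ym[i]))
)
df
```

Note that this counts the distinct months that actually appear for a customer, so a calendar month with no entry does not leave a gap in the numbering.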
Alternatively, you can use the data.table package:
library(data.table)
dt <- setDT(df)
dt[, entry_month := as.IDate(entry_month)] # Transform the column to "IDate"
# Number the rows within each customer; assumes one row per month, sorted by date
dt[, month_number := seq_along(entry_month), by = customer_id]
dt
Output:
customer_id entry_month month_number
1: 1530 2021-10-01 1
2: 1530 2021-11-01 2
3: 1530 2021-12-01 3
4: 1531 2021-11-01 1
5: 1531 2021-12-01 2
6: 1532 2021-12-01 1
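Numbering rows in sequence relies on there being exactly one row per month, already in date order. If that cannot be guaranteed, one option is to rank the year-month key within each customer instead. The following is a minimal sketch using data.table's frank() with dense ties, applied to the extended data with a repeated month; it is an assumption about what the desired numbering should be, not the original answer's method.

```r
library(data.table)

dt <- data.table(
  customer_id = c("1530", "1530", "1530", "1530", "1531", "1531", "1532"),
  entry_month = as.IDate(c("2021-10-01", "2021-10-12", "2021-11-01", "2021-12-01",
                           "2021-11-01", "2021-12-01", "2021-12-01"))
)

# Dense rank of the year-month key within each customer:
# rows in the same month share a number, and row order does not matter
dt[, month_number := frank(format(entry_month, "%Y-%m"), ties.method = "dense"),
   by = customer_id]
dt
```

As with counting distinct year-month values, a dense rank numbers only the months that are present, so skipped calendar months are not reflected in the result.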