Add number of months in a dataframe in R
I have a dataframe like this:
customer= c('1530','1530','1530','1531','1531','1532')
month = c('2021-10-01','2021-11-01','2021-12-01','2021-11-01','2021-12-01','2021-12-01')
month_number = c(1,2,3,1,2,1)
df <- data.frame('customer_id'=customer, entry_month=month)
df
|   | customer_id | entry_month |
| - | ----------- | ----------- |
| 1 | 1530 | 2021-10-01 |
| 2 | 1530 | 2021-11-01 |
| 3 | 1530 | 2021-12-01 |
| 4 | 1531 | 2021-11-01 |
| 5 | 1531 | 2021-12-01 |
| 6 | 1532 | 2021-12-01 |
I need to create a column that indicates the number of the month since the customer joined (1 for the customer's first month, 2 for the next, and so on). Here is my desired output:
new_df <- data.frame('customer_id'=customer, 'entry_month'=month, 'month_number'=month_number)
new_df
|   | customer_id | entry_month | month_number |
| - | ----------- | ----------- | ------------ |
| 1 | 1530 | 2021-10-01 | 1 |
| 2 | 1530 | 2021-11-01 | 2 |
| 3 | 1530 | 2021-12-01 | 3 |
| 4 | 1531 | 2021-11-01 | 1 |
| 5 | 1531 | 2021-12-01 | 2 |
| 6 | 1532 | 2021-12-01 | 1 |
You can convert `entry_month` to `Date` format and then simply use `first`:
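library(dplyr)
df %>%
  group_by(customer_id) %>%
  mutate(
    entry_month = as.Date(entry_month),  # convert the strings to Date
    # days since the customer's first row (rows assumed sorted by date), ~30 days per month
    nmonth = round(as.numeric(entry_month - first(entry_month)) / 30) + 1
  )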
# A tibble: 6 x 3
# Groups: customer_id [3]
customer_id entry_month nmonth
<chr> <date> <dbl>
1 1530 2021-10-01 1
2 1530 2021-11-01 2
3 1530 2021-12-01 3
4 1531 2021-11-01 1
5 1531 2021-12-01 2
6 1532 2021-12-01 1
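The division by 30 is only an approximation: a calendar month averages about 30.44 days, so the rounding error grows with tenure. The sketch below uses two hypothetical dates (not from the question) 35 calendar months apart and shows the formula overshooting by one. `seq()` with `by = "month"` serves as an independent count of the months from the first entry through the later one, which works here because both dates fall on the first of a month.

```r
# Hypothetical dates, not from the question
first_entry <- as.Date("2021-10-01")
later_entry <- as.Date("2024-09-01")  # 35 calendar months later

# Day-based approximation used in the answer above
days_elapsed  <- as.numeric(later_entry - first_entry)  # 1066 days
approx_number <- round(days_elapsed / 30) + 1           # 37, one month too many

# Count the monthly dates from first_entry through later_entry, inclusive
exact_number <- length(seq(first_entry, later_entry, by = "month"))  # 36

c(approx = approx_number, exact = exact_number)
```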
Note that this works if `entry_month` is always the first day of a month. Otherwise you will have to specify what exactly one month means. For example, if the first entry is on 2021-10-20 and the second one is on 2021-11-10, what would the desired value of `nmonth` be?
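One possible definition, offered only as an assumption about what the asker might want, is to count calendar months and ignore the day entirely. Under that reading, 2021-10-20 followed by 2021-11-10 gives month number 2. A base R sketch (no packages needed); `month_index()` is a hypothetical helper, not part of any library:

```r
# Hypothetical helper: a running calendar-month index, year * 12 + month,
# that ignores the day of the month.
month_index <- function(date) {
  lt <- as.POSIXlt(as.Date(date))
  (lt$year + 1900L) * 12L + lt$mon
}

# The question's data
df <- data.frame(
  customer_id = c("1530", "1530", "1530", "1531", "1531", "1532"),
  entry_month = c("2021-10-01", "2021-11-01", "2021-12-01",
                  "2021-11-01", "2021-12-01", "2021-12-01")
)

# Calendar months since each customer's earliest entry, starting at 1.
# min() is used instead of the first row, so the rows need not be sorted.
df$month_number <- ave(month_index(df$entry_month), df$customer_id,
                       FUN = function(m) m - min(m) + 1L)
df

# The mid-month case from the note: 2021-10-20 followed by 2021-11-10
month_index("2021-11-10") - month_index("2021-10-20") + 1L  # 2
```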
This takes the year-month part of the date and counts distinct values per customer.
I extended the example to include a repeated month.
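library(dplyr)
# df is the extended example data defined below
df %>%
  group_by(customer_id) %>%
  arrange(entry_month, .by_group = TRUE) %>%  # sort by date within each customer
  mutate(month_number = cumsum(
    !duplicated(strftime(entry_month, "%Y-%m")))) %>%  # +1 for each new year-month
  ungroup()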
# A tibble: 7 × 3
customer_id entry_month month_number
<chr> <chr> <int>
1 1530 2021-10-01 1
2 1530 2021-10-12 1
3 1530 2021-11-01 2
4 1530 2021-12-01 3
5 1531 2021-11-01 1
6 1531 2021-12-01 2
7 1532 2021-12-01 1
df <- structure(list(customer_id = c("1530", "1530", "1530", "1530",
"1531", "1531", "1532"), entry_month = c("2021-10-01", "2021-10-12",
"2021-11-01", "2021-12-01", "2021-11-01", "2021-12-01", "2021-12-01"
)), row.names = c(NA, -7L), class = "data.frame")
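Counting distinct year-months numbers the months in which a customer actually has entries, which differs from calendar months elapsed when a customer skips a month. A base R sketch with hypothetical data (customer 1533 is not in the question) compares the two readings side by side:

```r
# Hypothetical data: customer 1533 has two entries in October 2021,
# none in November, and one in December.
gap_df <- data.frame(
  customer_id = c("1533", "1533", "1533"),
  entry_month = c("2021-10-01", "2021-10-12", "2021-12-01")
)
gap_df <- gap_df[order(gap_df$customer_id, gap_df$entry_month), ]

entry_date <- as.Date(gap_df$entry_month)
year_month <- format(entry_date, "%Y-%m")

# Distinct-month counting, as in the answer above: each new year-month
# within a customer increments the counter.
is_new_month <- !duplicated(paste(gap_df$customer_id, year_month))
gap_df$distinct_count <- ave(as.integer(is_new_month), gap_df$customer_id,
                             FUN = cumsum)

# Calendar months elapsed since the customer's first entry, plus one
month_idx <- as.integer(format(entry_date, "%Y")) * 12L +
  as.integer(format(entry_date, "%m"))
gap_df$calendar_number <- ave(month_idx, gap_df$customer_id,
                              FUN = function(m) m - min(m) + 1L)

gap_df
# distinct_count:  1 1 2
# calendar_number: 1 1 3
```

Which column is appropriate depends on whether `month_number` should mean "n-th active month" or "n-th month since joining".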
Alternatively, you can use the data.table package:
library(data.table)
dt <- setDT(df)  # convert df to a data.table by reference
dt[, entry_month := as.IDate(entry_month)]  # transform the column to "IDate"
setorder(dt, customer_id, entry_month)  # sort by date within each customer
dt[, month_number := seq_len(.N), by = customer_id]  # number the rows 1, 2, ... per customer
dt
Output:
customer_id entry_month month_number
1: 1530 2021-10-01 1
2: 1530 2021-11-01 2
3: 1530 2021-12-01 3
4: 1531 2021-11-01 1
5: 1531 2021-12-01 2
6: 1532 2021-12-01 1
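The `seq_len(.N)` numbering assumes exactly one row per customer per month. If months can repeat or be skipped, a calendar-based index is one option; this sketch assumes the calendar-month definition (day of month ignored) and uses data.table's `year()` and `month()` helpers on an `IDate` column:

```r
library(data.table)

dt <- data.table(
  customer_id = c("1530", "1530", "1530", "1531", "1531", "1532"),
  entry_month = as.IDate(c("2021-10-01", "2021-11-01", "2021-12-01",
                           "2021-11-01", "2021-12-01", "2021-12-01"))
)

# year * 12 + month gives a calendar-month index; subtracting each
# customer's minimum makes the earliest month number 1.
dt[, month_number := {
  m <- year(entry_month) * 12L + month(entry_month)
  m - min(m) + 1L
}, by = customer_id]

dt
```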