
Add number of months in a dataframe in R

I have a dataframe like this:

customer= c('1530','1530','1530','1531','1531','1532')  
month =  c('2021-10-01','2021-11-01','2021-12-01','2021-11-01','2021-12-01','2021-12-01')  
month_number = c(1,2,3,1,2,1)  
df <- data.frame('customer_id'=customer, entry_month=month)  
df
| customer_id | entry_month |
| ----------- | ----------- |
| 1530        | 2021-10-01  |
| 1530        | 2021-11-01  |
| 1530        | 2021-12-01  |
| 1531        | 2021-11-01  |
| 1531        | 2021-12-01  |
| 1532        | 2021-12-01  |

I need to create a column that gives the month number counted from the month the customer joined. Here is my desired output:

new_df <- data.frame('customer_id'=customer, 'entry_month'=month, 'month_number'=month_number)
new_df  
| customer_id | entry_month | month_number |
| ----------- | ----------- | ------------ |
| 1530        | 2021-10-01  | 1            |
| 1530        | 2021-11-01  | 2            |
| 1530        | 2021-12-01  | 3            |
| 1531        | 2021-11-01  | 1            |
| 1531        | 2021-12-01  | 2            |
| 1532        | 2021-12-01  | 1            |

You can convert entry_month to Date format and then simply use first():

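library(dplyr)
df %>%
  group_by(customer_id) %>%
  mutate(
    # parse the character column as Date so dates can be subtracted
    entry_month = as.Date(entry_month),
    # days since the customer's first entry, converted to months
    # using an approximate 30 days per month
    nmonth = round(as.numeric(entry_month - first(entry_month)) / 30) + 1
  )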

# A tibble: 6 x 3
# Groups:   customer_id [3]
  customer_id entry_month nmonth
  <chr>       <date>       <dbl>
1 1530        2021-10-01       1
2 1530        2021-11-01       2
3 1530        2021-12-01       3
4 1531        2021-11-01       1
5 1531        2021-12-01       2
6 1532        2021-12-01       1

Note that this works if entry_month is always the first day of a month and the rows of each customer are sorted by date. Otherwise you will have to specify what exactly one month means. E.g., if the first entry is on 2021-10-20 and the second one is on 2021-11-10, what would be the desired outcome of nmonth?
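One possible answer to that question is to count calendar months and ignore the day of the month. The following is a minimal sketch in base R under that assumption; month_index is a hypothetical helper name, and the mid-month dates for customer 1530 are the ones from the example above, not from the original data.

```r
# Hypothetical helper: calendar-month index of a date (year * 12 + month),
# so two dates in consecutive calendar months always differ by exactly 1.
month_index <- function(x) {
  x <- as.Date(x)
  as.integer(format(x, "%Y")) * 12L + as.integer(format(x, "%m"))
}

df <- data.frame(
  customer_id = c("1530", "1530", "1530", "1531", "1531", "1532"),
  entry_month = c("2021-10-20", "2021-11-10", "2021-12-01",
                  "2021-11-01", "2021-12-01", "2021-12-01")
)

idx <- month_index(df$entry_month)

# ave() applies the function within each customer_id group;
# min() is used instead of the first row, so the rows need not be sorted.
df$nmonth <- ave(idx, df$customer_id, FUN = function(i) i - min(i) + 1L)
df
```

Under this definition 2021-10-20 and 2021-11-10 fall in months 1 and 2, even though they are only 21 days apart. If a month should instead mean a fixed number of days, a different rule is needed.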

This takes the year-month part of the date and counts distinct values within each customer.

I extended the example to include a repeated month.

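library(dplyr)

df %>% 
  group_by(customer_id) %>% 
  # sort within each customer so months are counted in date order
  arrange(entry_month, .by_group = TRUE) %>% 
  # the counter increases by one each time a new year-month appears
  mutate(month_number = cumsum(
           !duplicated(strftime(entry_month, "%Y-%m")))) %>% 
  ungroup()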
# A tibble: 7 × 3
  customer_id entry_month month_number
  <chr>       <chr>              <int>
1 1530        2021-10-01             1
2 1530        2021-10-12             1
3 1530        2021-11-01             2
4 1530        2021-12-01             3
5 1531        2021-11-01             1
6 1531        2021-12-01             2
7 1532        2021-12-01             1

Data

df <- structure(list(customer_id = c("1530", "1530", "1530", "1530",
"1531", "1531", "1532"), entry_month = c("2021-10-01", "2021-10-12",
"2021-11-01", "2021-12-01", "2021-11-01", "2021-12-01", "2021-12-01"
)), row.names = c(NA, -7L), class = "data.frame")
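Counting distinct year-months gives the number of months in which the customer has an entry, which is not the same as the number of calendar months since joining when a month is skipped. A small base R sketch of the difference, using a made-up customer with no entry in November 2021 (the variable names are illustrative only):

```r
# Hypothetical customer with two entries in October and none in November
entry <- as.Date(c("2021-10-01", "2021-10-12", "2021-12-01"))
ym <- format(entry, "%Y-%m")

# Approach used above: running count of distinct year-months
active_months <- cumsum(!duplicated(ym))

# Alternative: calendar months elapsed since the first entry, plus one
idx <- as.integer(format(entry, "%Y")) * 12L + as.integer(format(entry, "%m"))
elapsed_months <- idx - min(idx) + 1L

data.frame(entry, active_months, elapsed_months)
```

Here December is month 2 by the distinct-count rule and month 3 by the calendar rule. Which one is wanted depends on whether skipped months should count.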

Alternatively, you can use the data.table package:

library(data.table)

dt <- setDT(df) # convert df to a data.table by reference

dt[, entry_month := as.IDate(entry_month)] # Transform the column to "IDate"

# Number the rows within each customer; this assumes one row per month
# and rows already sorted by entry_month within each customer
dt[, month_number := seq_len(.N), by = customer_id]

dt

Output:

   customer_id entry_month month_number
1:        1530  2021-10-01            1
2:        1530  2021-11-01            2
3:        1530  2021-12-01            3
4:        1531  2021-11-01            1
5:        1531  2021-12-01            2
6:        1532  2021-12-01            1
