Split month/year strings in R with `-`

Question

I have a column that looks like this:

   fiscal_year_end
1             1231
2             1231
3             1231
4             1231
5              202
6             1231
7             1231
8              202
9             1231
10             927

They correspond to month-day values, i.e. 12-31, 9-27 and 2-02.

I am trying to put them in that format but cannot seem to get it right.

I have tried str_replace_all(df$fiscal_year_end, "(?<=^\\d{2}|^\\d{4})", "-") from the stringr package, but the result is not what I expect.

Where am I going wrong here?

Data:

structure(list(fiscal_year_end = c(1231L, 1231L, 1231L, 1231L, 
202L, 1231L, 1231L, 202L, 1231L, 927L, 228L, 1231L, 1231L, 1231L, 
1231L, 928L, 1231L, 1231L, 930L, 1231L, 1231L, 628L, 1231L, 1231L, 
1228L, 930L, 1231L, 1231L, 1231L, 1231L, 927L, 630L, 1231L, 202L, 
1231L, 1231L, 1231L, 1231L, 927L, 930L, 1231L, 1231L, 1231L, 
1231L, 228L, 928L, 1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 
1231L, 1231L, 1231L, 1231L, 1228L, 1231L, 1231L, 1231L, 1231L, 
131L, 1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 930L, 1231L, 
1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 831L, 1231L, 102L, 
1231L, 1231L, 1231L, 1130L, 1231L, 1228L, 1231L, 1231L, 1231L, 
1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 930L, 1031L, 1231L, 
1231L, 1231L, 1231L, 1231L, 1231L, 203L, 1231L, 1231L, 1231L, 
1231L, 1231L, 1229L, 1231L, 1231L, 1231L, 426L, 1231L, 1231L, 
1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 202L, 1231L, 
1231L, 1231L, 1231L, 1231L, 1231L, 1229L, 1231L, 1231L, 630L, 
1231L, 1231L, 1209L, 1231L, 1231L, 1231L, 728L, 1231L, 1231L, 
1231L, 1231L, 1231L, 1231L, 630L, 1231L, 1231L, 1231L, 1231L, 
1231L, 1231L, 727L, 1231L, 201L, 1231L, 1231L, 1231L, 1231L, 
1231L, 630L, 1231L, 1231L, 1231L, 1130L, 1231L, 1231L, 1231L, 
1231L, 1231L, 1231L, 1231L, 930L, 930L, 1231L, 1231L, 331L, 1231L, 
1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 1031L, 1229L, 1231L, 
1231L, 1231L, 201L, 1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 
831L, 630L, 831L)), row.names = c(NA, -200L), class = "data.frame")

EDIT:

     datadate fiscal_year_end
1  2012-08-31             831
2  2017-01-31             201
3  1999-12-31            1231
4  2009-02-28             228
5  2010-12-31            1231
6  2005-12-31            1231
7        <NA>             630
8  2010-09-30             928
9  2009-09-30             930
10 2018-01-31             201
11 2017-12-31            1231
12 2004-12-31            1231

Answer 1

We can use separate after formatting the values to four digits:

library(dplyr)
library(tidyr)
df1 %>% 
  mutate(fiscal_year_end =  sprintf("%04d", fiscal_year_end)) %>% 
  separate(fiscal_year_end, c("month", "day"), sep= 2)

Or use a negative index in separate, which counts the split position from the right:

df1 %>% 
  separate(fiscal_year_end, c("month", "day"), sep= -2)

Or, using only base R, we can use sub to insert a delimiter (with a single capture group) and then convert the result to a two-column data.frame with read.csv:

out <- read.csv(text = sub("(\\d{2})$", ",\\1", df1[[1]]), header = FALSE,
       col.names = c("month", "day"), stringsAsFactors = FALSE)

head(out, 5)
#  month day
#1    12  31
#2    12  31
#3    12  31
#4    12  31
#5     2   2
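
If the goal is the single month-day string from the question rather than two columns, the pieces can be pasted back together. The following is a minimal base R sketch, assuming a small sample vector x that holds the first ten values from the question (the names x, parts and month_day are illustrative and not from the original post):

```r
# Illustrative sample: the first ten values of fiscal_year_end from the question.
x <- c(1231L, 1231L, 1231L, 1231L, 202L, 1231L, 1231L, 202L, 1231L, 927L)

# Insert a comma before the last two digits, then parse the result as two columns.
parts <- read.csv(text = sub("(\\d{2})$", ",\\1", x), header = FALSE,
                  col.names = c("month", "day"))

# Re-join as "month-day", zero-padding the day to two digits.
month_day <- sprintf("%d-%02d", parts$month, parts$day)
print(month_day)
```

The day is padded with %02d because read.csv parses "02" as the integer 2, which would otherwise lose the leading zero.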

Answer 2

Using base R, we can use sub with two capture groups: the second matches the final two digits and the first matches everything before them.

sub("(.*)(\\d{2})$", "\\1-\\2", df$fiscal_year_end)

#[1] "12-31" "12-31" "12-31" "12-31" "2-02"  "12-31" "12-31" "2-02"  "12-31"
#     "9-27"  "2-28"  "12-31" .....
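
To see how the two capture groups divide each value, the groups can be extracted directly instead of substituted. This is only an illustrative sketch, assuming a three-element sample vector x (the names x, m and groups are not from the original answer); it uses regexec and regmatches from base R:

```r
# Illustrative sample covering the 4-digit and 3-digit cases.
x <- c("1231", "202", "927")

# regexec returns the full match plus one entry per capture group.
m <- regmatches(x, regexec("(.*)(\\d{2})$", x))

# Keep only the two groups: column 1 is the month, column 2 is the day.
groups <- t(sapply(m, function(g) g[2:3]))
print(groups)
```

Because the second group is anchored to the end of the string and requires exactly two digits, the greedy first group is left with whatever precedes them: two digits for 1231 and one digit for 202 or 927.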

Answer 3

Another, admittedly overly complex, way:

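# Month: first two characters for 4-digit values, first character for 3-digit values
res1 <- ifelse(nchar(my_df$fiscal_year_end) %% 2 == 0,
               substring(my_df$fiscal_year_end, 1, 2),
               substring(my_df$fiscal_year_end, 1, 1))
# Day: always the last two characters
res2 <- ifelse(nchar(my_df$fiscal_year_end) %% 2 == 0,
               substring(my_df$fiscal_year_end, 3, 4),
               substring(my_df$fiscal_year_end, 2, 3))
paste0(res1, "-", res2)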

Result:

[1] "12-31" "12-31" "12-31" "12-31" "2-02"  "12-31" "12-31" "2-02"  "12-31" "9-27"
