Splitting month/year strings in R by `-`
I have a column that looks like this:
fiscal_year_end
1 1231
2 1231
3 1231
4 1231
5 202
6 1231
7 1231
8 202
9 1231
10 927
They correspond to months and days, i.e. 12-31, 9-27 and 20-2.
I am trying to put them in that format but cannot seem to get it right.
I have tried str_replace_all(df$fiscal_year_end, "(?<=^\\\\d{2}|^\\\\d{4})", "-") using the stringr package, but the result is not what I expect.
Where am I going wrong here?
Data:
structure(list(fiscal_year_end = c(1231L, 1231L, 1231L, 1231L,
202L, 1231L, 1231L, 202L, 1231L, 927L, 228L, 1231L, 1231L, 1231L,
1231L, 928L, 1231L, 1231L, 930L, 1231L, 1231L, 628L, 1231L, 1231L,
1228L, 930L, 1231L, 1231L, 1231L, 1231L, 927L, 630L, 1231L, 202L,
1231L, 1231L, 1231L, 1231L, 927L, 930L, 1231L, 1231L, 1231L,
1231L, 228L, 928L, 1231L, 1231L, 1231L, 1231L, 1231L, 1231L,
1231L, 1231L, 1231L, 1231L, 1228L, 1231L, 1231L, 1231L, 1231L,
131L, 1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 930L, 1231L,
1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 831L, 1231L, 102L,
1231L, 1231L, 1231L, 1130L, 1231L, 1228L, 1231L, 1231L, 1231L,
1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 930L, 1031L, 1231L,
1231L, 1231L, 1231L, 1231L, 1231L, 203L, 1231L, 1231L, 1231L,
1231L, 1231L, 1229L, 1231L, 1231L, 1231L, 426L, 1231L, 1231L,
1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 202L, 1231L,
1231L, 1231L, 1231L, 1231L, 1231L, 1229L, 1231L, 1231L, 630L,
1231L, 1231L, 1209L, 1231L, 1231L, 1231L, 728L, 1231L, 1231L,
1231L, 1231L, 1231L, 1231L, 630L, 1231L, 1231L, 1231L, 1231L,
1231L, 1231L, 727L, 1231L, 201L, 1231L, 1231L, 1231L, 1231L,
1231L, 630L, 1231L, 1231L, 1231L, 1130L, 1231L, 1231L, 1231L,
1231L, 1231L, 1231L, 1231L, 930L, 930L, 1231L, 1231L, 331L, 1231L,
1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 1031L, 1229L, 1231L,
1231L, 1231L, 201L, 1231L, 1231L, 1231L, 1231L, 1231L, 1231L,
831L, 630L, 831L)), row.names = c(NA, -200L), class = "data.frame")
EDIT:
datadate fiscal_year_end
1 2012-08-31 831
2 2017-01-31 201
3 1999-12-31 1231
4 2009-02-28 228
5 2010-12-31 1231
6 2005-12-31 1231
7 <NA> 630
8 2010-09-30 928
9 2009-09-30 930
10 2018-01-31 201
11 2017-12-31 1231
12 2004-12-31 1231
We can use separate after formatting the values to 4 digits:
library(dplyr)
library(tidyr)
df1 %>%
mutate(fiscal_year_end = sprintf("%04d", fiscal_year_end)) %>%
separate(fiscal_year_end, c("month", "day"), sep= 2)
Or use a negative index in separate:
df1 %>%
separate(fiscal_year_end, c("month", "day"), sep= -2)
Or, using only base R, we use sub to insert a delimiter (with a single capture group) and convert the result to a two-column data.frame with read.csv:
out <- read.csv(text = sub("(\\d{2})$", ",\\1", df1[[1]]), header = FALSE,
col.names = c("month", "day"), stringsAsFactors = FALSE)
head(out, 5)
# month day
#1 12 31
#2 12 31
#3 12 31
#4 12 31
#5 2 2
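The output above prints the day in row 5 as 2 rather than 02 because read.csv parses both columns as integers by default. If the goal is the "month-day" string from the question, one option is to read the columns as character and paste them back together. This is a minimal sketch on a three-value toy vector; the `month_day` column name is an assumption made for illustration.

```r
# Toy input: three of the values from the question's data
fiscal_year_end <- c(1231L, 202L, 927L)

# Insert a comma before the last two digits, then parse the result as CSV.
# colClasses = "character" keeps the leading zero of days such as "02".
out <- read.csv(text = sub("(\\d{2})$", ",\\1", fiscal_year_end),
                header = FALSE,
                col.names = c("month", "day"),
                colClasses = "character")

# Rebuild the "month-day" string (month_day is a hypothetical column name)
out$month_day <- paste(out$month, out$day, sep = "-")
out$month_day
```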
Using base R, we can use sub with two capturing groups, where the second group is the final two digits and the first group is everything before them.
sub("(.*)(\\d{2}$)", "\\1-\\2", df$fiscal_year_end)
#[1] "12-31" "12-31" "12-31" "12-31" "2-02" "12-31" "12-31" "2-02" "12-31"
# "9-27" "2-28" "12-31" .....
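If a zero-padded month is preferred ("02-02" instead of "2-02"), the padding step from the first answer can be combined with sub. The following is only a sketch on a toy vector, assuming the column holds integers of three or four digits as in the question's data; `padded` and `month_day` are names introduced here for illustration.

```r
# Toy input: three of the values from the question's data
fiscal_year_end <- c(1231L, 202L, 927L)

# Pad to four digits so the month always occupies the first two characters
padded <- sprintf("%04d", fiscal_year_end)

# Two capture groups of two digits each, rejoined with a hyphen
month_day <- sub("^(\\d{2})(\\d{2})$", "\\1-\\2", padded)
month_day
```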
Another, admittedly overly complex, way:
res1<-ifelse(nchar(my_df$fiscal_year_end)%%2==0,substring(my_df$fiscal_year_end,1,2),
substring(my_df$fiscal_year_end,1,1))
res2<-ifelse(nchar(my_df$fiscal_year_end)%%2==0,substring(my_df$fiscal_year_end,3,4),
substring(my_df$fiscal_year_end,2,3))
paste0(res1,"-",res2)
Result:
[1] "12-31" "12-31" "12-31" "12-31" "2-02" "12-31" "12-31" "2-02" "12-31" "9-27"
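Since the column is stored as integers, the same strings can also be produced without any string slicing. This is not one of the approaches given above, just a sketch assuming every value encodes month * 100 + day: integer division by 100 yields the month and the remainder yields the day.

```r
# Toy input: three of the values from the question's data
fiscal_year_end <- c(1231L, 202L, 927L)

month <- fiscal_year_end %/% 100L  # integer division: 1231 -> 12, 202 -> 2
day   <- fiscal_year_end %% 100L   # remainder:        1231 -> 31, 202 -> 2

# %02d zero-pads the day so that 2 is printed as "02"
month_day <- sprintf("%d-%02d", month, day)
month_day
```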