Splitting a number (int) within a column in a DF into 4 new columns in R
Below is the output of one of the columns in a DF I created by importing a weekly summary.csv. These are unique codes, and each code should be only 4 digits long, e.g. 8400, 9070, etc. When the summary document is produced, all the codes are bunched together without commas or indentation, like below:
1 84709070
2 75508470
3 8400
3 750084009100
Is there a way I can turn the above into 4 new columns that split the number every 4 places, starting from the first digit? For example, the output for the fourth row would look like:
tariff1, tariff2, tariff3, tariff4
7500 8400 9100 none
I managed to create an abomination in Excel, but it hardly works at the best of times, and I'd prefer to use R for everything. We are getting about 30k of these entries a week, so this would really streamline processes!
You can use tidyr::separate, mentioning the positions where you want to split in sep.
tidyr::separate(df, V2, paste0('col', 1:4), sep = seq(4, 12, 4), convert = TRUE)
# V1 col1 col2 col3 col4
#1 1 8470 9070 NA NA
#2 2 7550 8470 NA NA
#3 3 8400 NA NA NA
#4 3 7500 8400 9100 NA
seq generates the sequence of positions.
seq(4, 12, 4)
#[1] 4 8 12
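To see what those positions mean, here is a small base R illustration (not part of tidyr itself; the variable names are made up for the example). Cutting after characters 4, 8 and 12 yields four pieces, and for a 12-digit value the fourth piece is empty:

```r
x <- "750084009100"
cuts <- seq(4, 12, 4)            # split after characters 4, 8 and 12
starts <- c(1, cuts + 1)         # 1 5 9 13
ends <- c(cuts, nchar(x))        # 4 8 12 12
pieces <- substring(x, starts, ends)
pieces
# [1] "7500" "8400" "9100" ""
```

So three cut positions give four columns, which is why sep has one element fewer than the vector of new column names.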
Data
df <- structure(list(V1 = c(1L, 2L, 3L, 3L), V2 = c(84709070, 75508470,
8400, 750084009100)), class = "data.frame", row.names = c(NA, -4L))
Here is a base R option, which defines a function f to split the numbers:
f <- function(x) t(`length<-`(as.numeric(sapply(seq(1,nchar(x),by = 4), function(k) substr(x,k,k+3))),4))
# Vectorize(f) returns one column per input value, so transpose to get one row per value
dfout <- cbind(df,data.frame(t(Vectorize(f)(df$V2))))
such that
V1 V2 X1 X2 X3 X4
1 1 84709070 8470 9070 NA NA
2 2 75508470 7550 8470 NA NA
3 3 8400 8400 NA NA NA
4 3 750084009100 7500 8400 9100 NA
Data
> dput(df)
structure(list(V1 = c(1L, 2L, 3L, 3L), V2 = c(84709070, 75508470,
8400, 750084009100)), class = "data.frame", row.names = c(NA,
-4L))
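If the one-liner is hard to read, the same idea can be spelled out step by step. This is only a sketch: split_codes and the tariff column names are hypothetical (borrowed from the question's desired output), and it assumes as.character() returns the plain digits for every value.

```r
# Hypothetical helper (not from the answers above): split each value into
# `n` chunks of `width` characters, padding short values with NA.
split_codes <- function(x, width = 4, n = 4) {
  x <- as.character(x)                       # assumes no scientific notation
  starts <- seq(1, by = width, length.out = n)
  # substr() is vectorised over x, so each call returns one output column
  pieces <- vapply(starts, function(s) substr(x, s, s + width - 1),
                   character(length(x)))
  pieces <- matrix(pieces, nrow = length(x)) # keep a matrix even for one value
  pieces[pieces == ""] <- NA                 # positions past the end become NA
  storage.mode(pieces) <- "integer"          # note: drops leading zeros
  colnames(pieces) <- paste0("tariff", seq_len(n))
  as.data.frame(pieces)
}

df <- data.frame(V1 = c(1L, 2L, 3L, 3L),
                 V2 = c(84709070, 75508470, 8400, 750084009100))
res <- cbind(df, split_codes(df$V2))
res[4, ]
```

Because substr() works on the whole column at once, there is no need for Vectorize or a transpose here, which should also help with around 30k rows a week.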
An option with strsplit from base R:
lst1 <- strsplit(as.character(df$V2), "(?<=....)", perl = TRUE)
df[paste0('col', 1:4)] <- do.call(rbind, lapply(lst1,
`length<-`, max(lengths(lst1))+1))
df <- type.convert(df, as.is = TRUE)
Output
df
# V1 V2 col1 col2 col3 col4
#1 1 84709070 8470 9070 NA NA
#2 2 75508470 7550 8470 NA NA
#3 3 8400 8400 NA NA NA
#4 3 750084009100 7500 8400 9100 NA
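For anyone wondering about the pattern: "(?<=....)" is a zero-width lookbehind that matches right after any four characters, so strsplit cuts each string into chunks of four. A quick check on character input (the vector below is just sample data taken from the question):

```r
codes <- c("84709070", "8400", "750084009100")
parts <- strsplit(codes, "(?<=....)", perl = TRUE)
parts[[3]]
# [1] "7500" "8400" "9100"
lengths(parts)
# [1] 2 1 3
```

perl = TRUE is needed because lookbehind is not supported by the default regex engine.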
Or using read.fwf from base R:
df[paste0('col', 1:4)] <- read.fwf(file = textConnection(as.character(df$V2)),
widths = c(4, 4, 4, 4))
Data
df <- structure(list(V1 = c(1L, 2L, 3L, 3L), V2 = c(84709070, 75508470,
8400, 750084009100)), class = "data.frame", row.names = c(NA,
-4L))
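One caveat that applies to all of the options above: they rely on as.character() giving back the digits exactly as they appeared in the file. When the column is stored as numeric, leading zeros are lost and some values are rendered in scientific notation (for example, as.character(90000000) gives "9e+07"). A possible way around this, sketched with a made-up inline CSV and a hypothetical column name codes, is to import the column as character in the first place:

```r
# Inline stand-in for the weekly summary file; column names are assumptions
csv_text <- "id,codes\n1,84709070\n2,08409070\n3,750084009100\n"

# colClasses keeps the codes as text, so no digits are reinterpreted
summary_df <- read.csv(text = csv_text, colClasses = c(codes = "character"))
summary_df$codes
# [1] "84709070"     "08409070"     "750084009100"
```

With a character column, the as.character() calls in the answers become no-ops and the splitting works the same way.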