Split the numbers (int) in one DF column into 4 new columns in R

Question

Below is the output of one of the columns in a DF I created by importing a weekly summary.csv. These are unique codes, and each code should be exactly 4 digits long, e.g. 8400, 9070, etc. When the summary document is produced, all the codes are bunched together without commas or indentation, like below:

1 84709070
2 75508470
3 8400
3 750084009100

Is there a way I can turn the above into 4 new columns that split the numbers into groups of 4 digits, starting from the first digit? For example, the output for the fourth row would look like:

tariff1, tariff2, tariff3, tariff4
7500     8400     9100     none

I managed to create an abomination in Excel, but it hardly works at the best of times and I'd prefer to use R for everything. We are getting about 30k of these entries a week, so this would really streamline processes!

Answer 1

You can use tidyr::separate, giving the positions at which to split in sep.

tidyr::separate(df, V2, paste0('col', 1:4), sep = seq(4, 12, 4), convert = TRUE)

#  V1 col1 col2 col3 col4
#1  1 8470 9070   NA   NA
#2  2 7550 8470   NA   NA
#3  3 8400   NA   NA   NA
#4  3 7500 8400 9100   NA

seq generates the sequence of positions.

seq(4, 12, 4)
#[1]  4  8 12
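If the number of codes per entry is not fixed at four, the cut positions could be derived from the longest entry instead of being hard-coded. A minimal sketch, assuming the codes are already character strings; `codes`, `width`, `n_cols` and `cuts` are illustrative names, not part of the answer:

```r
codes <- c("84709070", "75508470", "8400", "750084009100")

width  <- 4                                    # every code is 4 digits long
n_cols <- max(ceiling(nchar(codes) / width))   # largest number of codes in one entry
# separate() needs one cut position fewer than the number of new columns
cuts   <- seq(width, by = width, length.out = n_cols - 1)
cuts
#[1] 4 8
```

These values could then be passed as sep = cuts, with paste0('col', seq_len(n_cols)) as the new column names.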

Data

df <- structure(list(V1 = c(1L, 2L, 3L, 3L), V2 = c(84709070, 75508470, 
8400, 750084009100)), class = "data.frame", row.names = c(NA, -4L))

Answer 2

Here is a base R option, which defines a function f to split the numbers:

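# f returns the 4-digit pieces of one value, padded with NA to length 4
f <- function(x) `length<-`(as.numeric(sapply(seq(1, nchar(x), by = 4), function(k) substr(x, k, k + 3))), 4)
# Vectorize(f) yields one column per input value, so transpose to get one row per value
dfout <- cbind(df, data.frame(t(Vectorize(f)(df$V2))))

such that

  V1           V2   X1   X2   X3   X4
1  1     84709070 8470 9070   NA   NA
2  2     75508470 7550 8470   NA   NA
3  3         8400 8400   NA   NA   NA
4  3 750084009100 7500 8400 9100   NA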
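The one-liner above packs several steps together. The same idea can be unpacked into a small helper, sketched here under the assumption that every code is exactly 4 digits and that at most 4 codes are wanted. `split_codes` and its arguments are hypothetical names, and the tariff column names are taken from the question:

```r
# Hypothetical helper: split each value into `n` pieces of `width` digits
split_codes <- function(x, width = 4, n = 4) {
  x <- sprintf("%.0f", x)                        # plain digits, never scientific notation
  starts <- seq(1, by = width, length.out = n)   # 1, 5, 9, 13 for the defaults
  # substring() recycles `starts` over the repeated values: n pieces per input value
  pieces <- substring(rep(x, each = n), starts, starts + width - 1)
  pieces[pieces == ""] <- NA                     # positions past the end of a value
  out <- matrix(as.integer(pieces), ncol = n, byrow = TRUE)
  colnames(out) <- paste0("tariff", seq_len(n))
  out
}

res <- split_codes(c(84709070, 75508470, 8400, 750084009100))
res[4, ]
```

With the sample data, cbind(df, split_codes(df$V2)) would append the four tariff columns to df.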

Data

> dput(df)
structure(list(V1 = c(1L, 2L, 3L, 3L), V2 = c(84709070, 75508470, 
8400, 750084009100)), class = "data.frame", row.names = c(NA,
-4L))

Answer 3

An option with strsplit from base R:

lst1 <- strsplit(as.character(df$V2), "(?<=....)", perl = TRUE)
df[paste0('col', 1:4)] <- do.call(rbind, lapply(lst1, 
              `length<-`, max(lengths(lst1))+1))
df <- type.convert(df, as.is = TRUE)

Output:

df
#  V1           V2 col1 col2 col3 col4
#1  1     84709070 8470 9070   NA   NA
#2  2     75508470 7550 8470   NA   NA
#3  3         8400 8400   NA   NA   NA
#4  3 750084009100 7500 8400 9100   NA
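The pattern "(?<=....)" is a zero-width lookbehind: it matches the position right after any four characters, so strsplit breaks the string there without consuming any digits. As a rough sketch on a single value, the same pieces can also be obtained by extracting chunks rather than splitting. The regmatches variant is an alternative technique, not part of the answer, and `x` and `parts` are illustrative names:

```r
x <- "750084009100"

# Lookbehind split, as in the answer: break after every run of four characters
strsplit(x, "(?<=....)", perl = TRUE)[[1]]

# Alternative (not from the answer): extract chunks of up to four characters
parts <- regmatches(x, gregexpr(".{1,4}", x))[[1]]
parts
#[1] "7500" "8400" "9100"
```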

Or using read.fwf, which ships with R in the utils package:

df[paste0('col', 1:4)] <-  read.fwf(file = textConnection(as.character(df$V2)),
              widths = c(4, 4, 4, 4))
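To see what the fixed-width read returns on its own, here is a small sketch on the same four values. The tariff column names follow the question, and `codes`, `con` and `tariffs` are illustrative names. Entries shorter than 16 characters should come back as NA in the trailing columns:

```r
codes <- c("84709070", "75508470", "8400", "750084009100")

con <- textConnection(codes)            # treat the character vector as lines of a file
tariffs <- read.fwf(con, widths = rep(4, 4),
                    col.names = paste0("tariff", 1:4))
close(con)                              # the connection was opened here, so close it here

tariffs
```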

Data

df <- structure(list(V1 = c(1L, 2L, 3L, 3L), V2 = c(84709070, 75508470, 
8400, 750084009100)), class = "data.frame", row.names = c(NA,
-4L))

3 answers

Answer 1 (score 2, accepted), 2020-10-07 05:14:02
Answer 2 (score 0), 2020-10-07 08:07:39
Answer 3 (score 0), 2020-10-07 23:15:47
