
Splitting a number (int) within a column in a DF into 4 new columns in R

Below is the output of one of the columns in a DF I created by importing a weekly summary .csv. These are unique codes, and each code should be exactly 4 digits long, e.g. 8400, 9070. When the summary document is produced, all the codes are bunched together without commas or indentation, like below:

1 84709070
2 75508470
3 8400
3 750084009100

Is there a way I can turn the above into 4 new columns that split the numbers every 4 places, starting from the first digit? For example, the output for the fourth row would look like:

tariff1, tariff2, tariff3, tariff4
7500     8400     9100     none

I managed to create an abomination in Excel, but it hardly works at the best of times and I'd prefer to use R for everything. We are getting about 30k of these entries a week, so this would really streamline processes!

You can use tidyr::separate, giving the positions where you want to split in sep.

tidyr::separate(df, V2, paste0('col', 1:4), sep = seq(4, 12, 4), convert = TRUE)

#  V1 col1 col2 col3 col4
#1  1 8470 9070   NA   NA
#2  2 7550 8470   NA   NA
#3  3 8400   NA   NA   NA
#4  3 7500 8400 9100   NA

seq generates the sequence of positions.

seq(4, 12, 4)
#[1]  4  8 12
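To make the role of those cut positions concrete, the same split can be sketched in base R with substring(), which accepts vectors of start and stop positions. This only illustrates the position logic and is not a description of how tidyr implements it. The names cuts, code, starts, stops and pieces are made up for the example.

```r
# Cut positions: the last character of each 4-digit piece
cuts <- seq(4, 12, 4)            # 4 8 12

# One example value, held as text (illustrative)
code <- "750084009100"

# Three cut positions give four pieces: 1-4, 5-8, 9-12 and whatever is left
starts <- c(1, cuts + 1)         # 1 5 9 13
stops  <- c(cuts, nchar(code))   # 4 8 12 12

# substring() is vectorised over the start/stop positions
pieces <- substring(code, starts, stops)

# Pieces that fall past the end of the string come back empty; mark them missing
pieces[pieces == ""] <- NA
pieces
# "7500" "8400" "9100" NA
```

Three cut positions always produce four pieces, which is why sep = seq(4, 12, 4) pairs with four new column names.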

Data

df <- structure(list(V1 = c(1L, 2L, 3L, 3L), V2 = c(84709070, 75508470, 
8400, 750084009100)), class = "data.frame", row.names = c(NA, -4L))
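One caveat about this data: V2 is stored as a double, and the approaches in this thread rely on it being converted to text first. The default conversion can switch to scientific notation for some values (round numbers such as 84000000 are the usual culprits), which would break a split by position. A minimal sketch of a safer conversion follows, assuming the codes never start with 0. The vector x and the file name in the comment are hypothetical.

```r
# Illustrative values; the second one is the kind of round number that
# as.character() may render in scientific notation
x <- c(84709070, 84000000)

# sprintf("%.0f", ...) always writes plain digits
codes_chr <- sprintf("%.0f", x)
codes_chr
# "84709070" "84000000"

# If a code can start with 0, the leading zero is already lost once the column
# is numeric; in that case read the column as text at import time, e.g.
# (hypothetical file and column names):
# df <- read.csv("weekly_summary.csv", colClasses = c(V2 = "character"))
```

Any of the splitting approaches can then be applied to codes_chr instead of the numeric column.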

Here is a base R option, which defines a function f to split the numbers:

f <- function(x) `length<-`(as.numeric(sapply(seq(1, nchar(x), by = 4), function(k) substr(x, k, k + 3))), 4)
# sapply() returns one column per input value, so transpose to get one row per value
dfout <- cbind(df, data.frame(t(sapply(df$V2, f))))

such that

  V1           V2   X1   X2   X3 X4
1  1     84709070 8470 9070   NA NA
2  2     75508470 7550 8470   NA NA
3  3         8400 8400   NA   NA NA
4  3 750084009100 7500 8400 9100 NA


An option with strsplit from base R:

lst1 <- strsplit(as.character(df$V2), "(?<=....)", perl = TRUE)
# pad every piece vector with NA to a fixed length of 4 (at most four codes per row)
df[paste0('col', 1:4)] <- do.call(rbind, lapply(lst1, `length<-`, 4))
df <- type.convert(df, as.is = TRUE)

Output:

df
#  V1           V2 col1 col2 col3 col4
#1  1     84709070 8470 9070   NA   NA
#2  2     75508470 7550 8470   NA   NA
#3  3         8400 8400   NA   NA   NA
#4  3 750084009100 7500 8400 9100   NA
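If the look-behind pattern feels opaque, one hedged alternative is to extract the pieces rather than split on them: the pattern .{1,4} matches up to four characters at a time. The sketch below uses a small character vector codes as stand-in data. It is a variation on the idea above, not part of the original answer.

```r
# Stand-in data: the codes as text (illustrative)
codes <- c("84709070", "75508470", "8400", "750084009100")

# gregexpr() locates every run of up to 4 characters; regmatches() extracts them
parts <- regmatches(codes, gregexpr(".{1,4}", codes))

# Pad each piece vector with NA to length 4, convert to integer,
# and transpose so that each input value becomes one row
out <- t(vapply(parts, function(p) as.integer(`length<-`(p, 4)), integer(4)))
colnames(out) <- paste0("col", 1:4)
out
```

The result is an integer matrix with one row per code string, which can be attached to the data frame with cbind().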

Or using read.fwf from base R:

df[paste0('col', 1:4)] <-  read.fwf(file = textConnection(as.character(df$V2)),
              widths = c(4, 4, 4, 4))

