
Using the function “substr” over multiple observations in R

I have a dataset of 100 observations and one variable, with each observation being a string of integers. I would like to extract substrings from each observation and create a new data frame with the same number of observations, but with each string divided into several variables.

Basically, I would like to go from this:

Variable 1
1234567
1234567
1234567

To this:

Variable 1   Variable 2   Variable 3

123             456         7
123             456         7
123             456         7

I have tried using the function substr to do it, but while it works correctly when I use it with a subset of only 1 observation, it does not appear to work when I use it over the whole dataset. Any ideas about how I could use substr here, or whether there is a better alternative?

Assuming your dataset is called df, with the column you want to split called Var1:

# With numeric positions, sep should be one element shorter than into
tidyr::separate(df, Var1, into = c("Var1", "Var2", "Var3"), sep = c(3, 6))

#   Var1 Var2 Var3
# 1  123  456    7
# 2  123  456    7
# 3  123  456    7

The sep argument takes a numeric vector of the character positions at which to split.
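To make the meaning of those positions concrete, here is a minimal base R sketch of the same idea. Note that split_at is a hypothetical helper written for illustration, not a tidyr function, and it only approximates what separate does with positive positions: each position marks the last character of a piece.

```r
# Hypothetical helper (not part of tidyr): cut each string after the given positions.
split_at <- function(x, cuts) {
  x <- as.character(x)
  starts <- c(1, cuts + 1)
  pieces <- lapply(seq_along(starts), function(i) {
    # The last piece runs to the end of each string.
    stop_i <- if (i <= length(cuts)) cuts[i] else nchar(x)
    substr(x, starts[i], stop_i)
  })
  names(pieces) <- paste0("Var", seq_along(pieces))
  as.data.frame(pieces, stringsAsFactors = FALSE)
}

res <- split_at(c(1234567L, 1234567L, 1234567L), cuts = c(3, 6))
res
```

Because substr is vectorised over its first argument, the helper works on the whole column at once and returns character columns; wrap the pieces in as.integer if you need numbers.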

In base R, we can also use sub to insert a delimiter at the specified positions and then read the result with read.csv:

read.csv(text = sub("^(...)(...)(.)$", "\\1,\\2,\\3", 
     df1$Variable1), header = FALSE, col.names = paste0("Variable", 1:3))
#  Variable1 Variable2 Variable3
#1       123       456         7
#2       123       456         7
#3       123       456         7
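If it helps to see the intermediate step, this small sketch runs only the sub call on a character vector. The names x, pattern, delimited and unchanged are made up for the illustration. It also shows a caveat: a string that does not match the 7-character pattern is returned as is, so read.csv would not split it.

```r
x <- c("1234567", "1234567")

# Three capture groups: 3 characters, 3 characters, 1 character
pattern <- "^(...)(...)(.)$"

# Each matching string is rewritten with commas between the groups
delimited <- sub(pattern, "\\1,\\2,\\3", x)
delimited
# [1] "123,456,7" "123,456,7"

# A string of a different length does not match and is left untouched
unchanged <- sub(pattern, "\\1,\\2,\\3", "12345")
unchanged
# [1] "12345"
```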

Or, as @markus mentioned in the comments, read.fwf can be used along with textConnection:

read.fwf(textConnection(paste(df1$Variable1, collapse="\n")),
         widths = c(3, 3, 1), as.is = TRUE)
#  V1  V2 V3
#1 123 456  7
#2 123 456  7
#3 123 456  7

Data

df1 <- structure(list(Variable1 = c(1234567L, 1234567L, 1234567L)),
                 class = "data.frame", row.names = c(NA, -3L))

You could use substr with mapply.

dat <- cbind(dat, mapply(function(...) as.double(substr(...)), list(dat$v1), 
                         c(1, 4, 7), c(3, 6, 7)))
dat
#        v1          v2   1   2 3
# 1 1234567 -0.60679296 123 456 7
# 2 1234567 -0.06347641 123 456 7
# 3 1234567 -0.58993170 123 456 7
# 4 1234567 -0.71293088 123 456 7
# 5 1234567 -0.28107903 123 456 7

Data

dat <- data.frame(v1=1234567, v2=rnorm(5))
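The new columns in the output above end up named 1, 2 and 3. As a possible variation on the same mapply idea, the sketch below names the pieces before building the data frame. The names starts, stops, parts and new_dat, and the Variable1 to Variable3 column labels, are assumptions chosen to match the question.

```r
dat <- data.frame(v1 = rep(1234567, 5))

starts <- c(1, 4, 7)
stops  <- c(3, 6, 7)

# One substr() call per (start, stop) pair; each call covers the whole column,
# so mapply returns a matrix with one row per observation
parts <- mapply(function(from, to) as.integer(substr(dat$v1, from, to)),
                starts, stops)
colnames(parts) <- paste0("Variable", seq_along(starts))

new_dat <- as.data.frame(parts)
new_dat
```

This relies on substr converting v1 to character first, which is fine for values like 1234567 but may not be for very large numbers that print in scientific notation.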
