Using the function “substr” over multiple observations in R
I have a dataset of 100 observations and one variable, with each observation being a string of integers. I would like to extract groups of digits from each observation and create a new data frame with the same number of observations, but with each string divided into several variables.
Basically, I would like to go from this:
Variable 1
1234567
1234567
1234567
To this:
Variable 1 Variable 2 Variable 3
123 456 7
123 456 7
123 456 7
I have tried using the function substr to do this. It works correctly when I use it on a subset of only 1 observation, but it does not appear to work when I use it over the whole dataset. Any ideas about how I could use substr here, or whether there is a better alternative?
Assuming your dataset is called df, with the column you want to split called Var1:
# sep gives the positions to split after; it needs one fewer element than `into`
tidyr::separate(df, Var1, into = c("Var1", "Var2", "Var3"), sep = c(3, 6))
# Var1 Var2 Var3
# 1 123 456 7
# 2 123 456 7
# 3 123 456 7
The sep argument takes a vector indicating the character positions at which to split.
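To make the meaning of the split positions concrete, here is a minimal base R sketch of the same idea. The helper split_at and the column names Var1, Var2, ... are hypothetical and only for illustration; this is not how tidyr implements separate.

```r
# Hypothetical helper: split each string after the given character positions.
split_at <- function(x, positions) {
  x <- as.character(x)
  # Each piece starts one character after the previous split position
  starts <- c(1, positions + 1)
  # The last piece runs to the end of the longest string
  ends <- c(positions, max(nchar(x)))
  # substr is vectorised over x, so each call returns one whole column
  pieces <- mapply(function(s, e) substr(x, s, e), starts, ends,
                   SIMPLIFY = FALSE)
  names(pieces) <- paste0("Var", seq_along(pieces))
  as.data.frame(pieces, stringsAsFactors = FALSE)
}

res <- split_at(c("1234567", "1234567"), c(3, 6))
res
```

Splitting after positions 3 and 6 yields three pieces, which is why two positions go with three new column names.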
In base R, we can also use sub to insert a delimiter at the specified positions and then read the result with read.csv:
read.csv(text = sub("^(...)(...)(.)$", "\\1,\\2,\\3",
df1$Variable1), header = FALSE, col.names = paste0("Variable", 1:3))
# Variable1 Variable2 Variable3
#1 123 456 7
#2 123 456 7
#3 123 456 7
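To see what read.csv receives in this approach, it may help to look at the intermediate strings that sub produces. A small sketch, assuming an integer vector x like the column above (the names x and delimited are for illustration only):

```r
# Example input, assumed to have exactly 7 digits per value
x <- c(1234567L, 1234567L)

# Each (...) group captures 3 characters and (.) captures 1;
# the replacement puts the captured groups back with commas in between
delimited <- sub("^(...)(...)(.)$", "\\1,\\2,\\3", x)
delimited
# [1] "123,456,7" "123,456,7"
```

Because the pattern is anchored with ^ and $, a value that does not have exactly 7 characters is left unchanged by sub, so this sketch relies on every value having the same length.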
Or, as @markus mentioned in the comments, read.fwf can be used along with textConnection:
read.fwf(textConnection(paste(df1$Variable1, collapse="\n")),
widths = c(3, 3, 1), as.is = TRUE)
# V1 V2 V3
#1 123 456 7
#2 123 456 7
#3 123 456 7
Data
df1 <- structure(list(Variable1 = c(1234567L, 1234567L, 1234567L)),
  class = "data.frame", row.names = c(NA, -3L))
You could use substr with mapply.
dat <- cbind(dat, mapply(function(...) as.double(substr(...)), list(dat$v1),
c(1, 4, 7), c(3, 6, 7)))
dat
# v1 v2 1 2 3
# 1 1234567 -0.60679296 123 456 7
# 2 1234567 -0.06347641 123 456 7
# 3 1234567 -0.58993170 123 456 7
# 4 1234567 -0.71293088 123 456 7
# 5 1234567 -0.28107903 123 456 7
Data
dat <- data.frame(v1=1234567, v2=rnorm(5))
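Since substr is vectorised over its first argument, it can also be called directly on a whole column (a vector), as long as it is given the column rather than the data frame itself. A minimal sketch, assuming a data frame df with a single column Variable1 as in the question (these names are assumptions for illustration):

```r
# Example data, assumed to match the layout in the question
df <- data.frame(Variable1 = c(1234567L, 1234567L, 1234567L))

# Convert explicitly so substr works on character strings
x <- as.character(df$Variable1)

# One substr call per new variable; each call processes every row at once
out <- data.frame(
  Variable1 = substr(x, 1, 3),
  Variable2 = substr(x, 4, 6),
  Variable3 = substr(x, 7, 7),
  stringsAsFactors = FALSE
)
out
```

The new columns are character vectors here; wrap each call in as.integer if numeric columns are wanted.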