Extract text from character strings in R and store it in a variable
I have a character vector like this:
> filenames
[1] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 40 b - 10.csv"
[2] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/10 v 40 b - 11.csv"
[3] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/30 v 60 b - 12.csv"
[4] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/5 v 10 b - 6.csv"
[5] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 40 b - 7.csv"
[6] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 20 b - 8.csv"
[7] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/10 v 30 b - 9.csv"
[8] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/cont. v - 10.csv"
[9] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/cont. v - 11.csv"
[10] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/cont. v - 12.csv"
[11] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 40 b - 6.csv"
I want to extract the values before v and b and store them in a variable. Let me explain.
From filenames[1], I want to get the '20' before v and the '40' before b and store the result in a variable, say r[1] = 20/40.
I want to do this for each filenames[i]. For filenames containing 'cont. v', I want to write r[8] = 10, r[9] = 10. Here 10 is a predefined value.
Please help me solve this.
You may try:
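library(stringr)
# TRUE for the 'cont. v' files, FALSE for the 'N v M b' files
indx <- grepl('cont', filenames)
# digits that are followed by whitespace and then 'v' or 'b'
lst <- str_extract_all(filenames[!indx], '(\\d+)(?=\\s+(v|b))')
# ratio of the value before v to the value before b
v1 <- sapply(lst, function(x) as.numeric(x[1])/as.numeric(x[2]))
# for the 'cont' files, take the number just before '.csv'
v2 <- as.numeric(str_extract(filenames[indx], '\\d+(?=\\.csv)'))
r <- numeric(length(filenames))
r[indx] <- v2
r[!indx] <- v1
r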
#[1] 0.5000000 0.2500000 0.5000000 0.5000000 0.5000000 1.0000000
#[7] 0.3333333 10.0000000 11.0000000 12.0000000 0.5000000
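Note that the result above fills the 'cont. v' entries with the number found before .csv (10, 11, 12). If every 'cont. v' file should instead receive the same predefined value, as the question describes, a minimal sketch might look like the following. The name cont_value and the shortened file names are assumptions made for illustration; they are not part of the original answer.

```r
library(stringr)

# Hypothetical, shortened file names following the pattern in the question
filenames <- c("Data/20 v 40 b - 10.csv",
               "Data/5 v 10 b - 6.csv",
               "Data/cont. v - 10.csv",
               "Data/cont. v - 11.csv")

cont_value <- 10  # assumed predefined value for all 'cont. v' files

# fixed = TRUE so the '.' in 'cont. v' is matched literally
is_cont <- grepl("cont. v", filenames, fixed = TRUE)

# Numbers that are followed by whitespace and then 'v' or 'b'
nums <- str_extract_all(filenames[!is_cont], "\\d+(?=\\s+[vb])")

r <- numeric(length(filenames))
r[is_cont]  <- cont_value
r[!is_cont] <- vapply(nums,
                      function(x) as.numeric(x[1]) / as.numeric(x[2]),
                      numeric(1))
r
```

Using vapply() rather than sapply() here is only a defensive choice: it fails loudly if some file name does not yield exactly one ratio.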
filenames <- c("C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 40 b - 10.csv",
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/10 v 40 b - 11.csv",
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/30 v 60 b - 12.csv",
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/5 v 10 b - 6.csv",
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 40 b - 7.csv",
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 20 b - 8.csv",
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/10 v 30 b - 9.csv",
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/cont. v - 10.csv",
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/cont. v - 11.csv",
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/cont. v - 12.csv",
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 40 b - 6.csv"
)
As in the named-capture example in the help for ?regexpr:
parse.one <- function(res, result) {
  m <- do.call(rbind, lapply(seq_along(res), function(i) {
    if(result[i] == -1) return("")
    st <- attr(result, "capture.start")[i, ]
    substring(res[i], st, st + attr(result, "capture.length")[i, ] - 1)
  }))
  colnames(m) <- attr(result, "capture.names")
  m
}
filenames <- c("C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 40 b - 10.csv",
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/22 v 44 b - 10.csv",
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/223 v 5 b - 10.csv")
regex <- '.*/(?<v>[0-9]+)\\ v\\ (?<b>[0-9]+)\\ b.*'
parsed <- regexpr(regex, filenames, perl=TRUE)
parse.one(filenames, parsed)
The parse.one function needs to be defined only once.
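The captured v and b values are returned as character strings, so they still have to be converted and divided to get the ratio the question asks for. As a rough sketch of that last step, here is a base R variant that uses regexec() and regmatches() with plain capture groups instead of parse.one. The shortened file names and the variable names pattern, matches, and ratio are illustrative assumptions.

```r
# Hypothetical, shortened file names following the pattern in the question
filenames <- c("Data/20 v 40 b - 10.csv",
               "Data/223 v 5 b - 10.csv",
               "Data/cont. v - 11.csv")

# Two capture groups: the digits before ' v' and the digits before ' b'
pattern <- "([0-9]+) v ([0-9]+) b"

# regexec() records the positions of the full match and each group;
# regmatches() turns those positions into the matched substrings
matches <- regmatches(filenames, regexec(pattern, filenames))

# Each element is c(full match, v, b), or character(0) when nothing matched
ratio <- vapply(matches, function(m) {
  if (length(m) == 0) return(NA_real_)  # e.g. the 'cont. v' files
  as.numeric(m[2]) / as.numeric(m[3])
}, numeric(1))
ratio
```

Files that do not match the pattern come back as NA, which leaves room to fill them with a predefined value afterwards, for example ratio[is.na(ratio)] <- 10.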