
Extract texts from character strings in R and store in a variable

I have a character vector like this:

> filenames
[1] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 40 b - 10.csv"
[2] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/10 v 40 b - 11.csv"
[3] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/30 v 60 b - 12.csv"
[4] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/5 v 10 b - 6.csv" 
[5] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 40 b - 7.csv" 
[6] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 20 b - 8.csv" 
[7] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/10 v 30 b - 9.csv" 
[8] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/cont. v - 10.csv"  
[9] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/cont. v - 11.csv"  
[10] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/cont. v - 12.csv"  
[11] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 40 b - 6.csv"      

I want to extract the values before v and b and store them in a variable. Let me explain.

From filenames[1], I want to get the '20' before v and the '40' before b and store that in a variable, say r[1] = 20/40.

I want to do this for each filenames[i]. For the filenames containing 'cont. v', I want to write r[8] = 10, r[9] = 10. Here 10 is a predefined value.

Please help me in solving this.

You may try:

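 library(stringr)
 # TRUE for the 'cont. v' files, which have no "<v> v <b> b" pair
 indx <- grepl('cont', filenames)
 # digits that are followed by whitespace and then 'v' or 'b'
 lst <- str_extract_all(filenames[!indx], '(\\d+)(?=\\s+(v|b))')
 # ratio v/b for each of those files
 v1 <- sapply(lst, function(x) as.numeric(x[1])/as.numeric(x[2]))

 # for the 'cont. v' files, take the number just before '.csv'
 v2 <- as.numeric(str_extract(filenames[indx], '\\d+(?=\\.csv)'))
 r <- numeric(length(filenames))
 r[indx] <- v2
 r[!indx] <- v1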
 r
 #[1]  0.5000000  0.2500000  0.5000000  0.5000000  0.5000000  1.0000000
 #[7]  0.3333333 10.0000000 11.0000000 12.0000000  0.5000000

Data:

filenames <- c("C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 40 b - 10.csv", 
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/10 v 40 b - 11.csv", 
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/30 v 60 b - 12.csv", 
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/5 v 10 b - 6.csv", 
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 40 b - 7.csv", 
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 20 b - 8.csv", 
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/10 v 30 b - 9.csv", 
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/cont. v - 10.csv", 
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/cont. v - 11.csv", 
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/cont. v - 12.csv", 
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 40 b - 6.csv"
)
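
The question asks for a fixed, predefined value of 10 for the 'cont. v' files, whereas the code above takes the number just before '.csv' (10, 11, 12). A minimal base R sketch of the fixed-value variant follows. The name cont_value is hypothetical, and the three file names are taken from the data above; it assumes every non-'cont' file name starts with the "<v> v <b> b" pattern.

```r
filenames <- c("C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 40 b - 10.csv",
               "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/cont. v - 11.csv",
               "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/10 v 30 b - 9.csv")

cont_value <- 10  # assumed predefined value for the 'cont. v' files

# Work on the file name only, so nothing in the directory path can match
base <- basename(filenames)

# Capture the number before " v" and the number before " b"
m <- regmatches(base, regexec("^([0-9]+) +v +([0-9]+) +b", base))

r <- vapply(m, function(x) {
  if (length(x) == 0) {
    cont_value                            # no "<v> v <b> b" pair: use the preset
  } else {
    as.numeric(x[2]) / as.numeric(x[3])   # x[1] is the full match; ratio v / b
  }
}, numeric(1))
r  # 0.5, 10 and 1/3 for these three inputs
```

vapply is used instead of sapply so that the result is guaranteed to be a numeric vector with one value per file name.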

As in the examples in the help for ?regexpr:

parse.one <- function(res, result) {
  m <- do.call(rbind, lapply(seq_along(res), function(i) {
    if(result[i] == -1) return("")
    st <- attr(result, "capture.start")[i, ]
    substring(res[i], st, st + attr(result, "capture.length")[i, ] - 1)
  }))
  colnames(m) <- attr(result, "capture.names")
  m
}

filenames <- c("C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 40 b - 10.csv",
               "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/22 v 44 b - 10.csv",
               "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/223 v 5 b - 10.csv")
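# named capture groups 'v' and 'b'; the (?<name>...) syntax requires perl = TRUE
regex <- '.*/(?<v>[0-9]+)\\ v\\ (?<b>[0-9]+)\\ b.*'
parsed <- regexpr(regex, filenames, perl = TRUE)
# character matrix with one row per file name and columns 'v' and 'b'
parse.one(filenames, parsed)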

The parse.one function needs to be defined only once.
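
parse.one returns a character matrix, so the captures still have to be converted to numbers before computing v/b. As a hedged alternative, not part of the answer above, the sketch below uses utils::strcapture, assuming R 3.4.0 or later where that function is available. It returns the captures as typed data frame columns. The third file name is taken from the question to show what happens when the pattern does not match.

```r
filenames <- c("C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 40 b - 10.csv",
               "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/223 v 5 b - 10.csv",
               "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/cont. v - 10.csv")

# Same idea as the regex above: number before " v", number before " b"
pattern <- ".*/([0-9]+) v ([0-9]+) b.*"

# proto supplies the column names and types of the result
caps <- strcapture(pattern, filenames,
                   proto = data.frame(v = numeric(), b = numeric()))

# File names that do not match (the 'cont. v' one) yield NA in both columns
r <- caps$v / caps$b
r
```

The NA entries mark the 'cont. v' files, so a predefined value can be filled in afterwards with r[is.na(r)] <- 10.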
