Not sure what I am doing wrong here. I have lines in a text file, and the target lines look like this:
- Nsource.Inhibitor 3 81.63 27.21 1.84 0.008
- Nsource.Inhibitor 3 90.31 17.21 0.84 <0.001
I want to extract the 0.008 and <0.001 from the end of each line.
However, the file also contains other lines like the ones below, so the first part of the line has to be part of the pattern:
- Nsource 1 1238.10 1238.10 40.29 <.001
- Inhibitor 3 1484.41 494.80 16.10 <.001
My attempt:
reline <- "+ Nsource.Inhibitor 3 81.63 27.21 1.84 0.008"
decnum <- "[[:digit:]]+\\.*[[:digit:]]*"
chk <- paste0("+ Nsource.Inhibitor[:blank:]+", decnum, "[:blank:]+", decnum, "[:blank:]+", decnum, "[:blank:]+", decnum,
"[:blank:]+", "([[:digit:]]+\\.*[[:digit:]]*)")
gsub(chk, "\\1",reline)
returns:
"+ Nsource.Inhibitor\\t 3\\t 81.63\\t 27.21\\t 1.84\\t 0.008"
Thanks for your help.
Matt
Something like this?
library(stringr)
strings <- c("Nsource.Inhibitor 3 81.63 27.21 1.84 0.008", "Nsource.Inhibitor 3 90.31 17.21 0.84 <0.001",
"Nsource 1 1238.10 1238.10 40.29 <.001", "Inhibitor 3 1484.41 494.80 16.10 <.001")
str_match(strings, "(?=^Nsource.Inhibitor).*?(<?\\d+\\.\\d+)$")[,2]
This yields
[1] "0.008" "<0.001" NA NA
The lookahead ensures that Nsource.Inhibitor is at the start of the string; only then does the pattern match the last \\d+\\.\\d+ on that line (optionally preceded by <).
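For comparison, here is a minimal base R sketch of the same idea without stringr. It is only an illustration: the `lines` vector is made up from the lines in the question (including the "+ " prefix used in `reline`), the names `is_target` and `p_values` are hypothetical, and the final-token pattern is loosened to `[0-9]*` so that values such as <.001 would also be captured.

```r
# Sample lines copied from the question; the "+ " prefix is an assumption
lines <- c("+ Nsource.Inhibitor 3 81.63 27.21 1.84 0.008",
           "+ Nsource.Inhibitor 3 90.31 17.21 0.84 <0.001",
           "+ Nsource 1 1238.10 1238.10 40.29 <.001",
           "+ Inhibitor 3 1484.41 494.80 16.10 <.001")

# Keep only lines whose label is exactly Nsource.Inhibitor (the dot is escaped)
is_target <- grepl("^\\+?\\s*Nsource\\.Inhibitor\\s", lines)

# Capture the last token: optional "<", optional leading digits, ".", digits
p_values <- sub(".*\\s(<?[0-9]*\\.[0-9]+)$", "\\1", lines[is_target])
p_values
```

Filtering with grepl first means non-target lines never reach sub, so nothing has to be turned into NA afterwards.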
If your target lines contain "Nsource.Inhibitor" and the last character is a number, and you want to extract all the characters after the last space, then try:
gsub(".*Nsource\\.Inhibitor.*\\s(.*[0-9])$", "\\1", reline)
You could add ignore.case = TRUE if Nsource or Inhibitor may appear with different capitalisation.
Examples:
> reline <- "+ Nsource.Inhibitor 3 81.63 27.21 1.84 <0.008"
> output <- gsub(".*Nsource\\.Inhibitor.*\\s(.*[0-9])$", "\\1", reline, ignore.case = T)
> output
[1] "<0.008"
> reline <- "+ Nsource.Inhibitor 3 81.11 27 1232 23 123111 55.5555 0.38"
> output <- gsub(".*Nsource\\.inhibitor.*\\s(.*[0-9])$", "\\1", reline, ignore.case = T)
> output
[1] "0.38"
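One thing to keep in mind with this approach: when the pattern does not match, sub and gsub return the input line unchanged, which is what happened in the original attempt. A small sketch of guarding against that with grepl follows; the `lines` vector and the names `pattern`, `hits` and `result` are only for illustration.

```r
pattern <- ".*Nsource\\.Inhibitor.*\\s(.*[0-9])$"

# One target line and one non-target line, both taken from the question
lines <- c("+ Nsource.Inhibitor 3 81.63 27.21 1.84 0.008",
           "+ Nsource 1 1238.10 1238.10 40.29 <.001")

# Without a guard, the non-matching line comes back as-is
unguarded <- sub(pattern, "\\1", lines)

# With a guard, non-matching lines become NA instead
hits <- grepl(pattern, lines)
result <- ifelse(hits, sub(pattern, "\\1", lines), NA_character_)
result
```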
strings <- c("Nsource.Inhibitor 3 81.63 27.21 1.84 0.008", "Nsource.Inhibitor 3 90.31 17.21 0.84 <0.001", "Nsource 1 1238.10 1238.10 40.29 <.001", "Inhibitor 3 1484.41 494.80 16.10 <.001")
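The expression below uses grep to pick out the strings that contain the substring 'Nsource.Inhibitor', splits each of them on ' ', and returns the 6th element of each split string.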
sapply(strsplit(strings[grep('Nsource.Inhibitor', strings)], ' '), '[[',6)
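If the real file is tab-separated or has runs of spaces (the output in the question suggests tabs), splitting on a single ' ' may not give six fields. A hedged variant that splits on any whitespace and takes the last field instead of a fixed position is sketched below; the mixed-whitespace `strings` vector and the names `fields`, `labels` and `last_field` are assumptions for the example.

```r
# Hypothetical input: same values as above, but with tabs and repeated spaces
strings <- c("Nsource.Inhibitor\t3\t81.63\t27.21\t1.84\t0.008",
             "Nsource.Inhibitor  3  90.31  17.21  0.84  <0.001",
             "Nsource\t1\t1238.10\t1238.10\t40.29\t<.001",
             "Inhibitor 3 1484.41 494.80 16.10 <.001")

# Split every line on one or more whitespace characters
fields <- strsplit(trimws(strings), "[[:space:]]+")

# First field is the label, last field is the value we want
labels <- vapply(fields, `[`, character(1), 1)
last_field <- vapply(fields, function(x) x[length(x)], character(1))

# Exact comparison on the label, so "Nsource" alone is not picked up
last_field[labels == "Nsource.Inhibitor"]
```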
There is no reason for using regex here. Simply read the file as a data.frame and do simple subsetting:
DF <- read.table(text = "Nsource.Inhibitor 3 81.63 27.21 1.84 0.008
Nsource.Inhibitor 3 90.31 17.21 0.84 <0.001
Nsource 1 1238.10 1238.10 40.29 <.001
Inhibitor 3 1484.41 494.80 16.10 <.001", stringsAsFactors = FALSE) # you can also read from a file directly
DF[DF$V1 == "Nsource.Inhibitor", ncol(DF)]
#[1] "0.008" "<0.001"
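Reading from a file works the same way. The sketch below writes the sample lines to a temporary file purely so the example is self-contained; in practice `path` would point at your own file. The conversion to numeric at the end is an optional extra, and the names `p_raw` and `p_num` are made up here.

```r
# Write the sample lines to a temporary file (stand-in for the real file)
path <- tempfile(fileext = ".txt")
writeLines(c("Nsource.Inhibitor 3 81.63 27.21 1.84 0.008",
             "Nsource.Inhibitor 3 90.31 17.21 0.84 <0.001",
             "Nsource 1 1238.10 1238.10 40.29 <.001",
             "Inhibitor 3 1484.41 494.80 16.10 <.001"), path)

# Whitespace-separated columns, no header; the last column stays character
DF <- read.table(path, stringsAsFactors = FALSE)
p_raw <- DF[DF$V1 == "Nsource.Inhibitor", ncol(DF)]

# Optional: strip a leading "<" to get numbers
# (note this discards the "less than" information)
p_num <- as.numeric(sub("^<", "", p_raw))
```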